r - Modifying terminal node in ctree(), partykit package
I have a dependent variable to classify with a decision tree. It is composed of 3 categories with frequencies 738 (19%), 426 (15%), and 1800 (66%). As you might imagine, the predicted category is the third one, but the purpose of the tree is descriptive, so that does not matter. The thing is, when plotting the tree with the ctree() function (package partykit), the terminal nodes display histograms showing the probability of occurrence of the 3 classes. I need to modify this output: I would like to obtain the proportion of occurrence of each class within the terminal node with respect to that class' absolute frequency. For example, what percentage of the 738 participants in class1 belongs to a given terminal node? Each terminal node would display these values for the 3 classes that compose the dependent variable.
Below is the plot of the tree, which by default reports the prevalence of each class within the terminal nodes.
You can define your own panel function to draw what goes into each terminal panel window. If you know a little bit of grid graphics, you can look at how the current terminal panel functions are defined to see how this works.
One panel function that ought to do what you want is node_terminal() in the partykit package (the improved re-implementation of the old party package). However, because ctree() does not store the predictions in each terminal node, the node_terminal() function cannot do this out of the box at the moment. I'll try to improve the implementation in future versions so that this can be facilitated. Below is a somewhat involved example that should do what you want, I hope.
First, fit a classification tree using the iris data (for a simple reproducible example):
library("partykit")
(ct <- ctree(Species ~ ., data = iris))
## Model formula:
## Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
##
## Fitted party:
## [1] root
## |   [2] Petal.Length <= 1.9: setosa (n = 50, err = 0.0%)
## |   [3] Petal.Length > 1.9
## |   |   [4] Petal.Width <= 1.7
## |   |   |   [5] Petal.Length <= 4.8: versicolor (n = 46, err = 2.2%)
## |   |   |   [6] Petal.Length > 4.8: versicolor (n = 8, err = 50.0%)
## |   |   [7] Petal.Width > 1.7: virginica (n = 46, err = 2.2%)
##
## Number of inner nodes:    3
## Number of terminal nodes: 4
Then compute a table of predicted probabilities for each terminal node:
(pred <- aggregate(predict(ct, type = "prob"),
  list(predict(ct, type = "node")), FUN = mean))
##   Group.1 setosa versicolor  virginica
## 1       2      1 0.00000000 0.00000000
## 2       5      0 0.97826087 0.02173913
## 3       6      0 0.50000000 0.50000000
## 4       7      0 0.02173913 0.97826087
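Note that the table above gives proportions within each node, whereas the question asks for the share of each class's total that lands in each node. A minimal sketch of that computation, assuming the tree is fitted on iris as in this answer (the names tab and share are only illustrative): cross-tabulate the terminal node of each observation against its observed class, then take column-wise proportions.

```r
library("partykit")

## same tree as in the answer
ct <- ctree(Species ~ ., data = iris)

## terminal node id of each observation vs. its observed class
tab <- table(node = predict(ct, type = "node"), class = iris$Species)

## margin = 2 divides each column by the class total, so every
## column (class) sums to 1 across the terminal nodes
share <- prop.table(tab, margin = 2)
round(100 * share, 1)
```

Each column then answers the question "what percentage of this class ends up in that terminal node?".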
Then comes the not-so-obvious part: we want to include these predicted probabilities in the terminal nodes of the tree itself. For this, we coerce the recursive node structure to a flat list, insert the predictions (suitably formatted), and convert the list back to the node structure:
ct_node <- as.list(ct$node)
for(i in 1:nrow(pred)) {
  ct_node[[pred[i,1]]]$info$prediction <- paste(
    format(names(pred)[-1]),
    format(round(pred[i, -1], digits = 3), nsmall = 3)
  )
}
ct$node <- as.partynode(ct_node)
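To check that the insertion worked, one could pull the stored labels back out of the terminal nodes with nodeapply() and info_node(). This is only a sketch; it repeats the steps above so that it runs on its own, and labs is an illustrative name.

```r
library("partykit")

## repeat the steps from the answer
ct <- ctree(Species ~ ., data = iris)
pred <- aggregate(predict(ct, type = "prob"),
  list(predict(ct, type = "node")), FUN = mean)
ct_node <- as.list(ct$node)
for(i in 1:nrow(pred)) {
  ct_node[[pred[i, 1]]]$info$prediction <- paste(
    format(names(pred)[-1]),
    format(round(pred[i, -1], digits = 3), nsmall = 3)
  )
}
ct$node <- as.partynode(ct_node)

## extract the inserted labels from every terminal node
labs <- nodeapply(ct, ids = nodeids(ct, terminal = TRUE),
  FUN = function(n) info_node(n)$prediction)
labs
```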
Then we can draw a picture of the tree with the node_terminal panel function, inserting our pre-formatted predictions:
plot(ct, terminal_panel = node_terminal, tp_args = list(
  FUN = function(node) c("predictions", node$prediction)))
Edit: Coercing back and forth between list and party is already implemented in the package... I forgot ;-) If you do
st <- as.simpleparty(ct)
then the resulting party has more detailed information about predictions etc. in each node. For example, $distribution contains the absolute frequencies for each response level. This can be formatted as before with
pred <- function(i) {
  tab <- i$distribution
  tab <- round(prop.table(tab), 3)
  tab <- paste0(names(tab), ":", format(tab, nsmall = 3))
  c("predictions", tab)
}
And this can then be passed to node_terminal to create the same plot as above. You might want to change drop = FALSE to drop = TRUE if you want all terminal nodes displayed in the bottom row.
plot(st, terminal_panel = node_terminal, tp_args = list(FUN = pred))
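To show class-relative shares instead, as asked in the question, the same idea might be adapted by dividing the node counts by the class totals of the full sample rather than by the node size. This is a hedged sketch: pred_share and class_total are hypothetical names, and it assumes that $distribution holds named absolute counts per response level, as described above.

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)
st <- as.simpleparty(ct)

## class totals in the full sample, as a named integer vector
class_total <- c(table(iris$Species))

## hypothetical formatter: share of each class's total found in node i
## (assumes i$distribution contains named absolute frequencies)
pred_share <- function(i) {
  d <- i$distribution
  share <- round(as.numeric(d) / class_total[names(d)], 3)
  c("share of class", paste0(names(d), ": ", format(share, nsmall = 3)))
}

plot(st, terminal_panel = node_terminal, tp_args = list(FUN = pred_share))
```

With this formatter the values for one class sum to 1 across the terminal nodes, rather than the values within one node summing to 1.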