r - Modifying terminal node in ctree(), partykit package -


I have a dependent variable to classify with a decision tree. It is composed of 3 categories with frequencies 738 (19%), 426 (15%), and 1800 (66%). As you can imagine, the predicted category is the third one, but the purpose of the tree is descriptive, so this does not really matter. The thing is, when plotting the tree with the ctree() function (package partykit), the terminal nodes display histograms showing the probability of occurrence of the 3 classes. I need to modify this output: I would like to obtain the proportions of occurrence of each class within the terminal node with respect to that class' absolute frequency. For example, what percentage of the 738 participants in class 1 belongs to a given terminal node? Each terminal node should display these values for the 3 classes that compose the dependent variable.

Below is the plot of the tree, which by default reports the prevalence of each class within the terminal nodes.

You can define your own panel function that draws what goes into each terminal panel window. If you know a little bit of grid graphics, look at how the current terminal panel functions are defined to see how this works.

One panel function that ought to do what you want is node_terminal() in the partykit package (the improved re-implementation of the old party package). However, because ctree() does not store the predictions in each terminal node, the node_terminal() function cannot do this out of the box at the moment. I'll try to improve the implementation in future versions so that this can be facilitated. Below is a somewhat involved example that should do what you want, I hope.

First, we fit a classification tree using the iris data (for a simple reproducible example):

library("partykit")
(ct <- ctree(Species ~ ., data = iris))
## Model formula:
## Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
##
## Fitted party:
## [1] root
## |   [2] Petal.Length <= 1.9: setosa (n = 50, err = 0.0%)
## |   [3] Petal.Length > 1.9
## |   |   [4] Petal.Width <= 1.7
## |   |   |   [5] Petal.Length <= 4.8: versicolor (n = 46, err = 2.2%)
## |   |   |   [6] Petal.Length > 4.8: versicolor (n = 8, err = 50.0%)
## |   |   [7] Petal.Width > 1.7: virginica (n = 46, err = 2.2%)
##
## Number of inner nodes:    3
## Number of terminal nodes: 4

Then we compute a table of predicted probabilities for each terminal node:

(pred <- aggregate(predict(ct, type = "prob"),
  list(predict(ct, type = "node")), FUN = mean))
##   Group.1 setosa versicolor  virginica
## 1       2      1 0.00000000 0.00000000
## 2       5      0 0.97826087 0.02173913
## 3       6      0 0.50000000 0.50000000
## 4       7      0 0.02173913 0.97826087
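The table above gives proportions within each node. The question asks for something slightly different: the share of each class's total count that lands in each terminal node. A minimal sketch of one way to get that, assuming the same iris tree (the names node, tab, and share are illustrative and not part of partykit), is to cross-tabulate node membership against the observed class and normalize by column:

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)

# Terminal node id for every observation in the learning data.
node <- predict(ct, type = "node")

# Rows: terminal nodes; columns: observed classes (absolute frequencies).
tab <- table(node = node, class = iris$Species)

# Divide each column by its class total, so every column sums to 1.
share <- prop.table(tab, margin = 2)
round(share, 3)
```

Each column then answers questions of the form "what fraction of all versicolor observations ends up in this node?".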

Then comes the not-so-obvious part: we want to include the predicted probabilities stored in pred in the terminal nodes of the tree itself. To do this, we coerce the recursive node structure to a flat list, insert the predictions (suitably formatted), and then convert the list back to the node structure:

ct_node <- as.list(ct$node)
for(i in 1:nrow(pred)) {
  ct_node[[pred[i,1]]]$info$prediction <- paste(
    format(names(pred)[-1]),
    format(round(pred[i, -1], digits = 3), nsmall = 3)
  )
}
ct$node <- as.partynode(ct_node)

Then, we can finally draw the picture of the tree with the node_terminal panel function, inserting our pre-formatted predictions:

plot(ct, terminal_panel = node_terminal, tp_args = list(
  FUN = function(node) c("Predictions", node$prediction)))

[Figure: custom tree]

Edit: The coercing back and forth between list and party is actually already implemented in the package... I forgot ;-) If you do

st <- as.simpleparty(ct) 

then the resulting party has more detailed information about the predictions etc. in each node. For example, $distribution contains the absolute frequencies of each response level. This can also be formatted as before:

pred <- function(i) {
  tab <- i$distribution
  tab <- round(prop.table(tab), 3)
  tab <- paste0(names(tab), ":", format(tab, nsmall = 3))
  c("Predictions", tab)
}

And this can be passed to node_terminal to create the same plot as above. You might want to change drop = FALSE to drop = TRUE if you want all terminal nodes displayed in the bottom row.

plot(st, terminal_panel = node_terminal, tp_args = list(FUN = pred))
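If the terminal nodes should show each class's share of its overall frequency (as the question asks) rather than the within-node proportions, the same mechanism can probably be reused. The following is only a sketch, under the assumption that $distribution is a named vector of absolute class frequencies as described above; total and pred_share are hypothetical names introduced here for illustration:

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)
st <- as.simpleparty(ct)

# Absolute frequency of each class in the full learning data.
total <- table(iris$Species)

# Hypothetical helper: for one node's info, divide the node's class counts
# by the overall class counts (assumes $distribution carries class names).
pred_share <- function(i) {
  dist <- i$distribution
  share <- as.numeric(dist) / as.numeric(total[names(dist)])
  c("Share of class",
    paste0(names(dist), ": ", format(round(share, 3), nsmall = 3)))
}

plot(st, terminal_panel = node_terminal, tp_args = list(FUN = pred_share))
```

For the original problem, total would be the frequency table of the three-category dependent variable instead of iris$Species.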
