r - Why does summarise on grouped data result in only overall summary in dplyr?
Suppose I have the following data:
dfx <- data.frame(
  group = c(rep('a', 8), rep('b', 15), rep('c', 6)),
  sex = sample(c("m", "f"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)
With the old plyr, I can create a little table summarizing my data with the following code:
require(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))
The output looks like this:
  group sex  mean    sd
1     a   f 49.68  5.68
2     a   m 32.21  6.27
3     b   f 31.87  9.80
4     b   m 37.54  9.73
5     c   f 40.61 15.21
6     c   m 36.33 11.33
I'm trying to move my code to dplyr and the %>% operator. My code takes the data frame, groups it by group and sex, and then summarises it. That is:
dfx %>% group_by(group, sex) %>% summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
But my output is:
   mean   sd
1 35.56 9.92
What am I doing wrong?
Thanks!
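The problem here is that you are loading dplyr first and then plyr, so plyr's function summarise is masking dplyr's function summarise. plyr's version knows nothing about dplyr's grouping, so it collapses the whole data frame into a single row. When that happens you get this warning: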
require(plyr)
Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

    arrange, desc, failwith, id, mutate, summarise, summarize
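If the warning has scrolled out of view, the masking can also be checked directly. The following is a minimal sketch, assuming both plyr and dplyr are installed; it deliberately reproduces the problematic load order and then asks R which package an unqualified summarise resolves to. The variable names hits and owner are only illustrative.

```r
# Sketch: find out which package's summarise() an unqualified call resolves to.
# Assumes both plyr and dplyr are installed.
suppressPackageStartupMessages({
  library(dplyr)
  library(plyr)   # attached last, so it sits ahead of dplyr on the search path
})

# Every attached package that provides summarise, in lookup order
hits <- find("summarise")
print(hits)

# The namespace that owns the function an unqualified call would use
owner <- environmentName(environment(summarise))
print(owner)
```

If owner is "plyr", a pipeline ending in summarise(...) is calling plyr's function, which would explain the single-row result.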
So in order for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):
library(dplyr)
dfx %>% group_by(group, sex) %>%
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

Source: local data frame [6 x 4]
Groups: group

  group sex  mean    sd
1     a   f 41.51  8.24
2     a   m 32.23 11.85
3     b   f 38.79 11.93
4     b   m 31.00  7.92
5     c   f 24.97  7.46
6     c   m 36.17  9.11
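The detach route can be made a little more defensive in a script. This is only a sketch, assuming both packages are installed and that nothing else attached in the session depends on plyr; it detaches plyr only when it is actually on the search path and then confirms that summarise now comes from dplyr. The name owner_after is illustrative.

```r
# Sketch: detach plyr only if it is attached, then confirm the result.
# Assumes both plyr and dplyr are installed.
suppressPackageStartupMessages({
  library(dplyr)
  library(plyr)   # recreate the problematic order for demonstration
})

if ("package:plyr" %in% search()) {
  detach("package:plyr", character.only = TRUE)
}

# With plyr gone from the search path, summarise resolves to dplyr again
owner_after <- environmentName(environment(summarise))
print(owner_after)
```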
Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:
dfx %>% group_by(group, sex) %>% dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
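To see the difference between the two functions side by side, both can be called with explicit namespaces on the same grouped data. This is a sketch, assuming plyr and dplyr are installed; the data below is a deterministic stand-in for the random dfx from the question (alternating sex, evenly spaced ages), chosen so that every group and sex combination is present.

```r
# Sketch: compare plyr::summarise and dplyr::summarise on the same grouped data.
# The data is a deterministic stand-in, not the original random sample.
dfx <- data.frame(
  group = c(rep("a", 8), rep("b", 15), rep("c", 6)),
  sex   = rep(c("m", "f"), length.out = 29),
  age   = seq(18, 54, length.out = 29)
)

grouped <- dplyr::group_by(dfx, group, sex)

# plyr's summarise ignores the grouping and collapses everything into one row
plyr_res <- plyr::summarise(grouped,
                            mean = round(mean(age), 2),
                            sd = round(sd(age), 2))

# dplyr's summarise respects the grouping: one row per group/sex combination
dplyr_res <- dplyr::summarise(grouped,
                              mean = round(mean(age), 2),
                              sd = round(sd(age), 2))

print(nrow(plyr_res))
print(nrow(dplyr_res))
```

Because neither call relies on the search path, the result is the same whichever package was attached last.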