data.table - Emulate the window function of SQL in R -
i have table following:
id name amount year 001 10 2010 001 10 2011 001 12 2012 ----------------------- 002 3 2012 002 4 2013 ----------------------- 003 b 20 2011 003 b 20 2012 (note 2 entities have same name a different, id unique identifier.)
i want compute increment in amount on previous year, i.e. result should like:
id name increment year 001 0 2010 001 0 2011 001 2 2012 ---------------------------- 002 0 2012 002 1 2013 ---------------------------- 003 b 0 2011 003 b 0 2012 note increment of first year considered "0".
in mssql, can implemented by:
select id, name, amount - lag(amount, 1, amount) on (partition id order year) increment, year table i trying accomplish task in r data.table. found succinct example here:
dt[, increment := amount - shift(amount, 1), by=id]. error prompted: could not find function "shift".
the versions are:
- r: 3.2.0_1
- data.table: 1.9.4
the questions are:
- i found
shiftfunction implemented on data.table's github, why failed invoke function? - i think
byin data.table equivalentpartition byin sql, counterpart oforder byin r? have set key of data.table before carrying out aggregation data.table ordered?
this case falls under general structure of doing operation on column separate grouping column.
fun <- function(v) c(0, diff(v)) #to take difference , account starting value #function tapply() df1 <- df df1$amount <- unlist(with(df, by(amount, id, fun))) df1 id name amount year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #using data.table df2 <- df setdt(df2)[, list(name, increment = fun(amount), year), = id] id name increment year 1: 001 0 2010 2: 001 0 2011 3: 001 2 2012 4: 002 0 2012 5: 002 1 2013 6: 003 b 0 2011 7: 003 b 0 2012 #function: by() df3 <- df df3$amount <- unlist(with(df3, by(amount, id, fun))) df3 id name amount year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #using dplyr data.table dt %>% group_by(id) %>% summarise(name, increment = fun(amount), year) source: local data table [7 x 4] id name increment year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #using aggregate df5$amount <- unlist(aggregate(amount ~ id, data=df5, fun=fun)$amount) df5 id name amount year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #function: ave df6 <- df df6$amount <- with(df, ave(amount, id, fun-fun)) df6 id name amount year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #dplyr (non-data.table) df7 <- df df %>% group_by(id) %>% mutate(increment = fun(amount)) id name amount year increment 1 001 10 2010 0 2 001 10 2011 0 3 001 12 2012 2 4 002 3 2012 0 5 002 4 2013 1 6 003 b 20 2011 0 7 003 b 20 2012 0 #dplyr (with command 'select' give desired output of op) df %>% group_by(id) %>% mutate(increment = fun(amount)) %>% select(id, name, increment, year) source: local data frame [7 x 4] groups: id id name increment year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 data
df <- data.frame(id=factor(c('001', '001', '001', '002', '002', '003', '003')), name=c(rep('a', 5), rep('b', 2)), amount=c(10,10,12,3,4,20,20), year=c(2010, 2011, 2012, 2012, 2013, 2011, 2012) )
Comments
Post a Comment