data.table - Emulate the window function of SQL in R -
i have table following:
id name amount year 001 10 2010 001 10 2011 001 12 2012 ----------------------- 002 3 2012 002 4 2013 ----------------------- 003 b 20 2011 003 b 20 2012
(note 2 entities have same name a
different, id
unique identifier.)
i want compute increment in amount
on previous year, i.e. result should like:
id name increment year 001 0 2010 001 0 2011 001 2 2012 ---------------------------- 002 0 2012 002 1 2013 ---------------------------- 003 b 0 2011 003 b 0 2012
note increment of first year considered "0".
in mssql, can implemented by:
select id, name, amount - lag(amount, 1, amount) on (partition id order year) increment, year table
i trying accomplish task in r data.table. found succinct example here:
dt[, increment := amount - shift(amount, 1), by=id]
. error prompted: could not find function "shift"
.
the versions are:
- r: 3.2.0_1
- data.table: 1.9.4
the questions are:
- i found
shift
function implemented on data.table's github, why failed invoke function? - i think
by
in data.table equivalentpartition by
in sql, counterpart oforder by
in r? have set key of data.table before carrying out aggregation data.table ordered?
this case falls under general structure of doing operation on column separate grouping column.
fun <- function(v) c(0, diff(v)) #to take difference , account starting value #function tapply() df1 <- df df1$amount <- unlist(with(df, by(amount, id, fun))) df1 id name amount year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #using data.table df2 <- df setdt(df2)[, list(name, increment = fun(amount), year), = id] id name increment year 1: 001 0 2010 2: 001 0 2011 3: 001 2 2012 4: 002 0 2012 5: 002 1 2013 6: 003 b 0 2011 7: 003 b 0 2012 #function: by() df3 <- df df3$amount <- unlist(with(df3, by(amount, id, fun))) df3 id name amount year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #using dplyr data.table dt %>% group_by(id) %>% summarise(name, increment = fun(amount), year) source: local data table [7 x 4] id name increment year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #using aggregate df5$amount <- unlist(aggregate(amount ~ id, data=df5, fun=fun)$amount) df5 id name amount year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #function: ave df6 <- df df6$amount <- with(df, ave(amount, id, fun-fun)) df6 id name amount year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012 #dplyr (non-data.table) df7 <- df df %>% group_by(id) %>% mutate(increment = fun(amount)) id name amount year increment 1 001 10 2010 0 2 001 10 2011 0 3 001 12 2012 2 4 002 3 2012 0 5 002 4 2013 1 6 003 b 20 2011 0 7 003 b 20 2012 0 #dplyr (with command 'select' give desired output of op) df %>% group_by(id) %>% mutate(increment = fun(amount)) %>% select(id, name, increment, year) source: local data frame [7 x 4] groups: id id name increment year 1 001 0 2010 2 001 0 2011 3 001 2 2012 4 002 0 2012 5 002 1 2013 6 003 b 0 2011 7 003 b 0 2012
data
df <- data.frame(id=factor(c('001', '001', '001', '002', '002', '003', '003')), name=c(rep('a', 5), rep('b', 2)), amount=c(10,10,12,3,4,20,20), year=c(2010, 2011, 2012, 2012, 2013, 2011, 2012) )
Comments
Post a Comment