data.table - Emulate the window function of SQL in R -

May 15, 2013

i have table following:

id   name  amount  year 001      10      2010 001      10      2011 001      12      2012 ----------------------- 002      3       2012 002      4       2013 ----------------------- 003  b     20      2011 003  b     20      2012

(note 2 entities have same name a different, id unique identifier.)

i want compute increment in amount on previous year, i.e. result should like:

id   name   increment   year 001       0           2010 001       0           2011 001       2           2012 ---------------------------- 002       0           2012 002       1           2013 ---------------------------- 003  b      0           2011 003  b      0           2012

note increment of first year considered "0".

in mssql, can implemented by:

select id,        name,        amount - lag(amount, 1, amount) on (partition id order year) increment,        year table

i trying accomplish task in r data.table. found succinct example here:
dt[, increment := amount - shift(amount, 1), by=id]. error prompted: could not find function "shift".

the versions are:

r: 3.2.0_1
data.table: 1.9.4

the questions are:

i found shift function implemented on data.table's github, why failed invoke function?
i think by in data.table equivalent partition by in sql, counterpart of order by in r? have set key of data.table before carrying out aggregation data.table ordered?

this case falls under general structure of doing operation on column separate grouping column.

fun <- function(v) c(0, diff(v)) #to take difference , account starting value  #function tapply() df1 <- df df1$amount <- unlist(with(df, by(amount, id, fun))) df1    id name amount year 1 001         0 2010 2 001         0 2011 3 001         2 2012 4 002         0 2012 5 002         1 2013 6 003    b      0 2011 7 003    b      0 2012  #using data.table df2 <- df setdt(df2)[, list(name, increment = fun(amount), year), = id]     id name increment year 1: 001            0 2010 2: 001            0 2011 3: 001            2 2012 4: 002            0 2012 5: 002            1 2013 6: 003    b         0 2011 7: 003    b         0 2012  #function: by() df3 <- df df3$amount <- unlist(with(df3, by(amount, id, fun))) df3    id name amount year 1 001         0 2010 2 001         0 2011 3 001         2 2012 4 002         0 2012 5 002         1 2013 6 003    b      0 2011 7 003    b      0 2012  #using dplyr data.table dt %>%   group_by(id) %>%   summarise(name, increment = fun(amount), year) source: local data table [7 x 4]     id name increment year 1 001            0 2010 2 001            0 2011 3 001            2 2012 4 002            0 2012 5 002            1 2013 6 003    b         0 2011 7 003    b         0 2012  #using aggregate df5$amount <- unlist(aggregate(amount ~ id, data=df5, fun=fun)$amount) df5    id name amount year 1 001         0 2010 2 001         0 2011 3 001         2 2012 4 002         0 2012 5 002         1 2013 6 003    b      0 2011 7 003    b      0 2012  #function: ave df6 <- df df6$amount <- with(df, ave(amount, id, fun-fun)) df6    id name amount year 1 001         0 2010 2 001         0 2011 3 001         2 2012 4 002         0 2012 5 002         1 2013 6 003    b      0 2011 7 003    b      0 2012  #dplyr (non-data.table) df7 <- df df %>%   group_by(id) %>%   mutate(increment = fun(amount))    id name amount year increment 1 001        10 2010         0 2 001        10 2011         0 3 001        12 2012         2 4 002         3 2012         0 5 002         4 2013         1 6 003    b     20 2011         0 7 003    b     20 2012         0  #dplyr (with command 'select' give desired output of op) df %>%    group_by(id) %>%      mutate(increment = fun(amount)) %>%        select(id, name, increment, year) source: local data frame [7 x 4] groups: id     id name increment year 1 001            0 2010 2 001            0 2011 3 001            2 2012 4 002            0 2012 5 002            1 2013 6 003    b         0 2011 7 003    b         0 2012

data

df <- data.frame(id=factor(c('001', '001', '001', '002', '002', '003', '003')),                   name=c(rep('a', 5), rep('b', 2)),                  amount=c(10,10,12,3,4,20,20),                  year=c(2010, 2011, 2012, 2012, 2013, 2011, 2012) )

Search This Blog

Script

data.table - Emulate the window function of SQL in R -

data

Comments

Post a Comment

Popular posts from this blog

javascript - Bootstrap Popover: iOS Safari strange behaviour -

Magento/PHP - Get phones on all members in a customer group -

spring cloud - How to configure SpringCloud Eureka instance to point to https on non standard port -