data.table - Emulate the window function of SQL in R -


i have table following:

id   name  amount  year 001      10      2010 001      10      2011 001      12      2012 ----------------------- 002      3       2012 002      4       2013 ----------------------- 003  b     20      2011 003  b     20      2012 

(note 2 entities have same name a different, id unique identifier.)

i want compute increment in amount on previous year, i.e. result should like:

id   name   increment   year 001       0           2010 001       0           2011 001       2           2012 ---------------------------- 002       0           2012 002       1           2013 ---------------------------- 003  b      0           2011 003  b      0           2012 

note increment of first year considered "0".


in mssql, can implemented by:

select id,        name,        amount - lag(amount, 1, amount) on (partition id order year) increment,        year table 

i trying accomplish task in r data.table. found succinct example here:
dt[, increment := amount - shift(amount, 1), by=id]. error prompted: could not find function "shift".

the versions are:

  • r: 3.2.0_1
  • data.table: 1.9.4

the questions are:

  1. i found shift function implemented on data.table's github, why failed invoke function?
  2. i think by in data.table equivalent partition by in sql, counterpart of order by in r? have set key of data.table before carrying out aggregation data.table ordered?

this case falls under general structure of doing operation on column separate grouping column.

fun <- function(v) c(0, diff(v)) #to take difference , account starting value  #function tapply() df1 <- df df1$amount <- unlist(with(df, by(amount, id, fun))) df1    id name amount year 1 001         0 2010 2 001         0 2011 3 001         2 2012 4 002         0 2012 5 002         1 2013 6 003    b      0 2011 7 003    b      0 2012  #using data.table df2 <- df setdt(df2)[, list(name, increment = fun(amount), year), = id]     id name increment year 1: 001            0 2010 2: 001            0 2011 3: 001            2 2012 4: 002            0 2012 5: 002            1 2013 6: 003    b         0 2011 7: 003    b         0 2012  #function: by() df3 <- df df3$amount <- unlist(with(df3, by(amount, id, fun))) df3    id name amount year 1 001         0 2010 2 001         0 2011 3 001         2 2012 4 002         0 2012 5 002         1 2013 6 003    b      0 2011 7 003    b      0 2012  #using dplyr data.table dt %>%   group_by(id) %>%   summarise(name, increment = fun(amount), year) source: local data table [7 x 4]     id name increment year 1 001            0 2010 2 001            0 2011 3 001            2 2012 4 002            0 2012 5 002            1 2013 6 003    b         0 2011 7 003    b         0 2012  #using aggregate df5$amount <- unlist(aggregate(amount ~ id, data=df5, fun=fun)$amount) df5    id name amount year 1 001         0 2010 2 001         0 2011 3 001         2 2012 4 002         0 2012 5 002         1 2013 6 003    b      0 2011 7 003    b      0 2012  #function: ave df6 <- df df6$amount <- with(df, ave(amount, id, fun-fun)) df6    id name amount year 1 001         0 2010 2 001         0 2011 3 001         2 2012 4 002         0 2012 5 002         1 2013 6 003    b      0 2011 7 003    b      0 2012  #dplyr (non-data.table) df7 <- df df %>%   group_by(id) %>%   mutate(increment = fun(amount))    id name amount year increment 1 001        10 2010         0 2 001        10 2011         0 3 001        12 2012         2 4 002         3 2012         0 5 002         4 2013         1 6 003    b     20 2011         0 7 003    b     20 2012         0  #dplyr (with command 'select' give desired output of op) df %>%    group_by(id) %>%      mutate(increment = fun(amount)) %>%        select(id, name, increment, year) source: local data frame [7 x 4] groups: id     id name increment year 1 001            0 2010 2 001            0 2011 3 001            2 2012 4 002            0 2012 5 002            1 2013 6 003    b         0 2011 7 003    b         0 2012 

data

df <- data.frame(id=factor(c('001', '001', '001', '002', '002', '003', '003')),                   name=c(rep('a', 5), rep('b', 2)),                  amount=c(10,10,12,3,4,20,20),                  year=c(2010, 2011, 2012, 2012, 2013, 2011, 2012) ) 

Comments

Popular posts from this blog

Magento/PHP - Get phones on all members in a customer group -

php - Bypass Geo Redirect for specific directories -

php - .htaccess mod_rewrite for dynamic url which has domain names -