R - Find value in previous and next year
I have a data frame of time-series observations. For each observation I would like to add a variable holding the value at the closest similar date in the previous year and at the closest similar date in the next year (e.g. for a value on 15 May 2014, this might be 13 May 2013 and 21 May 2015). Is there a smart way, e.g. using dplyr, to do this? Please find example code below (most of the code is focused on creating a random set of dates and values, taken from an earlier question). Many thanks in advance.
             date    value nearest_val_nextyear nearest_val_prevyear
    1  2009-02-14 6.511781                    0                    0
    2  2009-12-23 5.389843                    0                    0
    3  2011-08-01 4.378759                    0                    0
    4  2014-04-07 2.785300                    0                    0
    5  2008-08-12 6.124931                    0                    0
    6  2014-03-10 4.955066                    0                    0
    7  2014-07-23 4.983810                    0                    0
    8  2012-04-14 5.943836                    0                    0
    9  2012-01-13 5.821221                    0                    0
    10 2007-06-30 5.593901                    0                    0
    11 2008-08-24 5.918977                    0                    0
    12 2008-05-30 5.782136                    0                    0
    13 2012-06-30 5.074565                    0                    0
    14 2010-01-27 3.010648                    0                    0
    15 2013-02-27 5.619826                    0                    0
    16 2010-12-25 4.943871                    0                    0
    17 2012-09-27 4.844204                    0                    0
    18 2014-12-08 3.529248                    0                    0
    19 2010-01-15 4.521850                    0                    0
    20 2013-03-21 5.417942                    0                    0

    # set start and end dates to sample between
    day.start <- "2007/01/01"
    day.end <- "2014/12/31"
    set.seed(1)

    # define a random date/time selection function
    rand.day.time <- function(day.start, day.end, size) {
      dayseq <- seq.Date(as.Date(day.start), as.Date(day.end), by = "day")
      dayselect <- sample(dayseq, size, replace = TRUE)
      as.POSIXlt(paste(dayselect))
    }

    dateval = rand.day.time(day.start, day.end, size = 20)
    value = rnorm(n = 20, mean = 5, sd = 1)
    df = data.frame(date = dateval, value = value)
    df$nearest_val_nextyear = 0
    df$nearest_val_prevyear = 0
    df
This is not a smart way of doing this, but I'm posting it in the hope that someone, maybe you, can make it smart/pretty.
    library(dplyr)
    library(lubridate)

    dat <- data.frame(dateval, value)
    dat <- dat %>% mutate(year = year(dateval), nv_next = NA, nv_prev = NA)
    # you don't need dplyr for this...

    shifts <- c(1, -1)  # next year, previous year
    for (s in 1:2) {  # once for each shift
      for (i in 1:nrow(dat)) {
        # subset the df to the dates of the other year
        otheryear <- dat[dat[, "year"] == dat[i, "year"] + shifts[s], ]
        if (nrow(otheryear) == 0) {  # ends if there's no other year
          dat[i, 3 + s] <- NA
        } else {
          cands <- otheryear$dateval  # candidates to have their value chosen
          cands_shifted <- cands
          year(cands_shifted) <- dat[i, "year"]  # change the year in the candidates' copy
          # once the years are the same, the closest date can be calculated with difftime
          nearest_date <- which.min(abs(difftime(dat[i, "dateval"], cands_shifted)))
          # we check what the candidate's real date was, and assign its value
          dat[i, 3 + s] <- dat[dat$dateval == cands[nearest_date], "value"]
        }
      }
    }

This resulted in:
    > dat
          dateval    value year  nv_next  nv_prev
    1  2009-02-14 6.511781 2009 3.010648 5.782136
    2  2009-12-23 5.389843 2009 4.943871 5.918977
    3  2011-08-01 4.378759 2011 5.074565 4.943871
    4  2014-04-07 2.785300 2014       NA 5.417942
    5  2008-08-12 6.124931 2008 5.389843 5.593901
    6  2014-03-10 4.955066 2014       NA 5.619826
    7  2014-07-23 4.983810 2014       NA 5.417942
    8  2012-04-14 5.943836 2012 5.417942 4.378759
    9  2012-01-13 5.821221 2012 5.619826 4.378759
    10 2007-06-30 5.593901 2007 5.782136       NA
    11 2008-08-24 5.918977 2008 5.389843 5.593901
    12 2008-05-30 5.782136 2008 6.511781 5.593901
    13 2012-06-30 5.074565 2012 5.417942 4.378759
    14 2010-01-27 3.010648 2010 4.378759 6.511781
    15 2013-02-27 5.619826 2013 4.955066 5.821221
    16 2010-12-25 4.943871 2010 4.378759 5.389843
    17 2012-09-27 4.844204 2012 5.417942 4.378759
    18 2014-12-08 3.529248 2014       NA 5.417942
    19 2010-01-15 4.521850 2010 4.378759 6.511781
    20 2013-03-21 5.417942 2013 4.955066 5.943836

I nested the loops instead of using a copy for each shift, but you must be careful with nv_next and nv_prev, since they are selected by index and not by name.
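For anyone who would rather assign the two columns by name, here is a minimal base R sketch of the same idea. It is not the code from the answer: `nearest_in_year` is a hypothetical helper name, and it compares month and day on a common leap year (2000) instead of rewriting the candidates' year with lubridate, an assumption made so that 29 February never turns into `NA`. The toy dates are the ones from the question's example, plus two extra rows.

```r
# Hypothetical helper (not from the original answer): for each observation,
# return the value at the closest calendar day in the year `shift` years away.
nearest_in_year <- function(dates, values, shift) {
  yr <- as.integer(format(dates, "%Y"))
  # Map every date onto the leap year 2000 so that month-day pairs
  # (including 29 Feb) can be compared across years.
  md <- as.Date(paste0("2000-", format(dates, "%m-%d")))
  vapply(seq_along(dates), function(i) {
    cand <- which(yr == yr[i] + shift)      # rows that fall in the other year
    if (length(cand) == 0) return(NA_real_) # no data in the other year
    values[cand[which.min(abs(as.numeric(md[cand] - md[i])))]]
  }, numeric(1))
}

# Small hand-checkable example (toy data, not the random data above)
dates  <- as.Date(c("2013-05-13", "2013-11-02", "2014-05-15",
                    "2015-05-21", "2015-01-10"))
values <- c(1, 2, 3, 4, 5)
res <- data.frame(date = dates, value = values,
                  nv_next = nearest_in_year(dates, values, 1),
                  nv_prev = nearest_in_year(dates, values, -1))
res
```

On this toy data the 15 May 2014 row should pick up the values from 13 May 2013 and 21 May 2015. Because the columns are built by name, reordering them cannot silently swap nv_next and nv_prev. Ties are resolved in favour of the first candidate, as `which.min` does.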