R - Impute variables within a data.frame, grouped by a factor column
I have a data.frame that contains numeric columns. The rows belong to levels of a factor column, and I want to impute the missing values by group. Let me explain.
    part id value
    a     1  23.4
    a     2  23.8
    a     3  45.6
    a     4  34.7
    a     5    NA
    b     1  45.2
    b     2  34.6
    b     3    NA
    b     4  30.9
    b     5  28.1
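I'd like to impute the NA values with the mean of their part. For part a, I'd impute the missing value for id 5 with the mean of ids 1-4 in part a. The same goes for part b: impute the missing value for id 3 with the mean of the other ids in part b, and so on.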
I need to do this across many columns (imagine having many more value columns), perhaps with an apply function or similar.
Using the na.strings argument in read.table/read.csv, we can convert the missing values to real NA, thereby reading the 'value' columns as 'numeric'. With dplyr, we can then replace the NAs in multiple value columns with the mean of each column, computed within each part.
    library(dplyr)
    # Within each part, replace NAs in every 'value*' column with that column's group mean
    df1 %>%
        group_by(part) %>%
        mutate_each(funs(replace(., which(is.na(.)), mean(., na.rm=TRUE))), starts_with('value'))
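In dplyr 1.0.0 and later, mutate_each() and funs() are deprecated in favour of across(). The following is a minimal sketch of the same grouped imputation in that style, not a definitive implementation. It assumes the columns to impute all start with 'value', and it rebuilds the example data inline under the made-up name df_demo so that it runs on its own.

```r
library(dplyr)

# df_demo is a hypothetical stand-in for df1, built from the example in the question
df_demo <- data.frame(
  part  = rep(c("a", "b"), each = 5),
  id    = rep(1:5, times = 2),
  value = c(23.4, 23.8, 45.6, 34.7, NA, 45.2, 34.6, NA, 30.9, 28.1)
)

imputed <- df_demo %>%
  group_by(part) %>%
  # For every column starting with 'value', fill NAs with the mean of the current group
  mutate(across(starts_with("value"),
                ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE)))) %>%
  ungroup()

print(imputed)
```

The grouping column is excluded from across() automatically, so only the value columns are touched. If an entire group is NA in some column, mean(..., na.rm = TRUE) returns NaN, so that case would need separate handling.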
Or a similar option using data.table:
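    library(data.table)
    # Positions of the columns whose names contain 'value'
    nm1 <- grep('value', names(df1))
    # Convert to data.table by reference, then update those columns in place, grouped by part
    setDT(df1)[, (nm1) := lapply(.SD, function(x) replace(x, which(is.na(x)), mean(x, na.rm=TRUE))), by = part, .SDcols=nm1]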
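If neither package is available, the same idea can be sketched in base R with ave(), which applies a function within the groups of a factor and returns a vector of the original length. This is only an illustration under stated assumptions: df_demo, impute_mean, and value_cols are made-up names, and the value2 column contains invented numbers added purely to show the multi-column case.

```r
# df_demo is a hypothetical stand-in for df1; value2 is invented for illustration
df_demo <- data.frame(
  part   = rep(c("a", "b"), each = 5),
  id     = rep(1:5, times = 2),
  value  = c(23.4, 23.8, 45.6, 34.7, NA, 45.2, 34.6, NA, 30.9, 28.1),
  value2 = c(1, NA, 3, 4, 5, 10, 20, 30, NA, 50)
)

# Replace the NAs in a vector with the mean of its non-missing entries
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Names of all columns to impute (assumed to start with 'value')
value_cols <- grep("^value", names(df_demo), value = TRUE)

# Apply the imputation to each value column, separately within each part
df_demo[value_cols] <- lapply(
  df_demo[value_cols],
  function(col) ave(col, df_demo$part, FUN = impute_mean)
)

print(df_demo)
```

Because ave() keeps the row order, the result can be assigned straight back into the data.frame without any merge step.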
Data

    df1 <- read.table(text="part id value
    a 1 23.4
    a 2 23.8
    a 3 45.6
    a 4 34.7
    a 5 NA
    b 1 45.2
    b 2 34.6
    b 3 NA
    b 4 30.9
    b 5 28.1", header=TRUE, na.strings="NA", stringsAsFactors=FALSE)