R - Impute variables within a data.frame group by factor column


I have a data.frame that contains numeric columns along with a factor column, and I want to impute the missing values by the levels of that factor. Let me explain.

    part   id   value
    a      1    23.4
    a      2    23.8
    a      3    45.6
    a      4    34.7
    a      5    NA
    b      1    45.2
    b      2    34.6
    b      3    NA
    b      4    30.9
    b      5    28.1

I'd like to impute the NA values with the mean of each part. For part a, I'd impute the missing value at id 5 with the mean of ids 1-4 in part a. The same goes for part b: impute the missing value at id 3 with the mean of the other ids in part b, and so on.

I need to do this across many columns (imagine having many more value columns), perhaps with an apply function or something similar.

Using the na.strings argument in read.table/read.csv, we can convert the missing values to real NA, so that the 'value' columns are read as 'numeric'. With dplyr, we can then replace the NAs in multiple value columns with the mean of each column within its part.

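    library(dplyr)

    # Within each part, replace the NAs in every 'value' column with that group's mean.
    # Note: mutate_each() and funs() are deprecated in current dplyr releases.
    df1 %>%
        group_by(part) %>%
        mutate_each(funs(replace(., which(is.na(.)), mean(., na.rm=TRUE))),
            starts_with('value'))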
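In dplyr 1.0.0 and later, the same idea can be sketched with across() in place of the older helpers. This is only an illustration, not the original answer's code: the data frame is copied from the example in the question so that the block runs on its own, and the result name df1_imputed is made up for this sketch.

```r
library(dplyr)

# Example data copied from the question
df1 <- data.frame(
  part  = rep(c("a", "b"), each = 5),
  id    = rep(1:5, times = 2),
  value = c(23.4, 23.8, 45.6, 34.7, NA, 45.2, 34.6, NA, 30.9, 28.1),
  stringsAsFactors = FALSE
)

# For each part, fill the NAs in every column whose name starts with 'value'
# using the mean of the non-missing entries in that part
df1_imputed <- df1 %>%
  group_by(part) %>%
  mutate(across(starts_with("value"),
                ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE)))) %>%
  ungroup()
```

With this data, the missing value in part a becomes the mean of 23.4, 23.8, 45.6 and 34.7, and the one in part b becomes the mean of 45.2, 34.6, 30.9 and 28.1.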

Or a similar option with data.table:

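    library(data.table)

    # Positions of the columns whose names contain 'value'
    nm1 <- grep('value', names(df1))

    # Convert df1 to a data.table by reference, then update those columns in place, by part
    setDT(df1)[, (nm1) := lapply(.SD, function(x) replace(x,
         which(is.na(x)), mean(x, na.rm=TRUE))), by = part, .SDcols=nm1]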
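If neither package is available, the same per-group replacement can probably be done in base R with ave(). The following is a minimal sketch, not part of the original answer. The helper impute_mean and the second column value2 are hypothetical, added only to show that the approach extends to several value columns.

```r
# Example data from the question, plus a hypothetical second column 'value2'
df1 <- data.frame(
  part   = rep(c("a", "b"), each = 5),
  id     = rep(1:5, times = 2),
  value  = c(23.4, 23.8, 45.6, 34.7, NA, 45.2, 34.6, NA, 30.9, 28.1),
  value2 = c(1, NA, 3, 4, 5, NA, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace the NAs in a vector with the mean of its non-missing entries
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Names of all columns that start with 'value'
value_cols <- grep("^value", names(df1), value = TRUE)

# ave() applies the helper separately within each level of 'part'
df1[value_cols] <- lapply(df1[value_cols], function(col) {
  ave(col, df1$part, FUN = impute_mean)
})
```

One caveat applies to all of these variants: if every value in a group is missing, mean(x, na.rm = TRUE) returns NaN, so such a group would be filled with NaN rather than a number.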

data

    # Read the example data; the "NA" strings become real missing values
    df1 <- read.table(text="part   id   value
    a      1     23.4
    a      2     23.8
    a      3     45.6
    a      4     34.7
    a      5     NA
    b      1     45.2
    b      2     34.6
    b      3     NA
    b      4     30.9
    b      5     28.1", header=TRUE, na.strings="NA", stringsAsFactors=FALSE)
