R - Reading specific data from a large dataset based on criteria, to avoid reading the entire file into memory
Software: RStudio
Version: 0.98.1102
Operating system: Windows 7 Professional
Issue #1: I have a .txt file that is over 100 MB. It has 4 variables, with around 500,000 observations for each variable.
Issue #2: Assuming column1 is a column of dates stored as factors, is it possible to change the class of column1 to Date using the colClasses argument of read.csv()?
If I read the file via:
mydata <- read.csv("myfile", sep = ";", na.strings = "?", stringsasfactors = false)
Issue #1
The file loads indefinitely on my computer due to the size of the file.
The file has the following format:
column1 column2 column3
dog bird apple
cat dove orange
rat sparrow kiwi
may bird apple
cat dove orange
rat sparrow kiwi
I'm trying to figure out how to do the following:
1. Read only the rows of the data set where column1 has "dog".
2. Read only the rows of the data set where column1 has "dog" and column2 has "bird".
Things I have been trying so far: I know I could load the entire data set and then subset it, but I want to avoid that, because the file is too large to load in the first place. Instead, I want to load only the specific data that meets the criteria.
Issue #2
Assuming column1 is in the form of 05/01/2015 and has the class "factor", is it possible to change the class of column1 to "Date" using the colClasses argument of read.csv()? Perhaps like this?
mydata <- read.csv("myfile", sep = ";", na.strings = "?", stringsasfactors = false, colclasses = c(column1 =as.date(column1))
Or perhaps this:
mydata <- read.csv("myfile", sep = ";", na.strings = "?", stringsasfactors = false, colclasses = c(column1 =strptime(column1 %mm%dd%yy))
You can read the data in chunks, 1000 lines at a time, and subset them.
temp <- read.csv('file.csv', nrows = 1000, stringsAsFactors = FALSE)
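For example, a rough sketch of that chunked loop over a connection (the file name, separator, column names, and the "dog" filter are taken from the question and may need adjusting):

con <- file("file.csv", open = "r")
invisible(readLines(con, n = 1))        # skip the header line
cols <- c("column1", "column2", "column3")
keep <- list()
repeat {
  chunk <- tryCatch(
    read.csv(con, nrows = 1000, header = FALSE, col.names = cols,
             stringsAsFactors = FALSE),
    error = function(e) NULL            # read.csv errors once the file is exhausted
  )
  if (is.null(chunk)) break
  keep[[length(keep) + 1]] <- chunk[chunk$column1 == "dog", ]  # keep only matching rows
  if (nrow(chunk) < 1000) break         # last, partial chunk reached
}
close(con)
dogs <- do.call(rbind, keep)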
But using a loop is not a good idea in R, so I'd prefer using sqldf:
library(sqldf)
power <- read.csv.sql("file.csv", sql = "select * from file where <condition>", header = TRUE)
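Applied to the question's criteria, the condition could look something like this (assuming a comma-separated file.csv with the column names from the example above):

library(sqldf)
# keep only the rows where column1 is "dog" and column2 is "bird"
dogs <- read.csv.sql("file.csv",
                     sql = "select * from file where column1 = 'dog' and column2 = 'bird'",
                     header = TRUE)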
See more options on how to do this in the question "How can I read only lines that fulfil a condition from a csv into R?".