How to split a column in a dataframe into multiple columns in R? -
i've dataframe 1 varaible supposedly containing data of multiple columns, such id, year, months etc in 1 single column.
this weather data internet. it's badly organized. has no column name, , data pushed 1 column.
each row says characters 1-11 represent id, , next 4 characters represent year, , next few characters represent climatic measure jan month, , flags (blanks or chatacters), , february month , again few flags (blanks or chatacters), , march , flags (blanks or chatacters).. , on. detailed structure of data explained here
ush00011084 1974 1628 1606 1363 1039 1343 903 2536 839 2048 358 1118 754 ush00011084 1975 1714 1837 1544f 2828 1758 1898 4848 2110 2217 1197 1512 1445 ush00011084 1976 825 541 989 600 2502 1448 971 1157 704 899 1340 856a ush00011084 1977 1319 528 2665 473 285 1590 2337 3733 961 434 1259 981 ush00011084 1978 2722 1023 1574 1214 2919 2136 1548 988 875 46 917 1379 ush00011084 1979 1927 2671 1285 1063 966 1160 2282 1120 979 292 1470 812 ush00011084 1980 1639 368 3799 2005 1423 1826. 1 917 423 1449 1353 1039 287 ush00011084 1981 38b 2846 1170 127 1334 995 2022 1343 467 413 513 1909 ush00011084 1982 1631 3097 910 1127 879 1416 2103 1482 1060 551 863 1702 ush00011084 1983 1207 2210 2604 1925 820 1714 662 1235 1204 394 1145 2219
now, i'm trying organize data dataframe separating data respective columns. tried multiple things failed badly.
require(reshape2)
colsplit(only_req_col, " ", c("id", "year","jan", "j1", "feb", "f1", "mar", "m1", "apr", "a1", "may", "my1","jun","jn1", "jul", "jl1", "aug","a1","sep", "s1","oct", "o1","nov", "n1", "dec","d1" ))
didn't work!
require(tidyr)
separate(data = data_2, col = v1, = c("id", "year","jan", "j1", "feb", "f1", "mar", "m1", "apr", "a1", "may", "my1","jun","jn1", "jul", "jl1", "aug","a1","sep", "s1","oct", "o1","nov", "n1", "dec","d1" ), sep = "")
didn't work!
require(reshape2)
data_3 <- colsplit(gsub(pattern = "[0-9]"," ",data_2), names= c("id", "year","jan", "j1", "feb", "f1", "mar", "m1", "apr", "a1", "may", "my1","jun","jn1", "jul", "jl1", "aug","a1","sep", "s1","oct", "o1","nov", "n1", "dec","d1" ))
didn't work!
i keep monthly data each year , id, , ignore flags.
i achieved using substr
each month's position, involved lot of manual counting correct positions of months:
id <- substr(data_3[,1], 1,11) year= substr(data_3[,1], 13,16) jan = substr(data_3[,1], 18,22) feb = substr(data_3[,1], 27,31) mar = substr(data_3[,1], 36,40) apr = substr(data_3[,1], 45,49) may = substr(data_3[,1], 53,57) jun = substr(data_3[,1], 63,67) jul = substr(data_3[,1], 72,76) aug = substr(data_3[,1], 81,85) sep = substr(data_3[,1], 90,95) oct = substr(data_3[,1], 99,103) nov = substr(data_3[,1], 108,112) dec = substr(data_3[,1], 117,121) prcp_data <- data.frame(id, year, jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec)
now output looks fine:
> prcp_data id year jan feb mar apr may jun jul aug sep oct nov dec 1 ush00011084 1890 -9999 -9999 -9999 -9999 -999 -9999 -9999 -9999 -9999 -9999 -9999 432 2 ush00011084 1891 1397 1425 1461 419 69 1702 1080 437 508 0 2362 1333 3 ush00011084 1892 3162 1118 650 406 96 1981 3114 3442 762 254 419 -9999 4 ush00011084 1893 1359 1544 1181 965 185 876 1080 1638 876 1613 1237 1237 5 ush00011084 1894 610 4188 2002 572 118 673 -9999 -9999 -9999 76 -9999 191 6 ush00011084 1895 -9999 381 1016 1016 101 1016 762 -9999 -9999 508 -9999 762 7 ush00011084 1896 1118 3404 1499 -9999 81 2794 2375 470 356 1270 864 572 8 ush00011084 1897 622 3124 1207 1105 12 64 1867 2489 0 0 -9999 -9999 9 ush00011084 1900 -9999 -9999 1857 1788 57 3292 1989 993 1552 1646 488 1542 10 ush00011084 1926 -9999 1404 2619 905 127 1723 2149 2950 3477 884 823 900
any better solutions achieve way did?
Comments
Post a Comment