Scatterplot of Year-On-Year Correlation of Data in R using ggplot2
I have yearly football data and want to test whether team metrics are repeatable in the next year. The data is in a data.frame and looks like this:
          y2003    y2004    y2005
team 1 51.95455 51.00000 53.59091
team 2 54.18182 56.31818 49.09091
team 3 48.68182 46.86364 49.22727
team 4 50.86364 47.68182 48.72727
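What I want to be able to do is draw a scatterplot with "year n" on the x-axis and "year n+1" on the y-axis, for example 2003 vs. 2004, 2004 vs. 2005, 2005 vs. 2006, etc., all on the same plot.
I would then like to be able to draw a line of best fit to see how strong the correlation is, i.e. whether the metric is repeatable or not.
What is the best way to do this in R with ggplot2? I can get the initial plot with: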
p <- ggplot(df, aes(y2003, y2004))
p + geom_point()
But then do I have to add the other year pairs manually? Is there a built-in function for this sort of thing? And if I add them one by one, how do I get the line of best fit?
You want a data frame with one row for each team-year combination, containing the data for that year and the next year along with the team name. You can build this without any split-apply-combine manipulation, using base R functions:
(to.plot <- data.frame(yearn=unlist(df[-ncol(df)]),
                       yearnp1=unlist(df[-1]),
                       team=rep(row.names(df), ncol(df)-1)))
#           yearn  yearnp1  team
# y20031 51.95455 51.00000 team1
# y20032 54.18182 56.31818 team2
# y20033 48.68182 46.86364 team3
# y20034 50.86364 47.68182 team4
# y20041 51.00000 53.59091 team1
# y20042 56.31818 49.09091 team2
# y20043 46.86364 49.22727 team3
# y20044 47.68182 48.72727 team4
Basically, the code converts all but the last column of df into a single vector (using unlist) and stores it in the variable yearn. The next year's values are obtained the same way, by converting all but the first column of df into a vector. Finally, the team names are obtained as a repeated sequence of the row names of df.
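To make the reshaping concrete, here is a minimal sketch that rebuilds the example data by hand and applies the same base R steps. The row names team1 to team4 are an assumption taken from the printed to.plot output, and the final check is only an illustration of how the "year n" and "year n+1" columns line up.

```r
# Rebuild the example data from the question.
# Assumption: row names are team1..team4, as in the printed to.plot output.
df <- data.frame(
  y2003 = c(51.95455, 54.18182, 48.68182, 50.86364),
  y2004 = c(51.00000, 56.31818, 46.86364, 47.68182),
  y2005 = c(53.59091, 49.09091, 49.22727, 48.72727),
  row.names = paste0("team", 1:4)
)

# Drop the last column to get "year n" and the first column to get "year n+1",
# then flatten each remaining block of columns into one long vector.
to.plot <- data.frame(
  yearn   = unlist(df[-ncol(df)]),
  yearnp1 = unlist(df[-1]),
  team    = rep(row.names(df), ncol(df) - 1)
)

# 4 teams times 2 consecutive-year pairs gives 8 rows.
nrow(to.plot)

# The first four rows pair 2003 (x) with 2004 (y) for each team.
all(to.plot$yearnp1[1:4] == df$y2004)
```

Because the column dropping is done by position, the same code should work unchanged if more yearly columns are added, as long as the columns are in chronological order.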
Getting the line of best fit is a simple linear regression model:
(coefs <- coef(lm(yearnp1~yearn, data=to.plot)))
# (Intercept)       yearn
#  28.3611927   0.4308978
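Since the question is about how strong the year-on-year relationship is, the fitted slope can be complemented with the Pearson correlation. The following is a minimal sketch, not part of the original answer, that assumes the same four-team example data (with row names team1 to team4). With only eight points, any estimate here is very noisy.

```r
# Example data from the question (row names team1..team4 are an assumption).
df <- data.frame(
  y2003 = c(51.95455, 54.18182, 48.68182, 50.86364),
  y2004 = c(51.00000, 56.31818, 46.86364, 47.68182),
  y2005 = c(53.59091, 49.09091, 49.22727, 48.72727),
  row.names = paste0("team", 1:4)
)
to.plot <- data.frame(
  yearn   = unlist(df[-ncol(df)]),
  yearnp1 = unlist(df[-1]),
  team    = rep(row.names(df), ncol(df) - 1)
)

# Fit the regression of next year's value on this year's value.
fit <- lm(yearnp1 ~ yearn, data = to.plot)

# Pearson correlation between consecutive years.
r <- cor(to.plot$yearn, to.plot$yearnp1)

# For a simple linear regression, R-squared equals the squared correlation.
r2 <- summary(fit)$r.squared
isTRUE(all.equal(r^2, r2))

# cor.test() additionally reports a confidence interval and p-value.
cor.test(to.plot$yearn, to.plot$yearnp1)
```

A correlation close to 1 would suggest the metric carries over strongly from one year to the next, while a value near 0 would suggest it is mostly noise.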
Now ggplot can be used for the usual plotting:
library(ggplot2)
ggplot(to.plot, aes(x=yearn, y=yearnp1, col=team)) +
  geom_point() +
  geom_abline(intercept=coefs[1], slope=coefs[2])
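As an alternative to computing the coefficients by hand, ggplot2 can fit and draw the line itself with geom_smooth(method = "lm"). This is a sketch rather than part of the original answer. It assumes the same example data, and it maps colour only inside geom_point() so that a single line is fitted across all teams instead of one line per team.

```r
library(ggplot2)

# Example data from the question (row names team1..team4 are an assumption).
df <- data.frame(
  y2003 = c(51.95455, 54.18182, 48.68182, 50.86364),
  y2004 = c(51.00000, 56.31818, 46.86364, 47.68182),
  y2005 = c(53.59091, 49.09091, 49.22727, 48.72727),
  row.names = paste0("team", 1:4)
)
to.plot <- data.frame(
  yearn   = unlist(df[-ncol(df)]),
  yearnp1 = unlist(df[-1]),
  team    = rep(row.names(df), ncol(df) - 1)
)

# Colour is set per point only; the smoother sees ungrouped data,
# so it fits one least-squares line through all team-year pairs.
p <- ggplot(to.plot, aes(x = yearn, y = yearnp1)) +
  geom_point(aes(colour = team)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black")

print(p)
```

Setting se = TRUE instead would add a confidence band around the line, which can be a useful visual reminder of how uncertain the fit is with few points.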