October 31, 2014

Overview of the Series

  • Oct 14 – Intro to Reproducible Research
  • Oct 21 – RR Basic Skills (1): Data Manipulation
  • Oct 28 – Intro to Latent Class and Latent Transition Models
  • Nov 4 – RR Basic Skills (2): Graph Production
  • Nov 11
  • Nov 18 – RR Basic Skills (3): Statistical Modeling
  • Nov 25 – RR Basic Skills (4): Dynamic Reporting
  • Dec 2 – Migrating into R from other Statistical Software

Previously:

Tidy data

  1. Each variable forms a column
  2. Each observation forms a row
  3. Each type of observational unit forms a table

See Hadley Wickham's paper on tidy data
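The three rules are easiest to see by reshaping a "wide" table into a tidy "long" one. In this minimal base-R sketch, the two-person data frame and its column names (`attend2000`, `attend2001`) are made up for illustration. It ends with one row per person-year, which is the same layout `dsL` uses.

```r
# Hypothetical "wide" layout: one row per person, one column per year
dsWide <- data.frame(id = c(1, 2),
                     attend2000 = c(1, 3),
                     attend2001 = c(6, 2))

# Reshape to a tidy "long" layout: one row per person-year observation
dsLong <- reshape(dsWide,
                  direction = "long",
                  varying   = c("attend2000", "attend2001"),
                  v.names   = "attend",
                  timevar   = "year",
                  times     = c(2000, 2001),
                  idvar     = "id")

# Sort by person, then year, keep the tidy columns, and reset row names
dsLong <- dsLong[order(dsLong$id, dsLong$year), c("id", "year", "attend")]
rownames(dsLong) <- NULL
print(dsLong)
```

Each variable (`id`, `year`, `attend`) is now a column, and each observation (one person in one year) is a row.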

Previously:

"./Scripts/Data/dsL.R"

Download the files from GitHub to work along.

Previously:

"./Scripts/Data/dsL.R"

The script imported the raw data files:

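# Path to the NLSY97 extract (file name without the extension)
myExtract <- "./Data/Extract/NLSY97_Attend_20141021/NLSY97_Attend_20141021"
pathSourceData <- paste0(myExtract, ".csv")
# header=TRUE, skip=0 and sep="," are read.csv() defaults, spelled out for clarity
SourceData <- read.csv(pathSourceData, header=TRUE, skip=0, sep=",")
ds0 <- SourceData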

It then cleaned and transformed the data and, at the end, exported it:

pathdsLrds <- "./Data/Derived/dsL.rds"
saveRDS(object=dsL, file=pathdsLrds, compress="xz")
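Because the export is an `.rds` file, `readRDS()` can restore the object exactly as it was saved. The sketch below shows a minimal round trip; the toy data frame `dsToy` and the temporary file stand in for `dsL` and `./Data/Derived/dsL.rds`.

```r
# A toy data frame standing in for dsL (id 1's first two waves)
dsToy <- data.frame(id = c(1L, 1L), year = c(2000L, 2001L), attend = c(1L, 6L))

# Write it with the same saveRDS() call used above, but to a temporary file
pathTmp <- tempfile(fileext = ".rds")
saveRDS(object = dsToy, file = pathTmp, compress = "xz")

# readRDS() restores the saved object, column types included
dsBack <- readRDS(pathTmp)
print(identical(dsToy, dsBack))
```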

We pick up right where we left off by executing the script ./Scripts/Data/dsL.R.

Load Data

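library(ggplot2)  # ggplot(), aes() and geom_*() are used unqualified below
library(dplyr)    # %>% and select() are used unqualified below
source("./Scripts/Data/dsL.R")
str(dsL)
head(dsL)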

Load Data

dplyr::filter(dsL, id==1)
   id sex race bmonth byear year attend age   sexF       raceF bmonthF         attendF
1   1   2    4      9  1981 2000      1  19 Female Non-B/Non-H     Sep           Never
2   1   2    4      9  1981 2001      6  20 Female Non-B/Non-H     Sep About once/week
3   1   2    4      9  1981 2002      2  21 Female Non-B/Non-H     Sep   Once or Twice
4   1   2    4      9  1981 2003      1  22 Female Non-B/Non-H     Sep           Never
5   1   2    4      9  1981 2004      1  23 Female Non-B/Non-H     Sep           Never
6   1   2    4      9  1981 2005      1  24 Female Non-B/Non-H     Sep           Never
7   1   2    4      9  1981 2006      1  25 Female Non-B/Non-H     Sep           Never
8   1   2    4      9  1981 2007      1  26 Female Non-B/Non-H     Sep           Never
9   1   2    4      9  1981 2008      1  27 Female Non-B/Non-H     Sep           Never
10  1   2    4      9  1981 2009      1  28 Female Non-B/Non-H     Sep           Never
11  1   2    4      9  1981 2010      1  29 Female Non-B/Non-H     Sep           Never
12  1   2    4      9  1981 2011      1  30 Female Non-B/Non-H     Sep           Never

Load Data

dplyr::filter(dsL, id==1) %>% select(id, year, attend, attendF)
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never
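The pipe `%>%` passes its left-hand side as the first argument of the next function. The sketch below uses a small made-up stand-in for `dsL` to check that the piped and nested forms return the same rows.

```r
library(dplyr)  # attaches filter(), select() and the %>% pipe

# Made-up stand-in for dsL with two respondents
dsToy <- data.frame(id = c(1, 1, 2), year = c(2000, 2001, 2000),
                    attend = c(1, 6, 3), race = c(4, 4, 1))

# Piped form: read left to right
viaPipe <- dsToy %>% filter(id == 1) %>% select(id, year, attend)

# Equivalent nested form: read from the inside out
viaNested <- select(filter(dsToy, id == 1), id, year, attend)

print(identical(viaPipe, viaNested))
```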

The anatomy of a ggplot

Preview

plot of chunk graphFinalpreview

Canvas

dsM <- dsL
p <- ggplot2::ggplot(dsM,aes(x=year,y=attend))
p
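Error: No layers in plot
(This error comes from the ggplot2 release used at the time of this talk; current versions draw an empty panel with axes instead.)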
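The canvas stores the data and the aesthetic mappings but no layers, and each geom that is added contributes one layer. The sketch below inspects this directly. It uses the first four rows of the id==1 table as a stand-in for `dsM` and reads the plot object's `layers` element.

```r
library(ggplot2)

# First four waves of respondent 1, copied from the table
dsM <- data.frame(year = 2000:2003, attend = c(1, 6, 2, 1))

# The canvas: data and aesthetic mappings, but no layers yet
pCanvas <- ggplot(dsM, aes(x = year, y = attend))
print(length(pCanvas$layers))   # 0

# Adding a geom appends one layer to the plot object
pPoints <- pCanvas + geom_point()
print(length(pPoints$layers))   # 1
```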

Geom: point

dsM <- dplyr::filter(dsL, id==1)
p <- ggplot2::ggplot(dsM,aes(x=year,y=attend))
p <- p + geom_point()
p

plot of chunk graph01

dplyr::filter(dsL, id==1) %>% select(id,year, attend, attendF)
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never

Geom: line

dsM <- dplyr::filter(dsL, id==1)
p <- ggplot2::ggplot(dsM,aes(x=year,y=attend))
p <- p + geom_line()
p

plot of chunk graph02

dplyr::filter(dsL, id==1) %>% select(id,year, attend, attendF)
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never

Geom: line + point

dsM <- dplyr::filter(dsL, id==1)
p <- ggplot2::ggplot(dsM,aes(x=year,y=attend))
p <- p + geom_line()
p <- p + geom_point()
p

plot of chunk graph03

dplyr::filter(dsL, id==1) %>% select(id,year, attend, attendF)
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never
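Layers are drawn in the order they are added, so in this plot the points sit on top of the line. A quick way to confirm the order is to list the geom class of each stored layer. This sketch runs on a few rows copied from the id==1 table.

```r
library(ggplot2)

dsM <- data.frame(year = 2000:2003, attend = c(1, 6, 2, 1))

p <- ggplot(dsM, aes(x = year, y = attend)) + geom_line() + geom_point()

# One entry per layer, in the order the layers were added (and drawn)
layerGeoms <- unname(sapply(p$layers, function(layer) class(layer$geom)[1]))
print(layerGeoms)
```

If `geom_point()` were added before `geom_line()` instead, the line would be drawn over the points.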

Define theme

baseSize <- 12
# theme_bw() is a complete theme, so it is added once, with the chosen base size;
# each theme() call below then overrides individual elements
plotTheme <- ggplot2::theme_bw(base_size=baseSize) +
  ggplot2::theme(title=ggplot2::element_text(colour="gray20", size=baseSize)) +
  ggplot2::theme(axis.text=ggplot2::element_text(colour="gray40")) +
  ggplot2::theme(axis.title=ggplot2::element_text(colour="gray40")) +
  ggplot2::theme(panel.border=ggplot2::element_rect(colour="gray80")) +
  # zero-length ticks hide the tick marks but keep the tick labels
  ggplot2::theme(axis.ticks.length=grid::unit(0, "cm"))
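If the same look is needed at several sizes, for example for slides and for print, the theme can be wrapped in a function. The sketch below uses a hypothetical `makePlotTheme()` helper that applies the element settings above and ties the title size to its `baseSize` argument. The helper is an illustration and not part of the original scripts.

```r
library(ggplot2)

# Hypothetical helper: the theme settings above, parameterized by base font size
makePlotTheme <- function(baseSize = 12) {
  theme_bw(base_size = baseSize) +
    theme(title = element_text(colour = "gray20", size = baseSize)) +
    theme(axis.text = element_text(colour = "gray40")) +
    theme(axis.title = element_text(colour = "gray40")) +
    theme(panel.border = element_rect(colour = "gray80")) +
    theme(axis.ticks.length = grid::unit(0, "cm"))
}

# A larger variant, e.g. for projection in a big room
bigTheme <- makePlotTheme(baseSize = 18)
```

Usage: `p + makePlotTheme(18)`, where `p` is any of the plots in this section.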

Add theme

dsM <- dplyr::filter(dsL, id==1)
p <- ggplot2::ggplot(dsM,aes(x=year,y=attend))
p <- p + geom_line()
p <- p + geom_point()
p <- p + plotTheme
p

plot of chunk graph04

dplyr::filter(dsL, id==1) %>% select(id,year, attend, attendF)
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never

Scales: X-Axis

dsM <- dplyr::filter(dsL, id==1)
p <- ggplot2::ggplot(dsM,aes(x=year,y=attend))
p <- p + geom_line()
p <- p + geom_point()
p <- p + scale_x_continuous("Year", limits=c(2000,2011), breaks=2000:2011)
p <- p + plotTheme
p

Scales: X-Axis

plot of chunk graph05

   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never

Scales: Y-Axis

plot of chunk graph06

   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never
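A y-axis scale can be set the same way as the x-axis scale above. This sketch uses the id==1 rows from the table. The axis title and the assumed 1-to-8 range of attendance codes are illustrative and are not taken from the original slide.

```r
library(ggplot2)

# Respondent 1, copied from the table
dsM <- data.frame(year = 2000:2011,
                  attend = c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1))

p <- ggplot(dsM, aes(x = year, y = attend)) +
  geom_line() +
  geom_point() +
  scale_x_continuous("Year", limits = c(2000, 2011), breaks = 2000:2011) +
  # Assumption: attendance codes run from 1 to 8
  scale_y_continuous("Attendance (code)", limits = c(1, 8), breaks = 1:8)

yScale <- p$scales$get_scales("y")
print(yScale$limits)
```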

Add title

plot of chunk graph07

   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never
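A title can be added with `labs(title = ...)` or, equivalently, `ggtitle()`. In this sketch the title text is hypothetical and the data are the id==1 rows from the table.

```r
library(ggplot2)

dsM <- data.frame(year = 2000:2011,
                  attend = c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1))

p <- ggplot(dsM, aes(x = year, y = attend)) +
  geom_line() +
  geom_point() +
  labs(title = "Attendance of respondent 1, 2000-2011")  # hypothetical title text

print(p$labels$title)
```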

Multiple Lines

plot of chunk graph08

   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never
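When several respondents are plotted without a grouping variable, ggplot2 treats every row as part of a single series. `geom_line()` then connects all points in order of year, which produces vertical jumps between respondents at each year instead of one line per person. In the sketch below, the three-respondent data frame is made up. The check reads the built layer data from `ggplot_build()`.

```r
library(ggplot2)

# Made-up long data for three respondents, four waves each
dsM <- data.frame(id     = rep(1:3, each = 4),
                  year   = rep(2000:2003, times = 3),
                  attend = c(1, 6, 2, 1,  3, 3, 4, 2,  8, 7, 7, 5))

# No grouping: all twelve rows form one series
p <- ggplot(dsM, aes(x = year, y = attend)) + geom_line()
built <- ggplot_build(p)$data[[1]]
print(unique(built$group))   # a single group
```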

Multiple Lines with grouping

plot of chunk graph09

   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never
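Mapping `group = id` tells ggplot2 to draw a separate line for each respondent. Mapping a discrete variable such as `colour = factor(id)` would also split the data into groups. This sketch uses the same made-up three-respondent data.

```r
library(ggplot2)

# Made-up long data for three respondents, four waves each
dsM <- data.frame(id     = rep(1:3, each = 4),
                  year   = rep(2000:2003, times = 3),
                  attend = c(1, 6, 2, 1,  3, 3, 4, 2,  8, 7, 7, 5))

# One line per respondent
p <- ggplot(dsM, aes(x = year, y = attend, group = id)) + geom_line()
built <- ggplot_build(p)$data[[1]]
print(length(unique(built$group)))   # 3
```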

Edit geoms

plot of chunk graph10

   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never
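Geoms can be edited by passing fixed settings outside `aes()`. Such settings apply to every line or point and are not mapped to a variable. In the sketch below, the particular colour, transparency, shape and size values are arbitrary choices and the data are made up.

```r
library(ggplot2)

dsM <- data.frame(id     = rep(1:3, each = 4),
                  year   = rep(2000:2003, times = 3),
                  attend = c(1, 6, 2, 1,  3, 3, 4, 2,  8, 7, 7, 5))

p <- ggplot(dsM, aes(x = year, y = attend, group = id)) +
  # Fixed settings: semi-transparent gray lines
  geom_line(colour = "gray50", alpha = 0.5) +
  # Fixed settings: larger hollow circles with a white fill
  geom_point(shape = 21, size = 3, fill = "white")

print(p$layers[[1]]$aes_params$colour)
```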

Facet

plot of chunk graph11

   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never
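Faceting splits the plot into one panel per level of a variable. This sketch uses `facet_wrap(~ sexF)`, since `sexF` is one of the columns of `dsL`. The four-respondent data, including the "Male" rows, are made up, and the check counts panels in the layout produced by `ggplot_build()`.

```r
library(ggplot2)

# Made-up data: four respondents, three waves each, two levels of sexF
dsM <- data.frame(id     = rep(1:4, each = 3),
                  year   = rep(2000:2002, times = 4),
                  attend = c(1, 6, 2,  3, 3, 4,  8, 7, 7,  1, 1, 2),
                  sexF   = factor(rep(c("Female", "Male", "Female", "Male"), each = 3)))

p <- ggplot(dsM, aes(x = year, y = attend, group = id)) +
  geom_line() +
  facet_wrap(~ sexF)   # one panel per level of sexF

layoutDf <- ggplot_build(p)$layout$layout
print(nrow(layoutDf))   # 2 panels
```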