November 18, 2014

Overview of the Series

  • Oct 14 – Intro to Reproducible Research
  • Oct 21 – RR Basic Skills (1): Data Manipulation
  • Oct 28 – Intro to Latent Class and Latent Transition Models
  • Nov 4 – RR Basic Skills (2): Graph Production
  • Nov 18 – RR Basic Skills (3): Statistical Modeling
  • Nov 25 – RR Basic Skills (4): Dynamic Reporting
  • Dec 2 – Migrating into R from other Statistical Software

Previously:

"./Scripts/Data/dsL.R"

Download the files from GitHub to work along.

Load Data

# loads basic NLSY97-religiosity data as defined in COAG-Colloquium-2014F repository
dsL <- readRDS("./Data/Derived/dsL.rds")
str(dsL)
'data.frame':   107772 obs. of  12 variables:
 $ id     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ sex    : int  2 2 2 2 2 2 2 2 2 2 ...
 $ race   : int  4 4 4 4 4 4 4 4 4 4 ...
 $ bmonth : int  9 9 9 9 9 9 9 9 9 9 ...
 $ byear  : int  1981 1981 1981 1981 1981 1981 1981 1981 1981 1981 ...
 $ year   : int  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
 $ attend : int  1 6 2 1 1 1 1 1 1 1 ...
 $ age    : int  19 20 21 22 23 24 25 26 27 28 ...
 $ sexF   : Ord.factor w/ 3 levels "Male"<"Female"<..: 2 2 2 2 2 2 2 2 2 2 ...
 $ raceF  : Ord.factor w/ 4 levels "Black"<"Hispanic"<..: 4 4 4 4 4 4 4 4 4 4 ...
 $ bmonthF: Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 9 9 9 9 9 9 9 9 9 9 ...
 $ attendF: Ord.factor w/ 8 levels "Never"<"Once or Twice"<..: 1 6 2 1 1 1 1 1 1 1 ...

Load graphical theme

baseSize <- 12
plotTheme <- ggplot2::theme_bw(base_size=baseSize)+
  ggplot2::theme(title=ggplot2::element_text(colour="gray20",size = baseSize + 3)) +
  ggplot2::theme(axis.text=ggplot2::element_text(colour="gray40", size= baseSize - 2))+
  ggplot2::theme(axis.title.x=ggplot2::element_text(colour="gray40", size = baseSize + 2, vjust=-.3))+
  ggplot2::theme(axis.title.y=ggplot2::element_text(colour="gray40", size = baseSize + 2, vjust=1.3))+
  ggplot2::theme(panel.border = ggplot2::element_rect(colour="gray80"))+
  ggplot2::theme(axis.ticks.length = grid::unit(0, "cm"))

Mapping data to graphics

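library(magrittr)  # attaches %>%; the dplyr:: calls are namespace-qualified, so the pipe is loaded separately
dsM <- dplyr::filter(dsL, id == 1) %>% 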
  dplyr::select(id, year, attend, attendF)
dsM
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never

Figure: graphBasic

Next: final graph from last time

Variables and Aesthetics

dsM <- dplyr::filter(dsL, id <=300, 
                     raceF != "Mixed (Non-H)") %>% 
  dplyr::select(id,sexF,raceF,year,attend,attendF) %>%
  dplyr::mutate(yearc = year - 2000)
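#  
library(ggplot2)  # aes(), geom_*(), scale_*(), labs() and facet_grid() are called unqualified below
p <- ggplot2::ggplot(dsM,aes(x=yearc,y=attend))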
p <- p + geom_line(aes(group=id), color='firebrick',
           alpha=.2,
           position=position_jitter(w=0.3, h=0.3))
p <- p + geom_point(shape=21, color=NA, fill="blue4",
           alpha=.4, size=1, 
           position=position_jitter(w=0.3, h=0.3))
p <- p + plotTheme
p <- p + scale_x_continuous(limits=c(0,11),
                            breaks=c(0:11))
p <- p + scale_y_continuous(limits=c(0,8), 
                            breaks=seq(1,8, by=1))
p <- p + labs(
 title="How often did you attend worship last year?",
 x="Years since 2000", y="Church attendance")
p <- p + facet_grid(sexF~raceF)
p
Figure: graphFullPrevious
Next: What is modeling

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
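
Why subtract 2000? A minimal sketch, assuming the only goal is a meaningful zero point: centering `time` at the first wave leaves any slope unchanged and makes the intercept refer to the year 2000 rather than to year 0. The straight-line fit via `lm()` is used purely as an illustration here (the deck introduces it later), and the names `fit_year` and `fit_time` are hypothetical.

```r
# Attendance for respondent id == 1, copied from the table above
d <- data.frame(year   = 2000:2011,
                attend = c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1))
d$time <- d$year - 2000           # time 0 corresponds to the year 2000

fit_year <- lm(attend ~ year, data = d)   # intercept = predicted value at year 0
fit_time <- lm(attend ~ time, data = d)   # intercept = predicted value at year 2000

# Same slope either way; only the meaning of the intercept changes
coef(fit_year)
coef(fit_time)
```

The intercept of `fit_time` equals the intercept of `fit_year` plus 2000 times the common slope.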


Figure: graph01

\({y_{it}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1


Figure: graph01

\({y_{1 t}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 2) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   2    0      2
2   2    1      2
3   2    2      1
4   2    3      1
5   2    4      2
6   2    5      2
7   2    6     NA
8   2    7     NA
9   2    8      3
10  2    9      1
11  2   10      2
12  2   11      2


Figure: graph02

\({y_{2 t}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 3) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   3    0      3
2   3    1      2
3   3    2      2
4   3    3      2
5   3    4      1
6   3    5     NA
7   3    6     NA
8   3    7     NA
9   3    8     NA
10  3    9      6
11  3   10     NA
12  3   11      2


Figure: graph03

\({y_{3 t}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   4    0      2
2   4    1      1
3   4    2      3
4   4    3      1
5   4    4      2
6   4    5      2
7   4    6      2
8   4    7      2
9   4    8      2
10  4    9      1
11  4   10      2
12  4   11      5


Figure: graph04

\({y_{4 t}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   4    0      2
2   4    1      1
3   4    2      3
4   4    3      1
5   4    4      2
6   4    5      2
7   4    6      2
8   4    7      2
9   4    8      2
10  4    9      1
11  4   10      2
12  4   11      5


Figure: graph05

\({y_{4 \cdot 0}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   4    0      2
2   4    1      1
3   4    2      3
4   4    3      1
5   4    4      2
6   4    5      2
7   4    6      2
8   4    7      2
9   4    8      2
10  4    9      1
11  4   10      2
12  4   11      5


Figure: graph06

\({y_{4 \cdot 1}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   4    0      2
2   4    1      1
3   4    2      3
4   4    3      1
5   4    4      2
6   4    5      2
7   4    6      2
8   4    7      2
9   4    8      2
10  4    9      1
11  4   10      2
12  4   11      5


Figure: graph07

\({y_{4 \cdot 2}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   4    0      2
2   4    1      1
3   4    2      3
4   4    3      1
5   4    4      2
6   4    5      2
7   4    6      2
8   4    7      2
9   4    8      2
10  4    9      1
11  4   10      2
12  4   11      5


Figure: graph08

\({y_{4 \cdot 3}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 3) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   3    0      3
2   3    1      2
3   3    2      2
4   3    3      2
5   3    4      1
6   3    5     NA
7   3    6     NA
8   3    7     NA
9   3    8     NA
10  3    9      6
11  3   10     NA
12  3   11      2


Figure: graph09

\({y_{3 \cdot 3}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 2) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   2    0      2
2   2    1      2
3   2    2      1
4   2    3      1
5   2    4      2
6   2    5      2
7   2    6     NA
8   2    7     NA
9   2    8      3
10  2    9      1
11  2   10      2
12  2   11      2


Figure: graph10

\({y_{2 \cdot 2}}\)

Mapping data to graphics

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1


Figure: graph11

\({y_{1 \cdot 1}}\)

Modeling

What is a model?

  • simplifications of a complex reality (Rodgers, 2010)
  • "mechanism" for reproducing data

What does modeling involve?

  • generating data points
  • comparing observed and modeled data
  • describing properties and attributes of a model
  • comparing and contrasting models
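
The four activities above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the observed values are those of respondent `id == 1` shown earlier, the three candidate models are hand-picked lines, and the residual sum of squares is one possible yardstick for "comparing observed and modeled data". The names `models` and `rss` are hypothetical.

```r
# Observed attendance for respondent id == 1 (values shown earlier in the deck)
time   <- 0:11
attend <- c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1)

# Three hand-specified candidate models (illustrative choices)
models <- list(
  constant = function(t) rep(4, length(t)),
  steep    = function(t) 4 - t,
  shallow  = function(t) 4 - 0.2 * t
)

# Generate modeled points, then compare them with the observed ones
# through the residual sum of squares (smaller = closer to the data)
rss <- sapply(models, function(f) sum((attend - f(time))^2))

# Contrast the models by ordering them from best to worst fit
sort(rss)
```

Any other discrepancy measure (absolute error, likelihood) could be swapped in for the squared error without changing the overall workflow.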


Modeling: Longitudinal

Linda Collins (2006) conceptualized three components involved in longitudinal modeling:

Theoretical model of change

  • describes the nature of change
    • shape
    • periodicity
    • scale

Temporal design

  • Timing
  • Frequency
  • Spacing

Statistical model of change

  • operationalization of theoretical model of change


Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM$model <- 4
dsM
   id time attend model
1   1    0      1     4
2   1    1      6     4
3   1    2      2     4
4   1    3      1     4
5   1    4      1     4
6   1    5      1     4
7   1    6      1     4
8   1    7      1     4
9   1    8      1     4
10  1    9      1     4
11  1   10      1     4
12  1   11      1     4


Figure: graph12

\({y_{1 t}} = model\)

\({y_{1 t}} = 4\)

Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 4 - dsM$time
dsM
   id time attend model
1   1    0      1     4
2   1    1      6     3
3   1    2      2     2
4   1    3      1     1
5   1    4      1     0
6   1    5      1    -1
7   1    6      1    -2
8   1    7      1    -3
9   1    8      1    -4
10  1    9      1    -5
11  1   10      1    -6
12  1   11      1    -7


Figure: graph13

\({y_{1 t}} = model\)

\({y_{1 t}} = 4 - time\)

Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 4 - (.2)*dsM$time
dsM
   id time attend model
1   1    0      1   4.0
2   1    1      6   3.8
3   1    2      2   3.6
4   1    3      1   3.4
5   1    4      1   3.2
6   1    5      1   3.0
7   1    6      1   2.8
8   1    7      1   2.6
9   1    8      1   2.4
10  1    9      1   2.2
11  1   10      1   2.0
12  1   11      1   1.8


Figure: graph14

\({y_{1 t}} = model\)

\({y_{1t}} = 4 - 0.2 \times time\)

Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id == 2) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 1.8 + (.05)*dsM$time
dsM
   id time attend model
1   2    0      2  1.80
2   2    1      2  1.85
3   2    2      1  1.90
4   2    3      1  1.95
5   2    4      2  2.00
6   2    5      2  2.05
7   2    6     NA  2.10
8   2    7     NA  2.15
9   2    8      3  2.20
10  2    9      1  2.25
11  2   10      2  2.30
12  2   11      2  2.35
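
How good is a hand-picked line when some waves are missing? A minimal sketch, assuming squared error as the measure of misfit: the hand-picked line is scored on the observed waves only (`na.rm = TRUE`) and set against the least-squares line from `lm()`, which drops `NA` rows under R's default `na.action`. The names `hand`, `rss_hand` and `rss_ols` are hypothetical.

```r
# Respondent id == 2, copied from the table above (two waves missing)
time   <- 0:11
attend <- c(2, 2, 1, 1, 2, 2, NA, NA, 3, 1, 2, 2)

# Hand-picked model from this slide, scored on the observed waves only
hand     <- 1.8 + 0.05 * time
rss_hand <- sum((attend - hand)^2, na.rm = TRUE)

# Least-squares line fitted to the same observed waves
fit     <- lm(attend ~ time)
rss_ols <- deviance(fit)          # residual sum of squares of the fitted line

c(hand = rss_hand, ols = rss_ols)
```

By construction the least-squares line cannot have a larger residual sum of squares than any other straight line on the same rows, so it serves as a floor for judging hand-picked lines.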


Figure: graph15

\({y_{2 t}} = model\)

\({y_{2t}} = 1.8 + 0.05 \times time\)

Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id == 3) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 2 + (.17)*dsM$time
dsM
   id time attend model
1   3    0      3  2.00
2   3    1      2  2.17
3   3    2      2  2.34
4   3    3      2  2.51
5   3    4      1  2.68
6   3    5     NA  2.85
7   3    6     NA  3.02
8   3    7     NA  3.19
9   3    8     NA  3.36
10  3    9      6  3.53
11  3   10     NA  3.70
12  3   11      2  3.87


Figure: graph16

\({y_{3 t}} = model\)

\({y_{3t}} = 2 + 0.17 \times time\)

Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 1.5 + (.22)*dsM$time
dsM
   id time attend model
1   4    0      2  1.50
2   4    1      1  1.72
3   4    2      3  1.94
4   4    3      1  2.16
5   4    4      2  2.38
6   4    5      2  2.60
7   4    6      2  2.82
8   4    7      2  3.04
9   4    8      2  3.26
10  4    9      1  3.48
11  4   10      2  3.70
12  4   11      5  3.92


Figure: graph17

\({y_{4 t}} = model\)

\({y_{4t}} = 1.5 + 0.22 \times time\)

Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id <= 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 3 - (.14)*dsM$time
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1  3.00
2   1    1      6  2.86
3   1    2      2  2.72
4   1    3      1  2.58
5   1    4      1  2.44
6   1    5      1  2.30
7   1    6      1  2.16
8   1    7      1  2.02
9   1    8      1  1.88
10  1    9      1  1.74
11  1   10      1  1.60
12  1   11      1  1.46


Figure: graph18

\({y_{it}} = model\)

\({y_{it}} = 3 - 0.14 \times time\)

Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 3 - (.14)*dsM$time
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1  3.00
2   1    1      6  2.86
3   1    2      2  2.72
4   1    3      1  2.58
5   1    4      1  2.44
6   1    5      1  2.30
7   1    6      1  2.16
8   1    7      1  2.02
9   1    8      1  1.88
10  1    9      1  1.74
11  1   10      1  1.60
12  1   11      1  1.46


Figure: graph19

\({y_{it}} = model\)

\({y_{it}} = 3 - 0.14 \times time\)

Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- predict (lm(attend ~ 1, data=dsM))
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1 2.477
2   1    1      6 2.477
3   1    2      2 2.477
4   1    3      1 2.477
5   1    4      1 2.477
6   1    5      1 2.477
7   1    6      1 2.477
8   1    7      1 2.477
9   1    8      1 2.477
10  1    9      1 2.477
11  1   10      1 2.477
12  1   11      1 2.477
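
The second `filter()` call keeps only respondents with no missing attendance values. A minimal base-R sketch of what `ave(!is.na(attend), id, FUN = all)` does, using the four respondents shown earlier in the deck; the names `d`, `complete` and `d_complete` are hypothetical.

```r
# Respondents 1 to 4, values copied from the earlier slides
d <- data.frame(
  id     = rep(1:4, each = 12),
  time   = rep(0:11, times = 4),
  attend = c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,        # id 1
             2, 2, 1, 1, 2, 2, NA, NA, 3, 1, 2, 2,      # id 2
             3, 2, 2, 2, 1, NA, NA, NA, NA, 6, NA, 2,   # id 3
             2, 1, 3, 1, 2, 2, 2, 2, 2, 1, 2, 5)        # id 4
)

# ave() applies all() within each id and repeats the result on every row
# of that id: TRUE only when none of the respondent's values is NA
complete   <- as.logical(ave(!is.na(d$attend), d$id, FUN = all))
d_complete <- d[complete, ]

unique(d_complete$id)   # respondents that survive the filter
```

Respondents 2 and 3 are dropped entirely, not just their missing waves, which is why the number of unique ids reported on this slide is smaller than 300.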


Figure: graph20

length(unique(dsM$id))
[1] 155

\({y_{it}} = model\)

\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)
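
In the intercept-only model the least-squares estimate of \(\beta_0\) is simply the grand mean of the outcome, so every respondent receives the same prediction at every wave. A small sketch on the two complete respondents shown earlier (ids 1 and 4), not on the 155-respondent sample used for this slide; `fit0` is a hypothetical name.

```r
# Attendance of the two complete respondents shown earlier in the deck
attend <- c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,    # id 1
            2, 1, 3, 1, 2, 2, 2, 2, 2, 1, 2, 5)    # id 4

fit0 <- lm(attend ~ 1)    # intercept-only model

coef(fit0)                # estimated beta_0
mean(attend)              # the grand mean: same number
unique(round(predict(fit0), 6))   # a single predicted value for all rows
```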


Modeling: recreating patterns

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- predict (lm(attend ~ 1 + time, data=dsM))
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1 2.788
2   1    1      6 2.732
3   1    2      2 2.675
4   1    3      1 2.618
5   1    4      1 2.562
6   1    5      1 2.505
7   1    6      1 2.449
8   1    7      1 2.392
9   1    8      1 2.335
10  1    9      1 2.279
11  1   10      1 2.222
12  1   11      1 2.166


Figure: graph21

length(unique(dsM$id))
[1] 155

\({y_{it}} = model\)

\({y_{it}} = {\beta _0} + {\beta _1} \times time_{t} + {\varepsilon _{it}}\)


Modeling: post processing

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
modelA <- lm(attend ~ 1, data=dsM)
modelB <- lm(attend ~ 1 + time, data=dsM)
dplyr::filter(dsM,id==1)
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
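
Once both models are stored as objects they can be contrasted directly. A minimal sketch, assuming a nested-model F test via `anova()` is an acceptable comparison; it runs on the two complete respondents shown earlier (ids 1 and 4), so the numbers differ from those of the full sample. The names `cmp`, `f_stat` and `t_stat` are hypothetical.

```r
# Two complete respondents (ids 1 and 4), values copied from earlier slides
d <- data.frame(
  time   = rep(0:11, times = 2),
  attend = c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,
             2, 1, 3, 1, 2, 2, 2, 2, 2, 1, 2, 5)
)

modelA <- lm(attend ~ 1,        data = d)   # intercept only
modelB <- lm(attend ~ 1 + time, data = d)   # intercept plus linear time

# F test of the reduction in residual sum of squares from A to B
cmp <- anova(modelA, modelB)
cmp

# With a single added predictor, F equals the squared t value of its slope
f_stat <- cmp$F[2]
t_stat <- summary(modelB)$coefficients["time", "t value"]
c(F = f_stat, t_squared = t_stat^2)
```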


Modeling: post processing

summary(modelA)
Call:
lm(formula = attend ~ 1, data = dsM)

Residuals:
   Min     1Q Median     3Q    Max 
-1.477 -1.477 -0.477  0.523  5.523 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.4769     0.0416    59.6   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.79 on 1859 degrees of freedom
#
#
summary(modelB)
Call:
lm(formula = attend ~ 1 + time, data = dsM)

Residuals:
   Min     1Q Median     3Q    Max 
-1.788 -1.392 -0.562  0.721  5.834 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.7882     0.0777   35.86  < 2e-16 ***
time         -0.0566     0.0120   -4.73  2.4e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.78 on 1858 degrees of freedom
Multiple R-squared:  0.0119,    Adjusted R-squared:  0.0114 
F-statistic: 22.3 on 1 and 1858 DF,  p-value: 2.45e-06
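
The summary output ties back to earlier slides in two ways. The coefficients reproduce the modeled values (for example 2.7882 - 0.0566 × 11 ≈ 2.166, the prediction at time 11), and the multiple R-squared is the proportional drop in residual sum of squares from the intercept-only model (1 - 5903/5974 ≈ 0.0119, using the deviances reported in this deck). The sketch below checks both relations on the two complete respondents shown earlier (ids 1 and 4); `by_hand` and `r2` are hypothetical names.

```r
d <- data.frame(
  time   = rep(0:11, times = 2),
  attend = c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,    # id 1
             2, 1, 3, 1, 2, 2, 2, 2, 2, 1, 2, 5)    # id 4
)
modelA <- lm(attend ~ 1,        data = d)
modelB <- lm(attend ~ 1 + time, data = d)

# Modeled values rebuilt from the two coefficients
b       <- coef(modelB)
by_hand <- unname(b["(Intercept)"] + b["time"] * d$time)
all.equal(by_hand, unname(fitted(modelB)))

# R-squared as the relative reduction in residual sum of squares
r2 <- 1 - deviance(modelB) / deviance(modelA)
c(by_formula = r2, from_summary = summary(modelB)$r.squared)
```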


Modeling: post processing

deviance(modelA)
[1] 5974
BIC(modelA)
[1] 7464
AIC(modelA)
[1] 7453
#
#


deviance(modelB)
[1] 5903
BIC(modelB)
[1] 7449
AIC(modelB)
[1] 7433
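
For a model fitted with `lm()`, these three indices all derive from the residual sum of squares. A minimal sketch, assuming Gaussian errors (which is what `logLik()` uses for `lm` objects): `deviance()` is the residual sum of squares, and AIC and BIC add a penalty to minus twice the maximized log-likelihood. It runs on the two complete respondents shown earlier; `m2ll`, `aic_by_hand` and `bic_by_hand` are hypothetical names.

```r
d <- data.frame(
  time   = rep(0:11, times = 2),
  attend = c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,    # id 1
             2, 1, 3, 1, 2, 2, 2, 2, 2, 1, 2, 5)    # id 4
)
fit <- lm(attend ~ 1 + time, data = d)

n   <- nobs(fit)
rss <- deviance(fit)              # same as sum(residuals(fit)^2)
k   <- length(coef(fit)) + 1      # regression coefficients plus the error variance

# Minus twice the maximized Gaussian log-likelihood
m2ll <- n * (log(2 * pi) + 1 + log(rss / n))

aic_by_hand <- m2ll + 2 * k
bic_by_hand <- m2ll + log(n) * k

c(AIC = AIC(fit), by_hand = aic_by_hand)
c(BIC = BIC(fit), by_hand = bic_by_hand)
```

Lower values indicate a better trade-off between fit and complexity; BIC penalizes each extra parameter more heavily than AIC once n exceeds about 7.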


Modeling: manifestations

  1. Tabular
  2. Algebraic
  3. Syntactic
  4. Numeric
  5. Graphical
  6. Schematic
  7. Semantic
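
The slide does not define these terms, so the following is only one possible reading: the same fitted model can be looked at as syntax (the R formula), as numbers (the estimates), and as a table (observed values next to modeled ones). The sketch uses the two complete respondents shown earlier (ids 1 and 4); the mapping of each call to a "manifestation" is an assumption, not the author's definition.

```r
d <- data.frame(
  id     = rep(c(1, 4), each = 12),
  time   = rep(0:11, times = 2),
  attend = c(1, 6, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,
             2, 1, 3, 1, 2, 2, 2, 2, 2, 1, 2, 5)
)
fit <- lm(attend ~ 1 + time, data = d)

formula(fit)                           # syntactic: the model written as R code
round(coef(fit), 3)                    # numeric: the estimated parameters
head(cbind(d, model = fitted(fit)))    # tabular: observed next to modeled values
```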

Questions? Comments?