November 18, 2014

Information

You can forward your questions to
Ann Greenwood (ann.greenwood at popdata.bc.ca) or Vincenza Gruppuso (vincenza at uvic.ca).
Q&A will begin immediately after the presentation.

You can follow the presentation and review previous lectures at ialsa.github.io/COAG-colloquium-2014F/

Overview of the Series

  • Oct 14 – Intro to Reproducible Research
  • Oct 21 – RR Basic Skills (1): Data Manipulation
  • Oct 28 – Intro to Latent Class and Latent Transition Models
  • Nov 4 – RR Basic Skills (2): Graph Production
  • Nov 18 – RR Basic Skills (3): Statistical Modeling
  • Nov 25 – RR Basic Skills (4): Dynamic Reporting
  • Dec 2 – Migrating into R from other Statistical Software

Previously:

"./Scripts/Data/dsL.R"

Download the files from GitHub to follow along.

Load Data

# load the basic NLSY97 religiosity data, as defined in the COAG-Colloquium-2014F repository
dsL <- readRDS("./Data/Derived/dsL.rds")
str(dsL)
'data.frame':   107772 obs. of  12 variables:
 $ id     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ sex    : int  2 2 2 2 2 2 2 2 2 2 ...
 $ race   : int  4 4 4 4 4 4 4 4 4 4 ...
 $ bmonth : int  9 9 9 9 9 9 9 9 9 9 ...
 $ byear  : int  1981 1981 1981 1981 1981 1981 1981 1981 1981 1981 ...
 $ year   : int  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
 $ attend : int  1 6 2 1 1 1 1 1 1 1 ...
 $ age    : int  19 20 21 22 23 24 25 26 27 28 ...
 $ sexF   : Ord.factor w/ 3 levels "Male"<"Female"<..: 2 2 2 2 2 2 2 2 2 2 ...
 $ raceF  : Ord.factor w/ 4 levels "Black"<"Hispanic"<..: 4 4 4 4 4 4 4 4 4 4 ...
 $ bmonthF: Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 9 9 9 9 9 9 9 9 9 9 ...
 $ attendF: Ord.factor w/ 8 levels "Never"<"Once or Twice"<..: 1 6 2 1 1 1 1 1 1 1 ...

Focal outcome

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::select(id, year, attend, attendF)
dsM
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never



How often did you attend a worship service during the last year?

  attendLevels         attendLabels
1            8             Everyday
2            7   Several times/week
3            6      About once/week
4            5    About twice/month
5            4     About once/month
6            3 Less than once/month
7            2        Once or Twice
8            1                Never
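
The ordered factor attendF seen in the str() output above can be rebuilt from these codes and labels. A minimal sketch (the repository's own recoding script may differ; attendF_check is our name for the rebuilt column):

# rebuild the ordered attendance factor from the numeric codes and labels above
attendLabels <- c("Never", "Once or Twice", "Less than once/month",
                  "About once/month", "About twice/month",
                  "About once/week", "Several times/week", "Everyday")
attendF_check <- factor(dsL$attend, levels = 1:8,
                        labels = attendLabels, ordered = TRUE)
all(as.character(attendF_check) == as.character(dsL$attendF), na.rm = TRUE)  # TRUE if the coding matches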



Map data to graphics

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::select(id, year, attend, attendF)
dsM
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never

plot of chunk graphBasic

Press (P): Note | Next: final graph from last time

Graph development

dsM <- dplyr::filter(dsL, id <=300, 
                     raceF != "Mixed (Non-H)") %>% 
  dplyr::select(id,sexF,raceF,year,attend,attendF) %>%
  dplyr::mutate(yearc = year - 2000)
#  
p <- ggplot2::ggplot(dsM,aes(x=yearc,y=attend))
p <- p + geom_line(aes(group=id), color='firebrick',
           alpha=.2,
           position=position_jitter(w=0.3, h=0.3))
p <- p + geom_point(shape=21, color=NA, fill="blue4",
           alpha=.4, size=1, 
           position=position_jitter(w=0.3, h=0.3))
p <- p + plotTheme
p <- p + scale_x_continuous(limits=c(0,11),
                            breaks=c(0:11))
p <- p + scale_y_continuous(limits=c(0,8), 
                            breaks=seq(1,8, by=1))
p <- p + labs(
 title="How often did you attend worship last year?",
 x="Years since 2000", y="Church attendance")
p <- p + facet_grid(sexF~raceF)
p
plot of chunk graphFullPrevious

Press (P): Zoom | Next: Today's agenda

Today: Modeling

What is a model?

  • a simplification of a complex reality
  • a "mechanism" for reproducing data
  • an operationalization of substantive theory

What does modeling involve?

  • generating data points
  • comparing observed and modeled data
  • describing properties and attributes of a model
  • comparing and contrasting models
(Rodgers, 2010)
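
These activities can be illustrated in a few lines (a toy sketch with simulated values, not the NLSY97 data): generate data points from a simple model, compare the modeled values with the observed ones, and inspect the discrepancy.

# toy illustration: data = MODEL + error
set.seed(42)
time     <- 0:11
observed <- 3 - 0.2 * time + rnorm(length(time), sd = 1)  # stands in for real data
modeled  <- 3 - 0.2 * time                                # the model's reproduction
residual <- observed - modeled                            # what the model leaves out
round(data.frame(time, observed, modeled, residual), 2)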



Theoretical model of change

  • Shape of change
  • Scale of change
  • Periodicity

Temporal design

  • Timing
  • Frequency
  • Spacing

Statistical model of change

  • Operationalization of theoretical model of change
(Collins, 2006)

Press (P): Citation | Next: Prelude to modeling

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
plot of chunk graph01

Representation of information:
- tabular
- graphical

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
plot of chunk graph01

Representation of information:
- tabular
- graphical
- algebraic \[{y_{1t}}\]

Press (P): notation, databox | Next: different person

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 2) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   2    0      2
2   2    1      2
3   2    2      1
4   2    3      1
5   2    4      2
6   2    5      2
7   2    6     NA
8   2    7     NA
9   2    8      3
10  2    9      1
11  2   10      2
12  2   11      2
plot of chunk graph02

Representation of information:
- tabular
- graphical
- algebraic \[{y_{2t}}\]

Press (P): notation, databox | Next: different person

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 3) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   3    0      3
2   3    1      2
3   3    2      2
4   3    3      2
5   3    4      1
6   3    5     NA
7   3    6     NA
8   3    7     NA
9   3    8     NA
10  3    9      6
11  3   10     NA
12  3   11      2
plot of chunk graph03

Representation of information:
- tabular
- graphical
- algebraic \[{y_{3t}}\]

Press (P): notation, databox | Next: different person

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   4    0      2
2   4    1      1
3   4    2      3
4   4    3      1
5   4    4      2
6   4    5      2
7   4    6      2
8   4    7      2
9   4    8      2
10  4    9      1
11  4   10      2
12  4   11      5
plot of chunk graph04

Representation of information:
- tabular
- graphical
- algebraic \[{y_{4t}}\]

Press (P): notation, databox | Next: time dimension

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   4    0      2
2   4    1      1
3   4    2      3
4   4    3      1
5   4    4      2
6   4    5      2
7   4    6      2
8   4    7      2
9   4    8      2
10  4    9      1
11  4   10      2
12  4   11      5
plot of chunk graph08

Representation of information:
- tabular
- graphical
- algebraic \[{y_{43}}\]

Press (P): notation, databox | Next: time dimension
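
For reference, the algebraic element \({y_{43}}\) (person 4 at time 3) corresponds to a single cell of the long data frame; a one-line sketch using the dsM prepared on this slide:

dplyr::filter(dsM, id == 4, time == 3)$attend  # 1, per the table above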

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
plot of chunk graph11

Representation of information:
- tabular
- graphical
- algebraic \[{y_{12}}\]

Press (P): notation, databox | Next: extend person dimension

Prelude to modeling

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>% # keep only persons with no missing attend values
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dplyr::filter(dsM,row_number()<15)
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
13  4    0      2
14  4    1      1
plot of chunk graph04a

Representation of information:
- tabular
- graphical
- algebraic \[{y_{it}}\]

Press (P): notation, databox | Next: Model in EDA

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1

plot of chunk graph01

Model in EDA (Tukey, 1977):

data  =  MODEL  +  error  
data  =  fit    +  residual    
data  =  smooth +  rough    
\[{y_{1t}}\]

Press (P): citation | Next: recreate data patterns

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1

plot of chunk graph01

\({y_{1t}} =\) ?

What should \({y}\) be for person \({_1}\) at each time \({_t}\)?

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM$model <- 4
dsM
   id time attend model
1   1    0      1     4
2   1    1      6     4
3   1    2      2     4
4   1    3      1     4
5   1    4      1     4
6   1    5      1     4
7   1    6      1     4
8   1    7      1     4
9   1    8      1     4
10  1    9      1     4
11  1   10      1     4
12  1   11      1     4


Press (P): Error

plot of chunk graph12

\({y_{1t}} = 4\)

\({y}\) should be 4 for person \({_1}\) at all times \({_t}\).

data  =  MODEL  +  error  
How can data be simplified?
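
The error part of the decomposition can be made just as explicit as the model column (a sketch; the column name error is ours):

# make data = MODEL + error explicit in the data frame
dsM$error <- dsM$attend - dsM$model
dsM[, c("time", "attend", "model", "error")]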

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 4 - dsM$time
dsM
   id time attend model
1   1    0      1     4
2   1    1      6     3
3   1    2      2     2
4   1    3      1     1
5   1    4      1     0
6   1    5      1    -1
7   1    6      1    -2
8   1    7      1    -3
9   1    8      1    -4
10  1    9      1    -5
11  1   10      1    -6
12  1   11      1    -7


plot of chunk graph13

\({y_{1t}} = 4 - time\)

\({y}\) should be 4 for person \({_1}\) at time \({_t=0}\), and decline by 1 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 4 - (.2)*dsM$time
dsM
   id time attend model
1   1    0      1   4.0
2   1    1      6   3.8
3   1    2      2   3.6
4   1    3      1   3.4
5   1    4      1   3.2
6   1    5      1   3.0
7   1    6      1   2.8
8   1    7      1   2.6
9   1    8      1   2.4
10  1    9      1   2.2
11  1   10      1   2.0
12  1   11      1   1.8


plot of chunk graph14

\({y_{1t}} = 4 - 0.2 \times time\)

\({y}\) should be 4 for person \({_1}\) at time \({_t=0}\), and decline by 0.2 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 2) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 1.8 + (.05)*dsM$time
dsM
   id time attend model
1   2    0      2  1.80
2   2    1      2  1.85
3   2    2      1  1.90
4   2    3      1  1.95
5   2    4      2  2.00
6   2    5      2  2.05
7   2    6     NA  2.10
8   2    7     NA  2.15
9   2    8      3  2.20
10  2    9      1  2.25
11  2   10      2  2.30
12  2   11      2  2.35


plot of chunk graph15

\({y_{2t}} = 1.8 + 0.05 \times time\)

\({y}\) should be 1.8 for person \({_2}\) at time \({_t=0}\), and increase by 0.05 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 3) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 2 + (.17)*dsM$time
dsM
   id time attend model
1   3    0      3  2.00
2   3    1      2  2.17
3   3    2      2  2.34
4   3    3      2  2.51
5   3    4      1  2.68
6   3    5     NA  2.85
7   3    6     NA  3.02
8   3    7     NA  3.19
9   3    8     NA  3.36
10  3    9      6  3.53
11  3   10     NA  3.70
12  3   11      2  3.87


plot of chunk graph16

\({y_{3t}} = 2 + 0.17 \times time\)

\({y}\) should be 2 for person \({_3}\) at time \({_t=0}\), and increase by 0.17 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 1.5 + (.22)*dsM$time
dsM
   id time attend model
1   4    0      2  1.50
2   4    1      1  1.72
3   4    2      3  1.94
4   4    3      1  2.16
5   4    4      2  2.38
6   4    5      2  2.60
7   4    6      2  2.82
8   4    7      2  3.04
9   4    8      2  3.26
10  4    9      1  3.48
11  4   10      2  3.70
12  4   11      5  3.92


plot of chunk graph17

\({y_{4t}} = 1.5 + 0.22 \times time\)

\({y}\) should be 1.5 for person \({_4}\) at time \({_t=0}\), and increase by 0.22 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id <= 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
# 
dplyr::filter(dsM,id==1)
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1


plot of chunk graph18a

\({y_{it}} =\) ?
data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id <= 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 3 - (.14)*dsM$time
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1  3.00
2   1    1      6  2.86
3   1    2      2  2.72
4   1    3      1  2.58
5   1    4      1  2.44
6   1    5      1  2.30
7   1    6      1  2.16
8   1    7      1  2.02
9   1    8      1  1.88
10  1    9      1  1.74
11  1   10      1  1.60
12  1   11      1  1.46


plot of chunk graph18

\({y_{it}} = 3 - 0.14 \times time\)
data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
#
dplyr::filter(dsM,id==1)
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1


plot of chunk graph19a

\({y_{it}} =\) ?
data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 3.5 - (.25)*dsM$time
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1  3.50
2   1    1      6  3.25
3   1    2      2  3.00
4   1    3      1  2.75
5   1    4      1  2.50
6   1    5      1  2.25
7   1    6      1  2.00
8   1    7      1  1.75
9   1    8      1  1.50
10  1    9      1  1.25
11  1   10      1  1.00
12  1   11      1  0.75


plot of chunk graph19

\({y_{it}} = 3.5 - 0.25 \times time\)
data  =  MODEL  +  error  
How can data be simplified?

Next: Model estimation

Modeling: Finding solution

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
model <- nlme::gls(attend ~ 1, data=dsM)
dsM$model <- predict(model)
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1 2.477
2   1    1      6 2.477
3   1    2      2 2.477
4   1    3      1 2.477
5   1    4      1 2.477
6   1    5      1 2.477
7   1    6      1 2.477
8   1    7      1 2.477
9   1    8      1 2.477
10  1    9      1 2.477
11  1   10      1 2.477
12  1   11      1 2.477

plot of chunk graph20

\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)
data  =  MODEL  +  error  

Press (P): model summary

Modeling: Finding solution

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
model <- nlme::gls(attend ~ 1 + time, data=dsM)
dsM$model <- predict(model)
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1 2.788
2   1    1      6 2.732
3   1    2      2 2.675
4   1    3      1 2.618
5   1    4      1 2.562
6   1    5      1 2.505
7   1    6      1 2.449
8   1    7      1 2.392
9   1    8      1 2.335
10  1    9      1 2.279
11  1   10      1 2.222
12  1   11      1 2.166

plot of chunk graph21

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)
data  =  MODEL  +  error  

Press (P): model summary

Which model is "better"?

  • which criteria are used?
  • which are not?
  • preferences in fit vs. parsimony
  • complex judgments

Challenges:

  1. Collecting results
    • What pieces of information do we need?
  2. Organizing comparisons
    • How do we overcome information overload?
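
One conventional way to make such criteria explicit (a sketch, assuming the dsM prepared on the preceding slides) is to refit both candidate models by maximum likelihood and let nlme line up their information criteria and a likelihood-ratio test; REML fits with different fixed effects are not directly comparable, hence method = "ML".

# compare the two candidate models on a common (ML) footing
modelA_ml <- nlme::gls(attend ~ 1,        data = dsM, method = "ML")
modelB_ml <- nlme::gls(attend ~ 1 + time, data = dsM, method = "ML")
anova(modelA_ml, modelB_ml)  # AIC, BIC, logLik, and the likelihood-ratio test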

Model A: \({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Model B: \({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Collecting results: post-processing

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
modelA <- nlme::gls(attend ~ 1       , data=dsM)
modelB <- nlme::gls(attend ~ 1 + time, data=dsM)
dsM$modelA <- predict(modelA)
dsM$modelB <- predict(modelB)
dplyr::filter(dsM,id==1)
   id time attend modelA modelB
1   1    0      1  2.477  2.788
2   1    1      6  2.477  2.732
3   1    2      2  2.477  2.675
4   1    3      1  2.477  2.618
5   1    4      1  2.477  2.562
6   1    5      1  2.477  2.505
7   1    6      1  2.477  2.449
8   1    7      1  2.477  2.392
9   1    8      1  2.477  2.335
10  1    9      1  2.477  2.279
11  1   10      1  2.477  2.222
12  1   11      1  2.477  2.166
modelA
Generalized least squares fit by REML
  Model: attend ~ 1 
  Data: dsM 
  Log-restricted-likelihood: -3727

Coefficients:
(Intercept) 
      2.477 

Degrees of freedom: 1860 total; 1859 residual
Residual standard error: 1.793 

\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Collecting results: post-processing

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
modelA <- nlme::gls(attend ~ 1       , data=dsM)
modelB <- nlme::gls(attend ~ 1 + time, data=dsM)
dsM$modelA <- predict(modelA)
dsM$modelB <- predict(modelB)
dplyr::filter(dsM,id==1)
   id time attend modelA modelB
1   1    0      1  2.477  2.788
2   1    1      6  2.477  2.732
3   1    2      2  2.477  2.675
4   1    3      1  2.477  2.618
5   1    4      1  2.477  2.562
6   1    5      1  2.477  2.505
7   1    6      1  2.477  2.449
8   1    7      1  2.477  2.392
9   1    8      1  2.477  2.335
10  1    9      1  2.477  2.279
11  1   10      1  2.477  2.222
12  1   11      1  2.477  2.166
modelB
Generalized least squares fit by REML
  Model: attend ~ 1 + time 
  Data: dsM 
  Log-restricted-likelihood: -3719

Coefficients:
(Intercept)        time 
     2.7882     -0.0566 

Degrees of freedom: 1860 total; 1858 residual
Residual standard error: 1.782 

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Collecting results: post-processing

model    <- modelB
logLik   <- summary(model)$logLik
deviance <- -2 * logLik
AIC      <- AIC(model)
BIC      <- BIC(model)
N        <- summary(model)$dims$N
p        <- summary(model)$dims$p
ids      <- length(unique(dsM$id))
df.resid <- N - p
mInfo <- data.frame("logLik" = logLik,
                    "deviance" = deviance,
                    "AIC" = AIC, "BIC" = BIC,
                    "df.resid" = df.resid, "N" = N,
                    "p" = p, "ids" = ids)
t <- t(mInfo)
rownames(t) <- colnames(mInfo)
dsmInfo <- data.frame(new = t)
colnames(dsmInfo) <- c("modelB")
mB <- dsmInfo

modelB
Generalized least squares fit by REML
  Model: attend ~ 1 + time 
  Data: dsM 
  Log-restricted-likelihood: -3719

Coefficients:
(Intercept)        time 
     2.7882     -0.0566 

Degrees of freedom: 1860 total; 1858 residual
Residual standard error: 1.782 

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21
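
The post-processing above can be wrapped in a small helper so that modelA is summarized the same way (yielding the mA printed on a later slide). A sketch; the function name getGlsInfo is ours, not the repository's:

# reusable extractor for the fit indices collected above (any nlme::gls fit)
getGlsInfo <- function(model, label, ids) {
  s <- summary(model)
  info <- data.frame(logLik   = as.numeric(s$logLik),
                     deviance = -2 * as.numeric(s$logLik),
                     AIC      = AIC(model),
                     BIC      = BIC(model),
                     df.resid = s$dims$N - s$dims$p,
                     N        = s$dims$N,
                     p        = s$dims$p,
                     ids      = ids)
  out <- data.frame(t(info))        # one row per index, as in dsmInfo above
  colnames(out) <- label
  out
}
mA <- getGlsInfo(modelA, "modelA", ids = length(unique(dsM$id)))
mB <- getGlsInfo(modelB, "modelB", ids = length(unique(dsM$id)))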

Collecting results: post-processing

print(mB)
         modelB
logLik    -3719
deviance   7438
AIC        7444
BIC        7461
df.resid   1858
N          1860
p             2
ids         155

modelB
Generalized least squares fit by REML
  Model: attend ~ 1 + time 
  Data: dsM 
  Log-restricted-likelihood: -3719

Coefficients:
(Intercept)        time 
     2.7882     -0.0566 

Degrees of freedom: 1860 total; 1858 residual
Residual standard error: 1.782 

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Collecting results: post-processing

print(mA)
         modelA
logLik    -3727
deviance   7453
AIC        7457
BIC        7468
df.resid   1859
N          1860
p             1
ids         155

modelA
Generalized least squares fit by REML
  Model: attend ~ 1 
  Data: dsM 
  Log-restricted-likelihood: -3727

Coefficients:
(Intercept) 
      2.477 

Degrees of freedom: 1860 total; 1859 residual
Residual standard error: 1.793 

\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Organizing comparisons

models <- data.frame(cbind(mA,mB))
models <- dplyr::mutate(models,
  dif = round(modelB - modelA, 2),
  Index = rownames(dsmInfo))
dplyr::select(models, Index, modelA, modelB, dif)
     Index modelA modelB    dif
1   logLik  -3727  -3719   7.61
2 deviance   7453   7438 -15.21
3      AIC   7457   7444 -13.21
4      BIC   7468   7461  -7.69
5 df.resid   1859   1858  -1.00
6        N   1860   1860   0.00
7        p      1      2   1.00
8      ids    155    155   0.00
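
As the model sequence grows beyond A and B, the comparison table can be assembled programmatically from a named list of fits; a sketch that reuses the getGlsInfo() helper sketched above (both names are ours):

# build the comparison table for an arbitrary sequence of fitted models
fits <- list(modelA = modelA, modelB = modelB)   # extend with further models
infoList <- lapply(names(fits), function(nm)
  getGlsInfo(fits[[nm]], nm, ids = length(unique(dsM$id))))
models <- do.call(cbind, infoList)
models$Index <- rownames(models)
models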

Organizing comparisons: model manifestations

There are many ways to simplify patterns in the data.

Each has its own advantages and costs.

Model manifestations

Next time: Dynamic Reporting

  • Oct 14 – Intro to Reproducible Research
  • Oct 21 – RR Basic Skills (1): Data Manipulation
  • Oct 28 – Intro to Latent Class and Latent Transition Models
  • Nov 4 – RR Basic Skills (2): Graph Production
  • Nov 18 – RR Basic Skills (3): Statistical Modeling
  • Nov 25 – RR Basic Skills (4): Dynamic Reporting
  • Dec 2 – Migrating into R from other Statistical Software

Next time: model sequences

1. How to represent each model

2. How to construct a sequence of models


example

Question? Comments?