November 18, 2014

Information

You can forward your questions to
Ann Greenwood (ann.greenwood at popdata.bc.ca) or Vincenza Gruppuso (vincenza at uvic.ca).
Q&A will begin immediately after the presentation.

You can follow the presentation and review previous lectures at ialsa.github.io/COAG-colloquium-2014F/

Overview of the Series

  • Oct 14 – Intro to Reproducible Research
  • Oct 21 – RR Basic Skills (1): Data Manipulation
  • Oct 28 – Intro to Latent Class and Latent Transition Models
  • Nov 4 – RR Basic Skills (2): Graph Production
  • Nov 18 – RR Basic Skills (3): Statistical Modeling
  • Nov 25 – RR Basic Skills (4): Dynamic Reporting
  • Dec 2 – Migrating into R from other Statistical Software

Previously:

"./Scripts/Data/dsL.R"

Download the files from GitHub to follow along.

Load Data

# load the basic NLSY97 religiosity data, as defined in the COAG-Colloquium-2014F repository
dsL <- readRDS("./Data/Derived/dsL.rds")
str(dsL)
'data.frame':   107772 obs. of  12 variables:
 $ id     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ sex    : int  2 2 2 2 2 2 2 2 2 2 ...
 $ race   : int  4 4 4 4 4 4 4 4 4 4 ...
 $ bmonth : int  9 9 9 9 9 9 9 9 9 9 ...
 $ byear  : int  1981 1981 1981 1981 1981 1981 1981 1981 1981 1981 ...
 $ year   : int  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
 $ attend : int  1 6 2 1 1 1 1 1 1 1 ...
 $ age    : int  19 20 21 22 23 24 25 26 27 28 ...
 $ sexF   : Ord.factor w/ 3 levels "Male"<"Female"<..: 2 2 2 2 2 2 2 2 2 2 ...
 $ raceF  : Ord.factor w/ 4 levels "Black"<"Hispanic"<..: 4 4 4 4 4 4 4 4 4 4 ...
 $ bmonthF: Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 9 9 9 9 9 9 9 9 9 9 ...
 $ attendF: Ord.factor w/ 8 levels "Never"<"Once or Twice"<..: 1 6 2 1 1 1 1 1 1 1 ...

Focal outcome

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::select(id, year, attend, attendF)
dsM
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never



How often did you attend a worship service during the last year?

  attendLevels         attendLabels
1            8             Everyday
2            7   Several times/week
3            6      About once/week
4            5    About twice/month
5            4     About once/month
6            3 Less than once/month
7            2        Once or Twice
8            1                Never
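
The ordered factor attendF seen in the str() output above can be rebuilt from these codes and labels. A minimal sketch (the repository's own recoding script may differ; attendF_check is our name for the rebuilt column):

# rebuild the ordered attendance factor from the numeric codes and labels above
attendLabels <- c("Never", "Once or Twice", "Less than once/month",
                  "About once/month", "About twice/month",
                  "About once/week", "Several times/week", "Everyday")
attendF_check <- factor(dsL$attend, levels = 1:8,
                        labels = attendLabels, ordered = TRUE)
all(as.character(attendF_check) == as.character(dsL$attendF), na.rm = TRUE)  # TRUE if the coding matches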



Map data to graphics

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::select(id, year, attend, attendF)
dsM
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never

plot of chunk graphBasic

Press (P): Note | Next: final graph from last time

Graph development

dsM <- dplyr::filter(dsL, id <=300, 
                     raceF != "Mixed (Non-H)") %>% 
  dplyr::select(id,sexF,raceF,year,attend,attendF) %>%
  dplyr::mutate(yearc = year - 2000)
#  
p <- ggplot2::ggplot(dsM,aes(x=yearc,y=attend))
p <- p + geom_line(aes(group=id), color='firebrick',
           alpha=.2,
           position=position_jitter(w=0.3, h=0.3))
p <- p + geom_point(shape=21, color=NA, fill="blue4",
           alpha=.4, size=1, 
           position=position_jitter(w=0.3, h=0.3))
p <- p + plotTheme
p <- p + scale_x_continuous(limits=c(0,11),
                            breaks=c(0:11))
p <- p + scale_y_continuous(limits=c(0,8), 
                            breaks=seq(1,8, by=1))
p <- p + labs(
 title="How often did you attend worship last year?",
 x="Years since 2000", y="Church attendance")
p <- p + facet_grid(sexF~raceF)
p
plot of chunk graphFullPrevious

Press (P): Zoom | Next: Today's agenda

Today: Modeling

What is a model?

  • a simplification of a complex reality
  • a "mechanism" for reproducing data
  • an operationalization of substantive theory

What does modeling involve?

  • generating data points
  • comparing observed and modeled data
  • describing properties and attributes of a model
  • comparing and contrasting models
(Rodgers, 2010)
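
These activities can be illustrated in a few lines (a toy sketch with simulated values, not the NLSY97 data): generate data points from a simple model, compare the modeled values with the observed ones, and inspect the discrepancy.

# toy illustration: data = MODEL + error
set.seed(42)
time     <- 0:11
observed <- 3 - 0.2 * time + rnorm(length(time), sd = 1)  # stands in for real data
modeled  <- 3 - 0.2 * time                                # the model's reproduction
residual <- observed - modeled                            # what the model leaves out
round(data.frame(time, observed, modeled, residual), 2)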



Theoretical model of change

  • Shape of change
  • Scale of change
  • Periodicity

Temporal design

  • Timing
  • Frequency
  • Spacing

Statistical model of change

  • Operationalization of theoretical model of change
(Collins, 2006)

Press (P): Citation | Next: Prelude to modeling

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
plot of chunk graph01

Representation of information:
- tabular
- graphical

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
plot of chunk graph01

Representation of information:
- tabular
- graphical
- algebraic \[{y_{1t}}\]

Press (P): notation, databox | Next: different person

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 2) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   2    0      2
2   2    1      2
3   2    2      1
4   2    3      1
5   2    4      2
6   2    5      2
7   2    6     NA
8   2    7     NA
9   2    8      3
10  2    9      1
11  2   10      2
12  2   11      2
plot of chunk graph02

Representation of information:
- tabular
- graphical
- algebraic \[{y_{2t}}\]

Press (P): notation, databox | Next: different person

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 3) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   3    0      3
2   3    1      2
3   3    2      2
4   3    3      2
5   3    4      1
6   3    5     NA
7   3    6     NA
8   3    7     NA
9   3    8     NA
10  3    9      6
11  3   10     NA
12  3   11      2
plot of chunk graph03

Representation of information:
- tabular
- graphical
- algebraic \[{y_{3t}}\]

Press (P): notation, databox | Next: different person

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   4    0      2
2   4    1      1
3   4    2      3
4   4    3      1
5   4    4      2
6   4    5      2
7   4    6      2
8   4    7      2
9   4    8      2
10  4    9      1
11  4   10      2
12  4   11      5
plot of chunk graph04

Representation of information:
- tabular
- graphical
- algebraic \[{y_{4t}}\]

Press (P): notation, databox | Next: time dimension

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   4    0      2
2   4    1      1
3   4    2      3
4   4    3      1
5   4    4      2
6   4    5      2
7   4    6      2
8   4    7      2
9   4    8      2
10  4    9      1
11  4   10      2
12  4   11      5
plot of chunk graph08

Representation of information:
- tabular
- graphical
- algebraic \[{y_{43}}\]

Press (P): notation, databox | Next: time dimension
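
For reference, the algebraic element \({y_{43}}\) (person 4 at time 3) corresponds to a single cell of the long data frame; a one-line sketch using the dsM prepared on this slide:

dplyr::filter(dsM, id == 4, time == 3)$attend  # 1, per the table above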

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
plot of chunk graph11

Representation of information:
- tabular
- graphical
- algebraic \[{y_{12}}\]

Press (P): notation, databox | Next: extend person dimension

Prelude to modeling

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>% # keep only persons with no missing attend values
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dplyr::filter(dsM,row_number()<15)
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1
13  4    0      2
14  4    1      1
plot of chunk graph04a

Representation of information:
- tabular
- graphical
- algebraic \[{y_{it}}\]

Press (P): notation, databox | Next: Model in EDA

Prelude to modeling

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1

plot of chunk graph01

Model in EDA (Tukey, 1977):

data  =  MODEL  +  error  
data  =  fit    +  residual    
data  =  smooth +  rough    
\[{y_{1t}}\]

Press (P): citation | Next: recreate data patterns

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1

plot of chunk graph01

\({y_{1t}} =\) ?

What should \({y}\) be for person \({_1}\) at each time \({_t}\)?

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
dsM$model <- 4
dsM
   id time attend model
1   1    0      1     4
2   1    1      6     4
3   1    2      2     4
4   1    3      1     4
5   1    4      1     4
6   1    5      1     4
7   1    6      1     4
8   1    7      1     4
9   1    8      1     4
10  1    9      1     4
11  1   10      1     4
12  1   11      1     4


Press (P): Error

plot of chunk graph12

\({y_{1t}} = 4\)

\({y}\) should be 4 for person \({_1}\) at all times \({_t}\).

data  =  MODEL  +  error  
How can data be simplified?
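
The error part of the decomposition can be made just as explicit as the model column (a sketch; the column name error is ours):

# make data = MODEL + error explicit in the data frame
dsM$error <- dsM$attend - dsM$model
dsM[, c("time", "attend", "model", "error")]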

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 4 - dsM$time
dsM
   id time attend model
1   1    0      1     4
2   1    1      6     3
3   1    2      2     2
4   1    3      1     1
5   1    4      1     0
6   1    5      1    -1
7   1    6      1    -2
8   1    7      1    -3
9   1    8      1    -4
10  1    9      1    -5
11  1   10      1    -6
12  1   11      1    -7


plot of chunk graph13

\({y_{1t}} = 4 - time\)

\({y}\) should be 4 for person \({_1}\) at time \({_t=0}\), and decline by 1 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 4 - (.2)*dsM$time
dsM
   id time attend model
1   1    0      1   4.0
2   1    1      6   3.8
3   1    2      2   3.6
4   1    3      1   3.4
5   1    4      1   3.2
6   1    5      1   3.0
7   1    6      1   2.8
8   1    7      1   2.6
9   1    8      1   2.4
10  1    9      1   2.2
11  1   10      1   2.0
12  1   11      1   1.8


plot of chunk graph14

\({y_{1t}} = 4 - 0.2 \times time\)

\({y}\) should be 4 for person \({_1}\) at time \({_t=0}\), and decline by 0.2 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 2) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 1.8 + (.05)*dsM$time
dsM
   id time attend model
1   2    0      2  1.80
2   2    1      2  1.85
3   2    2      1  1.90
4   2    3      1  1.95
5   2    4      2  2.00
6   2    5      2  2.05
7   2    6     NA  2.10
8   2    7     NA  2.15
9   2    8      3  2.20
10  2    9      1  2.25
11  2   10      2  2.30
12  2   11      2  2.35


plot of chunk graph15

\({y_{2t}} = 1.8 + 0.05 \times time\)

\({y}\) should be 1.8 for person \({_2}\) at time \({_t=0}\), and increase by 0.05 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 3) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 2 + (.17)*dsM$time
dsM
   id time attend model
1   3    0      3  2.00
2   3    1      2  2.17
3   3    2      2  2.34
4   3    3      2  2.51
5   3    4      1  2.68
6   3    5     NA  2.85
7   3    6     NA  3.02
8   3    7     NA  3.19
9   3    8     NA  3.36
10  3    9      6  3.53
11  3   10     NA  3.70
12  3   11      2  3.87


plot of chunk graph16

\({y_{3t}} = 2 + 0.17 \times time\)

\({y}\) should be 2 for person \({_3}\) at time \({_t=0}\), and increase by 0.17 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id == 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 1.5 + (.22)*dsM$time
dsM
   id time attend model
1   4    0      2  1.50
2   4    1      1  1.72
3   4    2      3  1.94
4   4    3      1  2.16
5   4    4      2  2.38
6   4    5      2  2.60
7   4    6      2  2.82
8   4    7      2  3.04
9   4    8      2  3.26
10  4    9      1  3.48
11  4   10      2  3.70
12  4   11      5  3.92


plot of chunk graph17

\({y_{4t}} = 1.5 + 0.22 \times time\)

\({y}\) should be 1.5 for person \({_4}\) at time \({_t=0}\), and increase by 0.22 with each subsequent occasion.

data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id <= 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
# 
dplyr::filter(dsM,id==1)
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1


plot of chunk graph18a

\({y_{it}} =\) ?
data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id <= 4) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 3 - (.14)*dsM$time
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1  3.00
2   1    1      6  2.86
3   1    2      2  2.72
4   1    3      1  2.58
5   1    4      1  2.44
6   1    5      1  2.30
7   1    6      1  2.16
8   1    7      1  2.02
9   1    8      1  1.88
10  1    9      1  1.74
11  1   10      1  1.60
12  1   11      1  1.46


plot of chunk graph18

\({y_{it}} = 3 - 0.14 \times time\)
data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
#
dplyr::filter(dsM,id==1)
   id time attend
1   1    0      1
2   1    1      6
3   1    2      2
4   1    3      1
5   1    4      1
6   1    5      1
7   1    6      1
8   1    7      1
9   1    8      1
10  1    9      1
11  1   10      1
12  1   11      1


plot of chunk graph19a

\({y_{it}} =\) ?
data  =  MODEL  +  error  
How can data be simplified?

Modeling: Recreating patterns

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
dsM$model <- 3.5 - (.25)*dsM$time
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1  3.50
2   1    1      6  3.25
3   1    2      2  3.00
4   1    3      1  2.75
5   1    4      1  2.50
6   1    5      1  2.25
7   1    6      1  2.00
8   1    7      1  1.75
9   1    8      1  1.50
10  1    9      1  1.25
11  1   10      1  1.00
12  1   11      1  0.75


plot of chunk graph19

\({y_{it}} = 3.5 - 0.25 \times time\)
data  =  MODEL  +  error  
How can data be simplified?

Next: Model estimation

Modeling: Finding solution

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
model <- nlme::gls(attend ~ 1, data=dsM)
dsM$model <- predict(model)
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1 2.477
2   1    1      6 2.477
3   1    2      2 2.477
4   1    3      1 2.477
5   1    4      1 2.477
6   1    5      1 2.477
7   1    6      1 2.477
8   1    7      1 2.477
9   1    8      1 2.477
10  1    9      1 2.477
11  1   10      1 2.477
12  1   11      1 2.477

plot of chunk graph20

\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)
data  =  MODEL  +  error  

Press (P): model summary

Modeling: Finding solution

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend) 
model <- nlme::gls(attend ~ 1 + time, data=dsM)
dsM$model <- predict(model)
dplyr::filter(dsM,id==1)
   id time attend model
1   1    0      1 2.788
2   1    1      6 2.732
3   1    2      2 2.675
4   1    3      1 2.618
5   1    4      1 2.562
6   1    5      1 2.505
7   1    6      1 2.449
8   1    7      1 2.392
9   1    8      1 2.335
10  1    9      1 2.279
11  1   10      1 2.222
12  1   11      1 2.166

plot of chunk graph21

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)
data  =  MODEL  +  error  

Press (P): model summary

Which model is "better"?

  • which criteria are used?
  • which are not?
  • preferences in fit vs. parsimony
  • complex judgments

Challenges:

  1. Collecting results
    • What pieces of information do we need?
  2. Organizing comparisons
    • How do we overcome information overload?
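
One conventional way to make such criteria explicit (a sketch, assuming the dsM prepared on the preceding slides) is to refit both candidate models by maximum likelihood and let nlme line up their information criteria and a likelihood-ratio test; REML fits with different fixed effects are not directly comparable, hence method = "ML".

# compare the two candidate models on a common (ML) footing
modelA_ml <- nlme::gls(attend ~ 1,        data = dsM, method = "ML")
modelB_ml <- nlme::gls(attend ~ 1 + time, data = dsM, method = "ML")
anova(modelA_ml, modelB_ml)  # AIC, BIC, logLik, and the likelihood-ratio test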

Model A: \({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Model B: \({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Collecting results: post-processing

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
modelA <- nlme::gls(attend ~ 1       , data=dsM)
modelB <- nlme::gls(attend ~ 1 + time, data=dsM)
dsM$modelA <- predict(modelA)
dsM$modelB <- predict(modelB)
dplyr::filter(dsM,id==1)
   id time attend modelA modelB
1   1    0      1  2.477  2.788
2   1    1      6  2.477  2.732
3   1    2      2  2.477  2.675
4   1    3      1  2.477  2.618
5   1    4      1  2.477  2.562
6   1    5      1  2.477  2.505
7   1    6      1  2.477  2.449
8   1    7      1  2.477  2.392
9   1    8      1  2.477  2.335
10  1    9      1  2.477  2.279
11  1   10      1  2.477  2.222
12  1   11      1  2.477  2.166
modelA
Generalized least squares fit by REML
  Model: attend ~ 1 
  Data: dsM 
  Log-restricted-likelihood: -3727

Coefficients:
(Intercept) 
      2.477 

Degrees of freedom: 1860 total; 1859 residual
Residual standard error: 1.793 

\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Collecting results: post-processing

dsM <- dplyr::filter(dsL, id <= 300) %>% 
  dplyr::filter(ave((!is.na(attend)), id, FUN = all)) %>%
  dplyr::mutate(time=year-2000) %>%
  dplyr::select(id, time, attend)
modelA <- nlme::gls(attend ~ 1       , data=dsM)
modelB <- nlme::gls(attend ~ 1 + time, data=dsM)
dsM$modelA <- predict(modelA)
dsM$modelB <- predict(modelB)
dplyr::filter(dsM,id==1)
   id time attend modelA modelB
1   1    0      1  2.477  2.788
2   1    1      6  2.477  2.732
3   1    2      2  2.477  2.675
4   1    3      1  2.477  2.618
5   1    4      1  2.477  2.562
6   1    5      1  2.477  2.505
7   1    6      1  2.477  2.449
8   1    7      1  2.477  2.392
9   1    8      1  2.477  2.335
10  1    9      1  2.477  2.279
11  1   10      1  2.477  2.222
12  1   11      1  2.477  2.166
modelB
Generalized least squares fit by REML
  Model: attend ~ 1 + time 
  Data: dsM 
  Log-restricted-likelihood: -3719

Coefficients:
(Intercept)        time 
     2.7882     -0.0566 

Degrees of freedom: 1860 total; 1858 residual
Residual standard error: 1.782 

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Collecting results: post-processing

model    <- modelB
logLik   <- summary(model)$logLik
deviance <- -2 * logLik
AIC      <- AIC(model)
BIC      <- BIC(model)
N        <- summary(model)$dims$N
p        <- summary(model)$dims$p
ids      <- length(unique(dsM$id))
df.resid <- N - p
mInfo <- data.frame("logLik" = logLik,
                    "deviance" = deviance,
                    "AIC" = AIC, "BIC" = BIC,
                    "df.resid" = df.resid, "N" = N,
                    "p" = p, "ids" = ids)
t <- t(mInfo)
rownames(t) <- colnames(mInfo)
dsmInfo <- data.frame(new = t)
colnames(dsmInfo) <- c("modelB")
mB <- dsmInfo

modelB
Generalized least squares fit by REML
  Model: attend ~ 1 + time 
  Data: dsM 
  Log-restricted-likelihood: -3719

Coefficients:
(Intercept)        time 
     2.7882     -0.0566 

Degrees of freedom: 1860 total; 1858 residual
Residual standard error: 1.782 

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21
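
The post-processing above can be wrapped in a small helper so that modelA is summarized the same way (yielding the mA printed on a later slide). A sketch; the function name getGlsInfo is ours, not the repository's:

# reusable extractor for the fit indices collected above (any nlme::gls fit)
getGlsInfo <- function(model, label, ids) {
  s <- summary(model)
  info <- data.frame(logLik   = as.numeric(s$logLik),
                     deviance = -2 * as.numeric(s$logLik),
                     AIC      = AIC(model),
                     BIC      = BIC(model),
                     df.resid = s$dims$N - s$dims$p,
                     N        = s$dims$N,
                     p        = s$dims$p,
                     ids      = ids)
  out <- data.frame(t(info))        # one row per index, as in dsmInfo above
  colnames(out) <- label
  out
}
mA <- getGlsInfo(modelA, "modelA", ids = length(unique(dsM$id)))
mB <- getGlsInfo(modelB, "modelB", ids = length(unique(dsM$id)))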

Collecting results: post-processing

print(mB)
         modelB
logLik    -3719
deviance   7438
AIC        7444
BIC        7461
df.resid   1858
N          1860
p             2
ids         155

modelB
Generalized least squares fit by REML
  Model: attend ~ 1 + time 
  Data: dsM 
  Log-restricted-likelihood: -3719

Coefficients:
(Intercept)        time 
     2.7882     -0.0566 

Degrees of freedom: 1860 total; 1858 residual
Residual standard error: 1.782 

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Collecting results: post-processing

print(mA)
         modelA
logLik    -3727
deviance   7453
AIC        7457
BIC        7468
df.resid   1859
N          1860
p             1
ids         155

modelA
Generalized least squares fit by REML
  Model: attend ~ 1 
  Data: dsM 
  Log-restricted-likelihood: -3727

Coefficients:
(Intercept) 
      2.477 

Degrees of freedom: 1860 total; 1859 residual
Residual standard error: 1.793 

\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Organizing comparisons

models <- data.frame(cbind(mA,mB))
models <- dplyr::mutate(models,
  dif = round(modelB - modelA, 2),
  Index = rownames(dsmInfo))
dplyr::select(models, Index, modelA, modelB, dif)
     Index modelA modelB    dif
1   logLik  -3727  -3719   7.61
2 deviance   7453   7438 -15.21
3      AIC   7457   7444 -13.21
4      BIC   7468   7461  -7.69
5 df.resid   1859   1858  -1.00
6        N   1860   1860   0.00
7        p      1      2   1.00
8      ids    155    155   0.00
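
As the model sequence grows beyond A and B, the comparison table can be assembled programmatically from a named list of fits; a sketch that reuses the getGlsInfo() helper sketched above (both names are ours):

# build the comparison table for an arbitrary sequence of fitted models
fits <- list(modelA = modelA, modelB = modelB)   # extend with further models
infoList <- lapply(names(fits), function(nm)
  getGlsInfo(fits[[nm]], nm, ids = length(unique(dsM$id))))
models <- do.call(cbind, infoList)
models$Index <- rownames(models)
models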

Organizing comparisons: model manifestations

There are many ways to simplify patterns in the data.

Each has its own advantages and costs.

Model manifestations

Next time: Dynamic Reporting

  • Oct 14 – Intro to Reproducible Research
  • Oct 21 – RR Basic Skills (1): Data Manipulation
  • Oct 28 – Intro to Latent Class and Latent Transition Models
  • Nov 4 – RR Basic Skills (2): Graph Production
  • Nov 18 – RR Basic Skills (3): Statistical Modeling
  • Nov 25 – RR Basic Skills (4): Dynamic Reporting
  • Dec 2 – Migrating into R from other Statistical Software

Next time: model sequences

1. How to represent each model

2. How to construct a sequence of models


example

Question? Comments?