December 2 , 2014

Information

You can forward your questions to
Ann Greenwood (ann.greenwood at popdata.bc.ca) or Vincenza Gruppuso (vincenza at uvic.ca).
Q&A will begin immediately after the presentation.

You can follow the presentation and review previous lectures at ialsa.github.io/COAG-colloquium-2014F/

Overview of the Series

  • Oct 14 – Intro to Reproducible Research
  • Oct 21 – RR Basic Skills (1): Data Manipulation
  • Nov 4 – RR Basic Skills (2): Graph Production
  • Nov 18 – RR Basic Skills (3): Statistical Modeling
  • Dec 2 – RR Basic Skills (4): Dynamic Reporting

Load Data

# loads basic NLSY97-religiosity data as defined in COAG-Colloquium-2014F repository
dsL <- readRDS("./Data/Derived/dsL.rds")
str(dsL)
'data.frame':   107772 obs. of  13 variables:
 $ id      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ year    : int  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
 $ sex     : int  2 2 2 2 2 2 2 2 2 2 ...
 $ race    : int  4 4 4 4 4 4 4 4 4 4 ...
 $ bmonth  : int  9 9 9 9 9 9 9 9 9 9 ...
 $ byear   : int  1981 1981 1981 1981 1981 1981 1981 1981 1981 1981 ...
 $ attend  : int  1 6 2 1 1 1 1 1 1 1 ...
 $ age     : int  19 20 21 22 23 24 25 26 27 28 ...
 $ sexF    : Ord.factor w/ 3 levels "Male"<"Female"<..: 2 2 2 2 2 2 2 2 2 2 ...
 $ raceF   : Ord.factor w/ 4 levels "Black"<"Hispanic"<..: 4 4 4 4 4 4 4 4 4 4 ...
 $ bmonthF : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 9 9 9 9 9 9 9 9 9 9 ...
 $ attendF : Ord.factor w/ 8 levels "Never"<"Once or Twice"<..: 1 6 2 1 1 1 1 1 1 1 ...
 $ attendPR: int  7 7 7 7 7 7 7 7 7 7 ...

Focal outcome

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::select(id, year, attend, attendF)
dsM
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never



How often did you attend a worhsip service during the last year?

  attendLevels         attendLabels
1            8             Everyday
2            7   Several times/week
3            6      About once/week
4            5    About twice/month
5            4     About once/month
6            3 Less than once/month
7            2        Once or Twice
8            1                Never



Data Origin

"./Scripts/Data/dsL.R"

download the files to work along at GitHub

Producing Graphs

plot of chunk graph21

Fitting Statistical Models

modelA
Generalized least squares fit by REML
  Model: attend ~ 1 
  Data: dsM 
  Log-restricted-likelihood: -3727

Coefficients:
(Intercept) 
      2.477 

Degrees of freedom: 1860 total; 1859 residual
Residual standard error: 1.793 









\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Fitting Statistical Models

modelB
Generalized least squares fit by REML
  Model: attend ~ 1 + time 
  Data: dsM 
  Log-restricted-likelihood: -3719

Coefficients:
(Intercept)        time 
     2.7882     -0.0566 

Degrees of freedom: 1860 total; 1858 residual
Residual standard error: 1.782 









\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Collecting results: post-processing

print(mA)
         modelA
logLik    -3727
deviance   7453
AIC        7457
BIC        7468
df.resid   1859
N          1860
p             1
ids         155











modelA
Generalized least squares fit by REML
  Model: attend ~ 1 
  Data: dsM 
  Log-restricted-likelihood: -3727

Coefficients:
(Intercept) 
      2.477 

Degrees of freedom: 1860 total; 1859 residual
Residual standard error: 1.793 

\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Collecting results: post-processing

print(mB)
         modelB
logLik    -3719
deviance   7438
AIC        7444
BIC        7461
df.resid   1858
N          1860
p             2
ids         155











modelB
Generalized least squares fit by REML
  Model: attend ~ 1 + time 
  Data: dsM 
  Log-restricted-likelihood: -3719

Coefficients:
(Intercept)        time 
     2.7882     -0.0566 

Degrees of freedom: 1860 total; 1858 residual
Residual standard error: 1.782 

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Model manifestations


Producing Report

STRATEGIC GOAL
  • Evaluate a series of models
TACTICAL GOAL
  • Design a dynamic document for model sequencing

Dynamic Document = Software

  • use
  • replicate
  • hack

Producing Report

STRATEGIC GOAL
  • Evaluate a series of models
TACTICAL GOAL
  • Design a dynamic document for model sequencing

Goal for today

  • how to use sequence reports to compare models
  • how to run scripts to produce report
  • how to adapt code for other examples

Producing Report

Two recurring issues

What models to estimate?

What models to estimate?

Press (W): wide screen | Press (P): view by column

Press (P): view by column

Producing Report

Two recurring issues

How to express each result?

Producing Report

Producing Report

STRATEGIC GOAL
  • Evaluate a series of models
TACTICAL GOAL
  • Design a dynamic document for model sequencing

Goal for today

  • how to use sequence reports to compare models
  • how to
  • how to

Producing Report

STRATEGIC GOAL
  • Evaluate a series of models
TACTICAL GOAL
  • Design a dynamic document for model sequencing

Goal for today

  • how to
  • how to run scripts to produce report
  • how to

Scripts involved

model-SPECIFY.R - contains model specifications of models as gls/lmer syntax.

model-ESTIMATE.R - contains the loop that cycles through all available model specifications. Includes the following operations:

1. Derives the dataset to be used
2. Estimates models:
- lme4::lmer - if a model contains random effects
- nlme::gls - if a model contains only fixed effects
3. Post-process model solution and save datasets:
- dsmInfo.rds - model fit and information indices
- dsFERE.rds - fixed (FE) and random effects (RE) descriptors
- dsp.rds - predicted values from the model

model-COLLECT-SOLUTIONS.R - combines datasets containing model solutions

Scripts involved

graph-FERE.R - creates a tile graph of model solution

graph-PREDICT.R - creates a line graph of predicted trajectories

graph-FIT.R - creates a bar graph of model performance indices

graph-FIT-CUSTOM.R - creates a bar graph of fit for each group

Scripts involved

sequence.R
  • sources previous scripts
  • creates Mosaic graph for each model
sequence.Rmd
  • combines code, text, and images
  • produces an .html file

Producing Report

STRATEGIC GOAL
  • Evaluate a series of models
TACTICAL GOAL
  • Design a dynamic document for model sequencing

Goal for today

  • how to
  • how to
  • how to adapt code for other examples

Some hacking ideas

  • in sequence.R, change dplyr code so select different data

  • in model-SPECIFY.R change attendPR to a different time invariant predictor, for example sex.


  • in graph-PREDICT.R add forecast conditional on the value of the time invariant predictor

Another example of a sequence for self-study.

Final thoughts : Two concepts

Model Manifestation


Model Sequence


Final thoughts : Three ideas

report = software


model competition + model collaboration


science = art = design = technology

Question? Comments?