December 2 , 2014

Information

You can forward your questions to
Ann Greenwood (ann.greenwood at popdata.bc.ca) or Vincenza Gruppuso (vincenza at uvic.ca).
Q&A will begin immediately after the presentation.

You can follow the presentation and review previous lectures at ialsa.github.io/COAG-colloquium-2014F/

Overview of the Series

  • Oct 14 – Intro to Reproducible Research
  • Oct 21 – RR Basic Skills (1): Data Manipulation
  • Nov 4 – RR Basic Skills (2): Graph Production
  • Nov 18 – RR Basic Skills (3): Statistical Modeling
  • Dec 2 – RR Basic Skills (4): Dynamic Reporting

Previously:

"./Scripts/Data/dsL.R"

download the files to work along at GitHub

Load Data

# loads basic NLSY97-religiosity data as defined in COAG-Colloquium-2014F repository
dsL <- readRDS("./Data/Derived/dsL.rds")
str(dsL)
'data.frame':   107772 obs. of  12 variables:
 $ id     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ sex    : int  2 2 2 2 2 2 2 2 2 2 ...
 $ race   : int  4 4 4 4 4 4 4 4 4 4 ...
 $ bmonth : int  9 9 9 9 9 9 9 9 9 9 ...
 $ byear  : int  1981 1981 1981 1981 1981 1981 1981 1981 1981 1981 ...
 $ year   : int  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
 $ attend : int  1 6 2 1 1 1 1 1 1 1 ...
 $ age    : int  19 20 21 22 23 24 25 26 27 28 ...
 $ sexF   : Ord.factor w/ 3 levels "Male"<"Female"<..: 2 2 2 2 2 2 2 2 2 2 ...
 $ raceF  : Ord.factor w/ 4 levels "Black"<"Hispanic"<..: 4 4 4 4 4 4 4 4 4 4 ...
 $ bmonthF: Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 9 9 9 9 9 9 9 9 9 9 ...
 $ attendF: Ord.factor w/ 8 levels "Never"<"Once or Twice"<..: 1 6 2 1 1 1 1 1 1 1 ...

Focal outcome

dsM <- dplyr::filter(dsL, id == 1) %>% 
  dplyr::select(id, year, attend, attendF)
dsM
   id year attend         attendF
1   1 2000      1           Never
2   1 2001      6 About once/week
3   1 2002      2   Once or Twice
4   1 2003      1           Never
5   1 2004      1           Never
6   1 2005      1           Never
7   1 2006      1           Never
8   1 2007      1           Never
9   1 2008      1           Never
10  1 2009      1           Never
11  1 2010      1           Never
12  1 2011      1           Never



How often did you attend a worhsip service during the last year?

  attendLevels         attendLabels
1            8             Everyday
2            7   Several times/week
3            6      About once/week
4            5    About twice/month
5            4     About once/month
6            3 Less than once/month
7            2        Once or Twice
8            1                Never



Which model is "better"?

  • what criteria used?
  • not used?
  • preferences in fit/parsimony
  • complex judgments

Challenges:

  1. Collecting results
    • What pieces of information do we need?
  2. Organizing comparisons
    • How do we overcome information overload?











Model A: \({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Model B: \({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Collecting results: post-processing

print(mA)
         modelA
logLik    -3727
deviance   7453
AIC        7457
BIC        7468
df.resid   1859
N          1860
p             1
ids         155











modelA
Generalized least squares fit by REML
  Model: attend ~ 1 
  Data: dsM 
  Log-restricted-likelihood: -3727

Coefficients:
(Intercept) 
      2.477 

Degrees of freedom: 1860 total; 1859 residual
Residual standard error: 1.793 

\({y_{it}} = {\beta _0} + {\varepsilon _{it}}\)

plot of chunk graph20

Collecting results: post-processing

print(mB)
         modelB
logLik    -3719
deviance   7438
AIC        7444
BIC        7461
df.resid   1858
N          1860
p             2
ids         155











modelB
Generalized least squares fit by REML
  Model: attend ~ 1 + time 
  Data: dsM 
  Log-restricted-likelihood: -3719

Coefficients:
(Intercept)        time 
     2.7882     -0.0566 

Degrees of freedom: 1860 total; 1858 residual
Residual standard error: 1.782 

\({y_{it}} = {\beta _0} + {\beta _1}tim{e_t} + {\varepsilon _{it}}\)

plot of chunk graph21

Model manifestations


Model manifestations

Today

STRATEGIC GOAL
  • Evaluate a series of models
TACTICAL GOAL
  • Design a dynamic document for model sequencing
TECHNICAL ASSIGNMENT
  1. choose/specify what models should constitute the sequence
  2. estimate each model
  3. extract and organize solutions
  4. choose/create model manifestations (MM)
  5. combine MMs into a report on each model
  6. combine model reports into sequence report
  7. customize navigation through the sequence

Scripts involved

model-SPECIFY.R - contains model specifications of models as gls/lmer syntax.

model-ESTIMATE.R - contains the loop that cycles through all available model specifications. Includes the following operations:

1. Derives the dataset to be used
2. Estimates models:
- lme4::lmer - if a model contains random effects
- nlme::gls - if a model contains only fixed effects
3. Post-process model solution and save datasets:
- dsmInfo.rds - model fit and information indices
- dsFERE.rds - fixed (FE) and random effects (RE) descriptors
- dsp.rds - predicted values from the model

model-COLLECT-SOLUTIONS.R - combines datasets containing model solutions

Scripts involved

graph-FERE.R - creates a tile graph of model solution

graph-PREDICT.R - creates a line graph of predicted trajectories

graph-FIT - creates a bar graph of model performance indices

sequence.R(md)
  • produces sequence reports

Combining model manifestations

Question? Comments?