Today’s Goals

Preparing the workspace for statistical modeling and graphing may involve a number of steps and separate scripts.

Goals for today:
1. Open the updated version of the Hypertension-and-Cognition repository
2. Follow the path from RAW data files to DRAWing statistical graphs

I. Open updated version

  1. Open GitHub on your local machine
  2. Select the repository to work with. Clone if necessary.
  3. Select the branch
  4. Sync (commit if necessary)
  5. Open the repository in File Explorer (FE)
  6. In FE, open the .Rproj file to start an RStudio session. RStudio should now be open in the current state of your repository.
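To confirm that the session really started from the project file, you can check the working directory from the R console. The snippet below is a minimal sketch; `has_rproj` is a hypothetical helper written for this illustration and is not part of the repository.

```r
# Return TRUE if `path` contains at least one RStudio project (.Rproj) file.
# `has_rproj` is a hypothetical helper, not something the repository provides.
has_rproj <- function(path = getwd()) {
  length(list.files(path, pattern = "\\.Rproj$")) > 0
}

getwd()      # prints the current working directory of the R session
has_rproj()  # TRUE when the working directory holds an .Rproj file
```

If `has_rproj()` returns FALSE, the session was most likely started outside the project, and relative paths in the scripts may not resolve.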

II. Follow the RAW-to-DRAW path

Data Genealogy

A graph or a model is meaningful only to the degree that its data is meaningful. Understanding the precise origin of your data is therefore a key component in the construction of any graph. To simplify the exposition of this genealogy, we have introduced three data stages. This section walks the path from the start of the R session to the declaration of the data genealogy immediately before the graph.

Last time (link) we talked about the stages the data pass through as the data analysis project progresses. To reiterate and expand, the data stages include:

Raw files

  • optional stage
  • as-is in the secondary source
  • read-only

ds0

  • referred to as ds0
  • verbatim input from raw files
  • minimum or no processing
  • “patient zero”

dsL

  • span of the project
  • subset of ds0
  • named dsL or dsW for long (L) or wide (W) data formats, respectively
  • provides richer context
  • invites future research
  • experimentation

dsM

  • model ready
  • used for estimating models and producing graphs
  • subset of dsL
  • usually adds custom data

ds

  • nameless dataset
  • temporary and recyclable
  • used to make code exchangeable

The process can be described by the following diagram: