Data Roadmap: From Raw to Draw

Today’s Goals
I. Open updated version
II. Follow the RAW-to-DRAW path
- Data Genealogy

Today’s Goals

Preparing the workspace for statistical modeling and graphing may involve a number of steps and separate scripts.

Goals for today:
1. Open the updated version of the Hypertension-and-Cognition repository
2. Follow the path from RAW data files to DRAWing statistical graphs

I. Open updated version

Open GitHub on your local machine
Select the repository to work with. Clone if necessary.
Select the branch
Sync (commit if necessary)
Open the repository in File Explorer (FE)

(Right-click on the repository’s name in the left sidebar)

In FE, open the .Rproj file to start an RStudio session
You should now have RStudio open in the current state of your repository

II. Follow the RAW-to-DRAW path

Data Genealogy

A graph or a model is meaningful only to the degree that its data is meaningful. Understanding the precise origin of your data is therefore a key component in the construction of any graph. To simplify the exposition of this genealogy, we have introduced three data stages. This section walks the path from the start of the R session to the declaration of the data genealogy immediately before the graph.

Last time (link) we talked about the stages the data passes through as a data analysis project progresses. To reiterate and expand, the data stages include:

Raw files

optional stage
as-is in the secondary source
read-only

ds0

referred to as ds0
verbatim input from raw files
minimal or no processing
“patient zero”

dsL

span of the project
subset of ds0
named dsL or dsW for long (L) or wide (W) data formats, respectively
provides richer context
invites future research
experimentation

dsM

model ready
used for estimating models and producing graphs
subset of dsL
usually adds custom data

ds

nameless dataset
temporary and recyclable
used to make code exchangeable
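
The stages above can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual scripts: the inline CSV text stands in for a raw file, and the column names (id, wave, sbp, age, notes), the derived variable, and its threshold are all hypothetical.

```r
# Hypothetical stand-in for a raw file; in a real project this would be
# a read-only file on disk, left exactly as it came from the source.
raw_text <- "id,wave,sbp,age,notes
1,1,120,60,a
1,2,135,62,b
2,1,150,71,c
2,2,NA,73,d"

# ds0: verbatim input from the raw file, minimal or no processing
ds0 <- read.csv(text = raw_text, stringsAsFactors = FALSE)

# dsL: long-format subset of ds0 that serves the whole project
# (here: one row per person per wave, free-text notes dropped)
dsL <- ds0[, c("id", "wave", "sbp", "age")]

# dsM: model-ready subset of dsL, usually with custom variables added
dsM <- dsL[!is.na(dsL$sbp), ]
dsM$hypertensive <- dsM$sbp >= 140  # hypothetical derived variable

# ds: nameless, temporary copy; code written against `ds` can be
# reused on any stage by changing only this one assignment
ds <- dsM
mean_sbp <- mean(ds$sbp)
```

Because the last lines refer only to ds, the same summary or graphing code could be pointed at dsL instead of dsM by changing a single assignment, which is one way to read "used to make code exchangeable".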

The process can be described by the following diagram:

Load RStudio