Literate Statistical Programming
An Introduction using R and RStudio

Andres Martinez & Michael Clark
Center for Social Research

May 14, 2015

Literate Statistical Programming

The term has many aliases, including:

- Reproducible research (RR)
- Replicable science (RS)
- Reproducible (data) analysis (RDA)
- Dynamic data analysis
- Dynamic report generation
- Literate (data/statistical) analysis

Literate statistical programming (LSP) and RDA are used interchangeably here. In my view, both are different from RR/RS.

Reproducible Data Analysis

The ultimate standard for strengthening scientific evidence is replication. RDA requires:

- Data
- Code
- Clear documentation (of data and code)
- Standard means of distribution

Literate Statistical Programming

- Think of a report (e.g., journal article, blog, research paper/memo) as a single stream of human-readable text and machine-readable code
- This is not quite the same as having a commented script file in, say, R or Stata, and certainly not the same as having scripts and report files living separate lives

Literate Statistical Programming

May be conceived as a stream of code chunks and human-readable text chunks:

- Code chunks:
  - Load and prepare data
  - Compute a result
  - Create a table or plot
- Human-readable text chunks:
  - Describe the data
  - Explain the analysis
  - Present a result

Literate Statistical Programming

The programs or streams of text and code can then be:

- 'Weaved' to produce human-readable documents
- 'Tangled' to produce machine-readable documents

The basic idea is to combine:

- a machine-readable programming language
- human-readable documentation
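As a minimal sketch of weaving and tangling, the following uses base R's Sweave() and Stangle() (both in the utils package) on a tiny noweb file. The file name demo.Rnw, the chunk label, and the temporary directory are illustrative assumptions and not part of the original presentation.

```r
# A tiny literate document: LaTeX text plus one R code chunk.
# File name and chunk label are hypothetical, chosen for illustration.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of the sample is computed below.",
  "<<samplemean>>=",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",
  "\\end{document}"
)

# Work in a fresh temporary directory so no existing files are touched.
work_dir <- tempfile("lsp_")
dir.create(work_dir)
old_wd <- setwd(work_dir)
writeLines(rnw, "demo.Rnw")

# Tangle: keep only the code, producing the machine-readable demo.R.
utils::Stangle("demo.Rnw", quiet = TRUE)

# Weave: run the code and merge its output with the text, producing demo.tex.
utils::Sweave("demo.Rnw", quiet = TRUE)

tangled <- readLines("demo.R")
woven <- readLines("demo.tex")
setwd(old_wd)
```

Here `tangled` holds the extracted R code, while `woven` holds the LaTeX source with the code and its printed result embedded; turning demo.tex into a PDF would additionally require a LaTeX installation.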

Tools

There is a growing number of (awareness about?) open source tools that facilitate LSP. We will discuss:

- R: programming language and software environment for statistical computing and graphics
- RStudio: integrated development environment (IDE) for R
- Sweave: R functionality
- knitr: R functionality/package (subsumes Sweave)
- LaTeX: document preparation system and markup language
- HTML: HyperText Markup Language, the standard markup language for web pages
- Markdown: plain text formatting syntax easily convertible to HTML
- pandoc: (universal?) document converter

Tools

Sweave:

- Original system in R designed to do RDA
- Focus mainly on LaTeX (which some find difficult to learn)
- Lacks features like caching, multiple plots per chunk, support for multiple programming languages
- Development mostly stalled

Tools

knitr:

- More recent (package)
- Inspired by Sweave, builds on its functionality
- Possible to use with other programming languages
- Supports a variety of documentation languages (LaTeX, Markdown, HTML)
- Frequently updated, actively developed (young developer)

See http://yihui.name/knitr/.

Examples

A couple to get you started:

- A web page using Markdown
- A report using LaTeX

Setup in RStudio:

- May want to create a new project (say, using a new directory in NetFile)
- Set weaving to be done by knitr (Tools, Options, Sweave)
- Install knitr (if not already): install.packages("knitr")

What is Markdown?

- "A plain text formatting syntax designed so that it can optionally be converted to HTML using a tool by the same name" (Wikipedia)
- From RStudio:
  - a simple markup language designed to make authoring web content easy
  - a format that enables easy authoring of reproducible web reports from R
  - rather than writing HTML and CSS code, Markdown enables the use of a syntax much more like plain-text email
  - combines the core syntax of Markdown (an easy-to-write plain text format for web content) with embedded R code chunks that are run so their output can be included in the final document

Markdown in RStudio

- RStudio greatly facilitates the combination of R with Markdown (R is effectively used as a Markdown implementation)
- Combination achieved via the inclusion of R code chunks within an R Markdown file (.Rmd or .rmd), as opposed to a Markdown file (.md)
- The process involves 2 major steps:
  - Weaving the R Markdown file (.Rmd) into a plain Markdown file (.md), accomplished by the package knitr
  - Converting the Markdown file into an HTML document, accomplished by the package markdown

See http://www.rstudio.com/ide/docs/r_markdown.
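The two steps can be sketched by hand, outside the RStudio button that normally runs them. This is only an illustration: it assumes the knitr and markdown packages are installed (it does nothing otherwise), the file names are hypothetical, and markdown::markdownToHTML() is assumed to be available as in the releases current when these slides were written; newer releases of that package may prefer other entry points.

```r
# Assumption: knitr and markdown are installed; otherwise the sketch is skipped.
have_pkgs <- requireNamespace("knitr", quietly = TRUE) &&
  requireNamespace("markdown", quietly = TRUE)

html_file <- NULL
if (have_pkgs) {
  work_dir <- tempfile("rmd_")
  dir.create(work_dir)
  old_wd <- setwd(work_dir)

  # Three backticks open and close an R Markdown code chunk.
  fence <- strrep("`", 3)
  rmd <- c(
    "# Stopping distances",
    "",
    "A summary of the speeds in the built-in cars data:",
    "",
    paste0(fence, "{r speed-summary}"),
    "summary(cars$speed)",
    fence
  )
  writeLines(rmd, "demo.Rmd")  # hypothetical file name

  # Step 1: weave the .Rmd file into a plain .md file.
  md_file <- knitr::knit("demo.Rmd", output = "demo.md", quiet = TRUE)

  # Step 2: convert the .md file into an HTML document.
  markdown::markdownToHTML(md_file, output = "demo.html")

  html_file <- file.path(work_dir, "demo.html")
  setwd(old_wd)
}
```

Keeping the intermediate demo.md around is useful when debugging, since it shows exactly what knitr produced before any HTML conversion took place.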

LaTeX in RStudio

- Much of what we have said about Markdown applies
- Combination (of R and TeX) achieved via the inclusion of R code chunks within an R NoWeb file (.Rnw or .rnw), as opposed to a TeX file (.tex)
- The process involves 2 major steps:
  - Weaving the R NoWeb file (.Rnw) into a plain TeX file (.tex), accomplished by knitr
  - Converting the .tex file into, say, a .pdf file
- R code chunks are opened and closed differently
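To make the difference in chunk delimiters concrete, the sketch below writes the same chunk in both styles and weaves the NoWeb version into a .tex file. It assumes knitr is installed (the weaving step is skipped otherwise); the chunk label, file names, and example regression are illustrative only.

```r
fence <- strrep("`", 3)  # three backticks

# The same chunk, delimited for each document type.
rmd_chunk <- c(paste0(fence, "{r fit}"),           # R Markdown: opens with ```{r label}
               "coef(lm(dist ~ speed, data = cars))",
               fence)                              # ... and closes with ```
rnw_chunk <- c("<<fit>>=",                         # NoWeb: opens with <<label>>=
               "coef(lm(dist ~ speed, data = cars))",
               "@")                                # ... and closes with @

tex_file <- NULL
if (requireNamespace("knitr", quietly = TRUE)) {
  work_dir <- tempfile("rnw_")
  dir.create(work_dir)
  old_wd <- setwd(work_dir)

  writeLines(c("\\documentclass{article}",
               "\\begin{document}",
               "Regression of stopping distance on speed:",
               rnw_chunk,
               "\\end{document}"),
             "demo.Rnw")  # hypothetical file name

  # Step 1: weave the .Rnw file into a plain .tex file.
  knitr::knit("demo.Rnw", output = "demo.tex", quiet = TRUE)
  tex_file <- file.path(work_dir, "demo.tex")

  # Step 2 (not run here, needs a LaTeX installation):
  # tools::texi2pdf("demo.tex")

  setwd(old_wd)
}
```

Step 2 is left commented out because it depends on a working LaTeX distribution; in RStudio the Compile PDF button performs both steps.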

Presentation, examples, and some useful resources

Presentation and related materials: http://www3.nd.edu/~amarti38/RDA.zip

A nice example of what's possible: https://micl.shinyapps.io/texEx/texEx.Rmd

Useful resources:

- Help from within RStudio and from RStudio.com
- http://yihui.name/knitr/
- http://stackexchange.com/
