Data Processing with PC-SAS PubH 6325
J. Michael Oakes, PhD
Associate Professor Division of Epidemiology University of Minnesota
Lecture 2/4
Lecture 2
• DATA STEP & PROGRAMMING BASICS
• BOOLEAN LOGIC
• SAS OPERATORS
• SUBSETTING DATA
• LABELLING VARIABLES, DATA
• SIMPLE PROCEDURES
PC SAS
Figure: Data -> SAS Program (SAS reads data as per instructions) -> Output (SAS writes data and/or text as per instructions)
SAS Fundamentals SAS does exactly(!) the stuff you tell it to do in your SAS program.
It cannot read your mind!
SAS Fundamentals The devil is in the syntax!
(Sarcophilus harrisii)
SAS Fundamentals
Basic Programming Stuff:
• Statements must end with a semicolon (;)
• UPPER or lower case does not matter in keywords and variable names (character data values, however, are case sensitive)
SAS Fundamentals
• Statements may span several lines
• Variable names must be 1-32 characters long (8 or fewer is best), must begin with a letter or an underscore (_), and may contain only letters, digits, and underscores
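A small hypothetical example of these rules (the data set and variable names are made up): one INPUT statement spans three lines, and _score and _SCORE refer to the same variable because names are not case sensitive.

```sas
/* Hypothetical example: mixed case, valid names, and one  */
/* statement that spans several lines.                     */
Data Example_1;
   INPUT id
         _score
         Height2;             /* INPUT ends only at the semicolon */
   total = _SCORE + height2;  /* _score and _SCORE are the same variable */
   datalines;
1 10 60
2 20 65
;
run;
```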
SAS Programming Two categories of SAS statements/commands:
Data Step (basically data mgt) Procedure Step (basically analysis)
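A minimal sketch of the two kinds of steps, using a made-up data set named scores: the DATA step builds the data, and the PROC step (here PROC MEANS) analyzes it.

```sas
/* DATA step: creates the data set */
data scores;
  input Student $ Score;
  datalines;
Ann 90
Bob 82
;
run;

/* PROC step: analyzes the data set the DATA step created */
proc means data=scores;
  var Score;
run;
```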
Data Step Processing
Figure (flowchart): SUBMIT DATA STEP PROGRAM -> COMPILE -> CREATE (Input Buffer, PDV, Descriptor Info) -> process DATA statement -> Set missing values -> RECORD TO READ? YES: read INPUT record -> execute other STATEMENTS -> WRITES observation to SAS data -> RETURN to the DATA statement. NO: End Data Step PROGRAM.
Data Step Processing: COMPILE
A DATA step begins with the DATA statement in your program.
In this phase, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. SAS then identifies the type and length of each new variable, and determines whether a type conversion is necessary for each subsequent reference to a variable.
Data Step Processing: CREATE
In this phase, SAS creates:
• Input Buffer: a logical area in RAM into which SAS reads each record of raw data (created only when SAS reads raw data).
• Program Data Vector (PDV): a logical area in RAM where SAS builds a data set, one observation at a time. From here, SAS writes the values to a SAS data set as a single observation. Along with data set variables and newly computed variables, the PDV contains two automatic variables: _N_, which counts the iterations of the DATA step, and _ERROR_, which is 0 unless an error (such as invalid data) occurs during the iteration, in which case it is set to 1.
• Descriptor Information: information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names, and data types (character or numeric) of the variables.
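A sketch for watching the automatic variables, assuming a made-up data set with one bad value. Because _N_ and _ERROR_ live only in the PDV and are never written out, the example copies them into ordinary variables. The non-numeric value abc should set _ERROR_ to 1 for the second record and leave Score missing.

```sas
data check_errors;
  input Name $ Score;
  Iteration = _N_;       /* copy the automatic variables so they are saved */
  ErrorFlag = _ERROR_;   /* 1 only for the record with invalid numeric data */
  datalines;
Ann 90
Bob abc
Cy 75
;
run;

proc print data=check_errors;
run;
```

SAS also writes a NOTE about the invalid data, and a dump of the PDV, to the log.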
Data Step Processing
data total_points (drop=TeamName);
  input TeamName $ ParticipantName $ Event1 Event2 Event3;
  TeamTotal = (Event1 + Event2 + Event3);
  datalines;
Knights Sue 6 8 8
Cardinals Jane 9 7 8
Knights John 7 7 7
Knights Lisa 8 9 9
Knights Fran 7 6 6
Knights Walter 9 8 10
;
run;
Data Step Processing: stepping through the PDV for the first record, "Knights Sue 6 8 8"
Variables marked (drop) are held in the PDV but are not written to the output data set: TeamName because of the DROP= option, and _N_ and _ERROR_ because automatic variables are never written.
• Build PDV for named variables: TeamName (drop), ParticipantName, Event1, Event2, Event3, TeamTotal, _N_ (drop), _ERROR_ (drop)
• Set missing values (fill in PDV place-holders for variables): TeamName=(blank) ParticipantName=(blank) Event1=. Event2=. Event3=. TeamTotal=. _N_=1 _ERROR_=0
• Read INPUT record (fill PDV with "data"): TeamName=Knights ParticipantName=Sue Event1=6 Event2=8 Event3=8 TeamTotal=. _N_=1 _ERROR_=0
• Execute other statements (calculate "TeamTotal"): TeamName=Knights ParticipantName=Sue Event1=6 Event2=8 Event3=8 TeamTotal=22 _N_=1 _ERROR_=0
• Write/output to SAS data set: ParticipantName=Sue Event1=6 Event2=8 Event3=8 TeamTotal=22
• Return, set _N_ to 2, and repeat the sequence for "Cardinals Jane 9 7 8": TeamName=(blank) ParticipantName=(blank) Event1=. Event2=. Event3=. TeamTotal=. _N_=2 _ERROR_=0
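One way to watch these PDV states yourself (a sketch added for illustration, not part of the original program): a PUT statement with _ALL_ writes every PDV variable, including _N_ and _ERROR_, to the log. Only the first two records are used here.

```sas
data total_points (drop=TeamName);
  put 'Top of iteration: ' _all_;    /* PDV after values are reset to missing */
  input TeamName $ ParticipantName $ Event1 Event2 Event3;
  put 'After INPUT:      ' _all_;    /* PDV filled from the input record */
  TeamTotal = (Event1 + Event2 + Event3);
  put 'After assignment: ' _all_;    /* TeamTotal now computed */
  datalines;
Knights Sue 6 8 8
Cardinals Jane 9 7 8
;
run;
```

Expect the "Top of iteration" line a third time with _N_=3: SAS resets the PDV, INPUT finds no more records, and the step ends. Note also that TeamName is read with the default character length of 8, so Cardinals is stored as Cardinal.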
Programming (very) Basics
Programs:
• Document tasks
• Permit replication
Programming (very) Basics
Good programming practices:
• Comment with the name of the program
• Date written
• Author
• Purpose
• Use comments often
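A hypothetical header in this style, followed by the two comment forms SAS accepts: comment statements (* ... ;), which end at the first semicolon, and block comments (/* ... */), which may span lines. The file name and data are made up.

```sas
*-----------------------------------------------------------;
* Program: scores.sas (hypothetical file name)              ;
* Date written: (date)                                      ;
* Author: (initials)                                        ;
* Purpose: Read a few test scores and print them            ;
*-----------------------------------------------------------;

/* Block comments can span several lines
   and may contain semicolons; they end at the closing marker */
data scores;                 * a comment statement ends at its semicolon;
  input Student $ Score;
  datalines;
Ann 90
Bob 82
;
run;

proc print data=scores;
run;
```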
Programming Basics Example:
***This code appears in Chapter 1 of SAS Programming by Example.***
*** Example 1 ***;
DATA LISTINP;
   INPUT ID HEIGHT WEIGHT GENDER $ AGE;
DATALINES;
1 68 144 M 23
2 78 202 M 34
3 62 99 F 37
4 61 101 F 45
;
PROC PRINT DATA=LISTINP;
   TITLE 'Example 1';
RUN;
Programming Basics Better Example:
* analysis.sas                                            ;
* Program runs analyses on Ed Kaplan's Strep Data         ;
* Originally written on 12/12/01                          ;
* JMO                                                     ;
**********************************************************;
data comb;
  set kaplan.combined;
proc nlmixed data=comb;
  parms beta0=-1 s2u=1;
  eta = beta0 + u;
  expeta = exp(eta);
  p = expeta/(1+expeta);
  model endpoint ~ binary(p);
  random u ~ normal(0,s2u) subject=inv;
  estimate 'sigma2' s2u;
run;
Reading SAS Data
A DATA statement "writes" SAS data. To read in existing SAS data…
Use the SET statement…
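A minimal SET example, assuming the sashelp.class sample data set that ships with SAS: each iteration of the DATA step reads one observation of the existing data set into the PDV.

```sas
data class_copy;
  set sashelp.class;   /* one observation is read into the PDV per iteration */
run;

proc print data=class_copy;
run;
```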
Reading SAS Data
Version stuff:
Data library engines: indicate which version of SAS you want to read from and/or write to.
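A sketch of naming an engine on a LIBNAME statement. The libref proj and the folder path are hypothetical, and the engines available (for example V9, or XPORT for transport files) depend on your SAS release.

```sas
/* Hypothetical folder; replace it with a folder you can write to. */
libname proj v9 'C:\PubH6325\data';

data proj.class;       /* stored using the V9 engine */
  set sashelp.class;
run;

libname proj clear;    /* release the libref when done */
```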
Writing SAS Data
Two ways to go:
• Temporary SAS file
• Permanent SAS file
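The two kinds of SAS files differ only in how the data set is named. A sketch, assuming a hypothetical folder for the permanent library (the libref mydata is also made up):

```sas
/* Temporary: a one-level name goes to the WORK library,
   which SAS deletes at the end of the session */
data dulldata;
  set sashelp.class;
run;

/* Permanent: a two-level name (libref.name) is saved to disk.
   The folder path below is hypothetical. */
libname mydata 'C:\PubH6325\data';

data mydata.dulldata;
  set sashelp.class;
run;
```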
Writing SAS Data
Temporary SAS file: dulldata is shorthand for work.dulldata (a one-level name is stored in the temporary WORK library).
Where the heck is the WORK library directory?
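One way to answer that question: the PATHNAME function returns the physical location of a libref, and %SYSFUNC lets a %PUT statement call it.

```sas
/* Write the physical folder behind the WORK library to the log */
%put WORK library is at: %sysfunc(pathname(work));
```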
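Boolean Logic
Or: |   Not: ^ or ~   And: &
(In SAS, ! is an alternate symbol for OR, not for NOT; the words OR, NOT, and AND also work.)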
Figure: Venn diagram of sets A, B, and C. George Boole, FRS (1815-1864); John Venn, FRS (1834-1923).
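A small truth-table sketch: in SAS a Boolean expression evaluates to 1 (true) or 0 (false), so looping over 0 and 1 shows AND, OR, and NOT directly. The data set name truth is made up.

```sas
data truth;
  do a = 0 to 1;
    do b = 0 to 1;
      a_and_b = (a and b);   /* 1 only when both are 1 */
      a_or_b  = (a or b);    /* 1 when either is 1 */
      not_a   = (not a);     /* reverses a */
      output;                /* write one row per (a, b) pair */
    end;
  end;
run;

proc print data=truth;
run;
```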
SAS Operators
SAS operators are symbols that request a comparison, a logical operation, an arithmetic calculation, or a concatenation.
Symbol  Operation        Example
+       Addition         a+b
-       Subtraction      a-b
*       Multiplication   a*b
/       Division         a/b
**      Exponentiation   a**b
||      Concatenation    'a'||'b' yields 'ab'
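A sketch with made-up values showing each arithmetic operator, concatenation, and one point the table leaves out: arithmetic with a missing value returns missing, while the SUM function ignores missing arguments. This matters for a calculation such as TeamTotal = (Event1 + Event2 + Event3).

```sas
data ops;
  a = 7;
  b = 2;
  sum_ab  = a + b;      /* 9   */
  diff_ab = a - b;      /* 5   */
  prod_ab = a * b;      /* 14  */
  quot_ab = a / b;      /* 3.5 */
  pow_ab  = a ** b;     /* 49  */
  miss    = a + .;      /* missing: arithmetic with a missing value */
  sum_fn  = sum(a, .);  /* 7: the SUM function ignores missing values */
  first   = 'a';
  second  = 'b';
  both    = first || second;   /* 'ab' */
  put sum_ab= diff_ab= prod_ab= quot_ab= pow_ab= miss= sum_fn= both=;
run;
```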
SAS Operators
Symbol  Mnemonic  Definition                  Example
<       LT        less than                   a<b    or  a LT b
>       GT        greater than                a>b    or  a GT b
<=      LE        less than or equal to       a<=b   or  a LE b
>=      GE        greater than or equal to    a>=b   or  a GE b
=       EQ        equal to                    a=b    or  a EQ b
~=      NE        not equal to                a~=b   or  a NE b
><      MIN       minimum of                  z=(a MIN b)
<>      MAX       maximum of                  z=(a MAX b)
&       AND       Boolean "and"               a & b  or  a and b
|       OR        Boolean "or"                a | b  or  a or b
A comparison returns 1 (true) or 0 (false), so z=(a>b) sets z to 1 when a is greater than b and to 0 otherwise.
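A sketch with made-up values: comparisons return 1 or 0, and the MIN (><) and MAX (<>) operators return the smaller or larger operand.

```sas
data compare;
  a = 3;
  b = 5;
  is_gt  = (a > b);            /* 0: false */
  is_le  = (a LE b);           /* 1: true  */
  is_ne  = (a ~= b);           /* 1 */
  mn     = (a >< b);           /* 3: MIN operator */
  mx     = (a MAX b);          /* 5: MAX operator */
  both   = (a > 0) & (b > 0);  /* 1 */
  either = (a > 4) | (b > 4);  /* 1 */
  put is_gt= is_le= is_ne= mn= mx= both= either=;
run;
```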
Manipulating SAS Data Sets
Subsetting: if, where, obs=, keep, drop
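A sketch of each subsetting tool on sashelp.class (the output data set names are made up). One difference worth knowing: WHERE selects observations before they enter the PDV, so it can only test variables that already exist in the input data set, whereas a subsetting IF can test variables created in the same step.

```sas
/* Subsetting IF: keep only observations where the condition is true */
data teens;
  set sashelp.class;
  if Age >= 13;
run;

/* WHERE: filter observations as they are read */
data girls;
  set sashelp.class;
  where Sex = 'F';
run;

/* OBS= and KEEP= data set options: first 5 rows, 3 columns */
data first5;
  set sashelp.class (obs=5 keep=Name Age Height);
run;

/* DROP= data set option: all columns except Weight */
data noweight;
  set sashelp.class (drop=Weight);
run;
```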
Manipulating a SAS Data Set
• Generate a new variable
• Formats
• Rename variables
• Labels
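A sketch of these tasks on sashelp.class, where Height is in inches and Weight in pounds (hence the 703 factor in the BMI formula). The new data set name and the label text are made up.

```sas
data class2 (label='SASHELP.CLASS with BMI added');   /* data set label */
  set sashelp.class (rename=(Name=StudentName));      /* rename a variable */
  BMI = 703 * Weight / (Height**2);   /* new variable: pounds and inches */
  format BMI 5.1;                     /* display with one decimal place */
  label BMI         = 'Body mass index (kg/m**2)'
        StudentName = 'Student name';
run;
```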
SAS Procedures
Procedure Steps
• A PROC statement runs a SAS procedure.
• Most procs use data created in the data step.
• The syntax for most procs is about the same.
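A sketch of that shared shape (PROC name DATA=...; statements; RUN;) using three common procedures on sashelp.class. The output data set name class_sorted is made up.

```sas
proc freq data=sashelp.class;
  tables Sex;                 /* counts of F and M */
run;

proc sort data=sashelp.class out=class_sorted;
  by Sex;
run;

proc print data=class_sorted;
  by Sex;                     /* BY requires data sorted by Sex */
  var Name Age;
run;
```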
SAS Procedures Proc Contents The CONTENTS procedure prints the contents of a SAS data set (to output file).
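A minimal PROC CONTENTS call on the sashelp.class sample data set; what it prints is the descriptor information described for the data step. By default variables are listed alphabetically; the VARNUM option lists them in their order in the data set.

```sas
proc contents data=sashelp.class;
run;

proc contents data=sashelp.class varnum;
run;
```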
SAS Procedures
Proc Print
The PRINT procedure prints the observations (i.e., data) in a SAS data set, using all or some of the variables as you select (to the Output file).
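A sketch of common PROC PRINT options on sashelp.class: the OBS= data set option limits rows, NOOBS drops the Obs column, LABEL uses variable labels (where defined) as column headings, and the VAR statement picks columns.

```sas
proc print data=sashelp.class (obs=5) noobs label;
  var Name Sex Age;
  title 'First five students';
run;
title;   /* clear the title for later output */
```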
Example programs See ‘day 2 programs.sas’
Subsetting, Manipulating, Generating, Labeling
Lab 2 (First Hour) Directed Learning
• SAS program writing and saving
• Reading and writing SAS data
• Basic subsetting examples
• Labeling variables, data sets
• Basic procs (contents, print)
(Second Hour) Lab Assignment
• Write a professional-quality SAS program to read and write simple data, and subset and manipulate it as per directions.