Data  Processing  with  PC-­‐SAS   PubH  6325  

J.  Michael  Oakes,  PhD  

 

Associate  Professor   Division  of  Epidemiology   University  of  Minnesota   [email protected]  

  Lecture  3/4  

Remember!   Unless  you  want  to     DON’T  WRITE  OVER       your  “master”  dataset!      

data  temp2;    set  temp1;    rename  x  =  y;   run;  

Lecture  3    

DATA  SET  MANIPULATION     FUNCTIONS     FORMATS     DATES  

PC  SAS    

Data  

SAS  Program  

SAS  Reads  Data   as  per  instruc1ons  

Output   SAS  Writes  Data     and/or  Text     as  per  instruc1ons  

 

Manipula\ng  Data  Sets    

Append

A  

append  

B  

A   B  

Both  A  and  B  should  have  (approx.)  same  variables.  

Append     SAS  Append    (some\mes  called  concatenate  or  adding  observa\ons)    

     

 Use  the  SET  statement  

data  temp_both;    set  temp1  temp2;   run;  

SAS  Append     Complica\ons:        Datasets  do  not  have  same  variables!  

Dataset  1     Dataset  2  

 X      Y      Z      X        Y                  P    

Missing  Values  are  inserted    

Append     SAS  Append    (some1mes  called  concatenate  or  adding  observa1ons)    

     

 Use  the  SET  statement  

data  temp_both;    set  temp1  temp2;   run;  

Manipula\ng  Data  Sets     Merge  

A  

merge  

B  

A   B   By  some  linking     Variable  (e.g.,  ID)  

SAS  Merge     SAS  Merge  

(some\mes  called  adding  variables)  

data  temp_both;    merge  demog  employee;   run;  

Unless  you’re  perfectly,  totally,  absolutely,  undoubteldy,  quite,   uferly,  and  wholly  100%  CERTAIN  of  what  you’re  doing,  NEVER   EVER  SHEVER  KEVER  BLEVER  merge  without  a  merge  “key”  -­‐-­‐   some\mes  called  a  one-­‐to-­‐one  merge.    

SAS  Merge     Use  a  merge  key  to  link/match/connect   observa\ons  (e.g.,  subjects)  from  one  data  set  to   another.  

Merge  keys  (eg,  ID)  should  be  unique  and  in  string  format;  they  may  be  several   variables  long  (eg,  state,  county,  tract).  

SAS  Merge     Use  by  statement  with  merge  key  to  link/match/ connect  observa\ons  (e.g.,  subjects)  from  one   data  set  to  another.  

data  temp_both;    merge  demog  employee;    by  ID;   run;  

SAS  Merge     You  must  sort  each  dataset  by  merge  key  before   afemping  to  merge.  Think  about  why  this  is  wrt  the  pdv!   proc  sort  data=demog;    by  ID;   proc  sort  data=employee;    by  ID;   Data  combined;    merge  demog  employee    by  ID;   run;  

SAS  Merge     As  with  Append,  missing  values  fill-­‐in  the  PDV   and  hence  the  output  file  if  merge-­‐keys  in  each   dataset  don’t  perfectly  conform.     Advanced  topics:     •   Mul\ple  variable  merge  keys   •   Iden\fying  “matches”  with  (IN=    )   •   Renaming  merge  keys   •   Fuzzy  merges  with  soundex  func\on   •   Proc  Sort  op\ons            

Manipula\ng  Data  Sets     Collapse  (for  hierarchical  data)    

A   Collapsed  and  sta\s\cs     (e.g.,  sums,  means)  are     generated.  

A’  

Collapse     ID CLINIC AGE SPB

Sta1s1cs  

SEX

1

1

46

116

M

2

1

53

150

F

3

1

38

112

F

4

1

63

.

M

5

2

18

112

M

6

2

24

121

M

7

2

19

145

M

CLINIC

n

age

sbp

male

1

4

50

126

50

2

3

20.3

126

100

Collapsed  to  clinic  level  

Collapse     SAS  Collapse?  Use  Proc  Means!     PROC  MEANS  DATA=clnc_dat;      CLASS  clinic;      VAR  age;      OUTPUT  OUT=fam2  MEAN(age)=age;   RUN;  

Proc  Means   Procedure   Name/Iden\fy   statement   Original  dataset  

PROC  MEANS  DATA=clnc_dat;      CLASS  clinic;  

Variable(s)  to  collapse  on  

   VAR  age;  

Variable(s)  to  get  generate  summary  stat(s)  for  

   OUTPUT  OUT=fam2  MEAN(age)=  age;    

Name  of     output  sta\s\c   in  output   dataset   Output     sta\s\c   And  its   target  variable  

Name/loca\on   of     Output  Dataset   Request   for   output   (collapsed)   dataset  

Collapse     PROC  MEANS  DATA=clnc_dat;      CLASS  clinic;      VAR  age;      OUTPUT  OUT=fam2  MEAN(age)=mean_age    MEDIAN(age)=med_age;   RUN;  

Keywords  for  Common  Output  Sta\s\cs        N  (Number  of  non-­‐missing  values)    NMISS  (Number  of  missing  values)    MEAN  (Mean/average)    MIN  (Minimum  value)    MAX  (Maximum  value)    SUM  (Sum  of  values)    STDDEV|STD  (Standard  devia\on  of  values)    VAR  (Variance  of  values)    STDERR  (Standard  error  of  Mean)    CLM  (Two-­‐sided  confidence  limit  for  the  mean)    LCLM  (Lower  confidence  limit  for  the  mean)    UCLM  (Upper  confidence  limit  for  the  mean)    MEDIAN|P50  (Median)    Q1|P25  (First  quar\le)  Q3|P75  (Third  quar\le)      QRANGE  (Interquar\le  range)  

SAS  Func\on  Junc\on     Useful  func\ons:  y  =  f  (x)  

Numeric     String  

SAS  Func\on  Junc\on     A  Few  of  my  favorite  func\ons:        Log  (  )      Upper(  )      Exp(  )      Lower(  )      Round(  )    Substr(  )      Ceil(  )      Length(  )      Mean(  )    Year(  )      ranuni(  )  

Put(  )  

Random  subseqng/sampling   Many  possibli\es,  but  you  can:     •   Generate  random  number  for  each  obs;  sort  on  the  random   number;  keep  if  _n  <  N,  where  N  is  desired  sample  size.       •   Proc  surveyselect  

SAS  Formats   A  format  is  an  instruc\on  that  SAS  uses  to  write  data  values.  You  use  formats  to   control  the  wrifen  appearance  of  data  values,  or,  in  some  cases,  to  group  data   values  together  for  analysis.       For  example,  the  WORDS22.  format,  which  converts  numeric  values  to  their   equivalent  in  words,  writes  the  numeric  value  692  as  six  hundred  ninety-­‐two.  

SAS  Formats   Syntax:      <$>format.      

     where  “$”  indicates  a  character  format;  its  absence    indicates  a   numeric  format;      “w”  specifies  the  format  width,  which  for  most    formats  is  the  number   of  columns  in  the  output  data;    and  “d”  specifies  an  op\onal  decimal  scaling  factor  (ie,    many   decimal  places)  in  the  numeric  formats.       Formats  always  contain  a  period  (.)  as  a  part  of  the  name  and  never  change  or   truncate  the  internal  stored  data  values.    

SAS  Formats  Examples   Syntax:      <$>format.      

        X  =  1257000    Format  x  best12.    

 

 1257000  

 Format  x  comma12.0    

 1,257,000  

 format  x  best6.    

 

 1.26E6  

 format  x  best3.    

 

 1E6  

 format  x  10.2  

 

 

 1257000.00  

 format  x  10.3  

 

 

 1257000.000  

SAS  Formats  Examples   Syntax:      <$>format.      

       

y  =  “Grateful  Dead”     Format  y  $Char13.

 Grateful  Dead  

Format  y  $13.

 Grateful  Dead  

 

Format  y  $Char3.  

 Gra  

Format  y  $3.  

 Gra  

 

SAS  Formats   Common  Formats  

BESTw.     COMMAw.d   w.d       $CHARw.   $w.       PERCENTw.d   PVALUEw.d    

 (default)    (numeric  comma  separated)    (basic  numeric)    (string  of  length  w)    (same  as  $CHARw.)    (percent,  %)    (p-­‐value  looking)  

SAS  Formats   Make  your  own  formats  and  use   them  to  categorize  con\nuous   variables.  

SAS  Dates     Dates  are  stored  as  number  of  days          since  January  1,  1960.        Dates  since  then  are  posi\ve  (+)        Dates  prior  to  then  are  nega\ve  (-­‐)  

SAS  Dates     7/4/1776  

-­‐67019  

1/1/1959  

-­‐365  

1/1/1960  

0  

1/1/1961  

366  

1/17/2002  

15357  

SAS  Dates     The  “trick”  is  to  covert  a  given  raw  string   date  into  a  SAS  data  and  then  format  the   SAS  date  for  your  easy  reading/ understanding.       •     Do  calcula\ons  and  such  with  SAS  dates.   •     Use  formafed  SAS  dates  for  condi\onal    statements.    

Example  Program    See  ‘day  3  programs.sas’  

Lab  3   (First  Hour)  Directed  Learning  

•   Appending,  Merging,  Collapsing   •   Func\ons   •   Formats   •   Dates   •   Procs     (Second  Hour)  Lab  Assignment  

 

•   Write  and  execute  program  for     appending,  merging,  collapsing,  and     for  using  func\ons  and  formats;  build  freq   tables.  

Lecture 3 of 4.pdf

Page 1 of 34. Data Processing with PC-SAS. PubH 6325. J. Michael Oakes, PhD. Associate Professor. Division of Epidemiology. University of Minnesota.

133KB Sizes 3 Downloads 274 Views

Recommend Documents

Lecture 3
Oct 11, 2016 - request to the time the data is available at the ... If you want to fight big fires, you want high ... On the above architecture, consider the problem.

Lecture 3.pdf
Page 1 of 36. Memory. It is generally agreed that there are three types of. memory or memory function: sensory buffers, short-term. memory or working memory, ...

Lecture 3 Mobile Network Generations.pdf
Sign in. Loading… Whoops! There was a problem loading more pages. Retrying... Whoops! There was a problem previewing this document. Retrying.

Macro 3: Lecture 3 - Consumption & Savings
consumers make optimal choices = maximize intertemporal utility given an intertemporal budget constraint. Burak Uras. Macro 3: Consumption & Savings ...

EE 396: Lecture 3 - UCLA Vision Lab
Feb 15, 2011 - (which we will see again in more detail when we study image registration, see [2]). • The irradiance R, that is, the light incident on the surface is ...

Week 3 Lecture Material.pdf
Page 2 of 33. 2. ASIMAVA ROY CHOUDHURY. MECHANICAL ENGINEERING. IIT KHARAGPUR. A cutting tool is susceptible to breakage, dulling and wear. TOOL WEAR AND TOOL LIFE. Rake. surface. Pr. flank. Aux. flank. Page 2 of 33. Page 3 of 33. 3. ASIMAVA ROY CHOU

EnvEcon13 - Lecture 3 - (Non)Renewable Resources.pdf ...
EnvEcon13 - Lecture 3 - (Non)Renewable Resources.pdf. EnvEcon13 - Lecture 3 - (Non)Renewable Resources.pdf. Open. Extract. Open with. Sign In.

EE 396: Lecture 3 - UCLA Vision Lab
Feb 15, 2011 - The irradiance R, that is, the light incident on the surface is directly recorded ... partials of u1 and u2 exist and are continuous by definition, and ...

Week 3 Lecture Material.pdf
Page 2 of 104. 2. Fuzzy Logic Controller. • Applications of Fuzzy logic. • Fuzzy logic controller. • Modules of Fuzzy logic controller. • Approaches to Fuzzy logic controller design. • Mamdani approach. • Takagi and Sugeno's approach. Deb

pdf-1490\record-of-agard-lecture-series-lecture ...
... the apps below to open or edit this item. pdf-1490\record-of-agard-lecture-series-lecture-series-i ... unne-j-c-north-atlantic-treaty-organization-vannucci.pdf.

Koons, Lecture #3, Plato, Plotinus, Pseudo-Dionysius, John Scotus ...
A large part of the book is devoted to physiology and psychology. ... according to a pre-existing model, the world of "intelligible beings" which is the whole realm of ... fire and earth must be joined by two intermediate elements, air and water. ...

Treatment Effects, Lecture 3: Heterogeneity, selection ...
articulate expression, are even less sanguine: I find it hard to make any sense of the LATE. ... risk neutrality, decision-making on the basis of gross benefits alone, etc— the basic setup has applications to many .... generalized Roy model—and t

PHY454H1S Continuum Mechanics. Lecture 3 ... - Peeter Joot's Blog
Jan 18, 2012 - (dxi ds. )2. (4) so that ds. 2 − ds2 ds2. = 2eiidx2 i. (5). Observe that here again we see this factor of two. If we have a diagonalized strain tensor, the tensor is of the form ... We will consider a volume big enough that we won't

PHY454H1S Continuum Mechanics. Lecture 3 ... - Peeter Joot's Blog
http://sites.google.com/site/peeterjoot2/math2012/continuumL3.pdf. Peeter Joot — [email protected]. Jan 18, 2012 continuumL3.tex. Contents. 1 Review.

Lecture to Oxford Farming Conference, 3 January 2013 [PORTUGUÊS ...
Whoops! There was a problem previewing this document. Retrying... Download. Connect more apps... Lecture to ... TUGUÊS].pdf. Lecture to ... TUGUÊS].pdf.

Ilan-Gur-Zeev-Memorial-lecture-heb-poster2015-3-print.pdf
Ilan-Gur-Zeev-Memorial-lecture-heb-poster2015-3-print.pdf. Ilan-Gur-Zeev-Memorial-lecture-heb-poster2015-3-print.pdf. Open. Extract. Open with. Sign In.