The where expression allows for declaring filters on SAS System datasets. This presentation illustrates some uses in the data step and SAS Macro Language.
1. Introduction to WHERE Expressions
Mark Tabladillo, Ph.D.
Software Developer, MarkTab Consulting
Associate Faculty, University of Phoenix
January 30, 2007
2. Introduction
• WHERE expressions allow for processing subsets of observations
• WHERE expressions can be used in the DATA step or with PROC (procedure) statements
• This presentation will contain a series of features and examples of the WHERE expression
• We end with some intensive macros
3. WHERE-expression Processing
• Enables us to conditionally select a subset of observations, so that SAS processes only the observations that meet a set of specified conditions.
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000999253.htm
4. Work Sales Dataset
data work.sales (drop=i randomState);
    length state $2 sales 8 randomState 3;
    do i = 1 to 2500;
        randomState = round(rand('gaussian',3,1)+0.5);
        if randomState in (1,2,3,4,5) then do;
            select(randomState);
                when(1) state='TN';
                when(2) state='AL';
                when(3) state='GA';
                when(4) state='FL';
                when(5) state='MS';
            end;
            sales = int(rand('gaussian',1000000,500000));
            output work.sales;
        end;
    end;
run;
5. Data Set Option or Statement
data work.highSales;
    set work.sales (where=(sales>1500000));
run;

data work.highSales;
    set work.sales;
    where sales>1500000;
run;

proc means data=work.sales;
    where sales>1500000;
run;
6. Data Set Option or Statement
data work.lowSales;
    set work.sales (where=(sales<0));
run;

data work.lowSales;
    set work.sales;
    where sales<0;
run;

proc means data=work.sales (where=(sales<0));
run;
7. Multiple Comparisons
data work.highFloridaSales;
    set work.sales (where=(sales>1500000 and state = 'FL'));
run;

data work.highFloridaSales;
    set work.sales;
    where sales>1500000 and state = 'FL';
run;

proc freq data=work.sales;
    tables state;
    where sales>1500000 and state = 'FL';
run;
8. SAS Functions
data work.highFloridaSales;
    set work.sales (where=(sales>1500000 and substr(state,1,1) = 'F'));
run;

data work.highFloridaSales;
    set work.sales;
    where sales>1500000 and substr(state,1,1) = 'F';
run;

proc means data=work.sales;
    where sales>1500000 and substr(state,1,1) = 'F';
run;
9. Comparison Operators
Priority     Order of Evaluation   Symbols     Mnemonic Equivalent
Group I      right to left         **
                                   +
                                   -
                                   ^ ¬ ~       NOT
                                   ><          MIN
                                   <>          MAX
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000780367.htm
10. Comparison Operators
Priority     Order of Evaluation   Symbols     Mnemonic Equivalent
Group II     left to right         *
                                   /
Group III    left to right         +
                                   -
Group IV     left to right         || ¦¦ !!
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000780367.htm
11. Comparison Operators
Priority     Order of Evaluation   Symbols     Mnemonic Equivalent
Group V      left to right         <           LT
                                   <=          LE
                                   =           EQ
                                   ¬=          NE
                                   >=          GE
                                   >           GT
                                               IN
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000780367.htm
12. Comparison Operators
Priority     Order of Evaluation   Symbols     Mnemonic Equivalent
Group VI     left to right         &           AND
Group VII    left to right         | ¦ !       OR
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000780367.htm
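A minimal sketch of how the priority groups play out in a WHERE expression, reusing work.sales from the earlier slides. The output data set names (work.precedenceImplicit, work.precedenceExplicit) are made up for illustration. Because AND (Group VI) is evaluated before OR (Group VII), the two steps below should select the same observations.

```sas
/* No parentheses: AND binds more tightly than OR, so this reads as
   sales<0 OR (sales>1500000 AND state='FL') */
data work.precedenceImplicit;
    set work.sales;
    where sales<0 | sales>1500000 & state = 'FL';
run;

/* Same condition with the grouping written out explicitly */
data work.precedenceExplicit;
    set work.sales;
    where sales<0 | (sales>1500000 & state = 'FL');
run;
```

When a different grouping is intended, parentheses are needed, as in the examples on the next slide.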
13. Comparison Operators
data work.extremeNonGeorgia;
    set work.sales
        (where=((sales<0 | sales>1500000) and state in ('TN','AL','FL','MS')));
run;

data work.extremeNonGeorgia;
    set work.sales;
    where (sales<0 | sales>1500000) and state in ('TN','AL','FL','MS');
run;

data work.extremeNonGeorgia;
    set work.sales;
    where ^ (0 <= sales <= 1500000) & state ne 'GA';
run;
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000999255.htm
14. “Between And”
data work.boundedNonGeorgia;
    set work.sales (where=((sales between 1000000 and 1500000) &
        state in ('TN','AL','FL','MS')));
run;

data work.boundedNonGeorgia;
    set work.sales;
    where (sales between 1000000 and 1500000) &
        state in ('TN','AL','FL','MS');
run;
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000999255.htm
15. Contains ?
data work.LStates;
    set work.sales (where=(state contains 'L'));
run;

data work.LStates;
    set work.sales;
    where state contains 'L';
run;

data work.LStates;
    set work.sales;
    where state ? 'L';
run;
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000999255.htm
16. Is Null/Is Missing
data work.nullStates;
    set work.sales (where=(state is null));
run;

data work.missingStates;
    set work.sales (where=(state is missing));
run;

data work.nullSales;
    set work.sales;
    where sales is missing;
run;

data work.nonNullSales;
    set work.sales;
    where sales is not missing;
run;
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000999255.htm
17. Like
data work.likeL;
    set work.sales (where=(state like '%L'));
run;

data work.likeL;
    set work.sales (where=(state like "%L"));
run;

data work.likeL;
    set work.sales (where=(state like "%%L"));
run;

data work.notLikeG;
    set work.sales;
    where state not like 'G_';
run;
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000999255.htm
18. Sounds Like (Soundex)
data work.soundsLikeFill;
    set work.sales (where=(state =* 'fill'));
run;

data work.notSoundsLikeTin;
    set work.sales;
    where state not =* 'tin';
run;
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000999255.htm
19. “Same And”
data work.boundedNonGeorgia;
    set work.sales (where=((sales between 1000000 and 1500000) &
        state in ('TN','AL','FL','MS')));
run;

data work.boundedNonGeorgia;
    set work.sales;
    where (sales between 1000000 and 1500000);
    where same and state in ('TN','AL','FL','MS');
run;

data work.boundedNonGeorgia;
    set work.sales;
    where same and (sales between 1000000 and 1500000);
    where same and state in ('TN','AL','FL','MS');
run;
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000999255.htm
20. WHERE vs. Subsetting IF
Task: Method
• Make the selection in a procedure without using a preceding DATA step: WHERE expression
• Take advantage of the efficiency available with an indexed data set: WHERE expression
• Use one of a group of special operators, such as BETWEEN-AND, CONTAINS, IS MISSING or IS NULL, LIKE, SAME-AND, and Sounds-Like: WHERE expression
• Base the selection on anything other than a variable value that already exists in a SAS data set. For example, you can select a value that is read from raw data, or a value that is calculated or assigned during the course of the DATA step: subsetting IF
• Make the selection at some point during a DATA step rather than at the beginning: subsetting IF
• Execute the selection conditionally: subsetting IF
http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a001000521.htm
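A minimal sketch contrasting the two methods in one DATA step, reusing work.sales from the earlier slides. The variable commission, the 5% rate, the 75000 threshold, and the data set name work.bigCommission are assumptions made up for illustration. The WHERE statement filters on a variable that already exists in the input data set, while the subsetting IF filters on a value calculated during the step.

```sas
data work.bigCommission;
    set work.sales;
    /* WHERE: state already exists in work.sales, so SAS can filter
       before the observation is brought into the DATA step */
    where state = 'FL';
    /* commission is a hypothetical variable computed in this step
       (5% rate assumed); it does not exist in work.sales */
    commission = sales * 0.05;
    /* Subsetting IF: keeps only observations whose newly computed
       value passes the test */
    if commission > 75000;
run;
```

Using commission in the WHERE statement instead would be expected to fail, since WHERE can only reference variables that exist in the input data set.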
21. Intensive Dataset Generation
%macro OurCentury();
    %local year interest;
    %do year = 2001 %to 2100;
        %let interest = %sysfunc(compound(1,.,0.05,%eval(&year.-2001)));
        data work.sales&year. (drop=i randomState index=(state sales));
            length state $2 stateName $20 sales 8 randomState 3;
            do i = 1 to 2500;
                randomState = round(56*rand('uniform')+0.5);
                if randomState <= 56 and randomState not in (3,7,14,43,52) then do;
                    state = fipstate(randomState);
                    stateName = fipnameL(randomState);
                    sales = int(rand('gaussian',1000000*&interest.,500000*&interest.));
                    output work.sales&year.;
                end;
            end;
        run;
    %end;
%mend OurCentury;
%OurCentury;
22. Year/State Datasets
%macro SalesByYearState();
    %local year stateCode state;
    %do year = 2001 %to 2100;
        %do stateCode = 1 %to 56;
            %if &stateCode. ne 3 & &stateCode. ne 7 & &stateCode. ne 14 &
                &stateCode. ne 43 & &stateCode. ne 52 %then %do;
                %let state = %sysfunc(fipstate(&stateCode.));
                data work.sales&year.&state.;
                    set work.sales&year.;
                    where state = "&state.";
                run;
            %end;
        %end;
    %end;
%mend SalesByYearState;
%SalesByYearState;
23. Year/State High Sales Datasets
%macro HighSalesByYearState();
    %local year stateCode state interest keepDataset;
    %do year = 2001 %to 2100;
        %let interest = %sysfunc(compound(1,.,0.05,%eval(&year.-2001)));
        %do stateCode = 1 %to 56;
            %if &stateCode. ne 3 & &stateCode. ne 7 & &stateCode. ne 14 & &stateCode. ne 43 &
                &stateCode. ne 52 %then %do;
                %let state = %sysfunc(fipstate(&stateCode.));
                %let keepDataset = 0;
                data work.sales&year.&state.high;
                    set work.sales&year.;
                    where state = "&state." and sales > 2000000*&interest.;
                    call symput('keepDataset',left('1'));
                run;
                %if not(&keepDataset.) %then %do;
                    proc datasets lib=work nolist;
                        delete sales&year.&state.high;
                    run; quit;
                %end;
            %end;
        %end;
    %end;
%mend HighSalesByYearState;
%HighSalesByYearState;
24. Conclusion
• The WHERE expression allows for efficient observation processing in the DATA step and the PROC statements
• The SAS System Documentation provides specific details on the syntax
• Using macros increases the processing power of WHERE expressions
25. Contact Information
• Mark Tabladillo
MarkTab Consulting
http://www.marktab.com/