SlideShare una empresa de Scribd logo
1 de 6
Descargar para leer sin conexión
The Dirty Dozen: Twelve Common Programming Mistakes

                                                  Bob Virgile
                                         Robert Virgile Associates, Inc.



Overview                                                    INPUT D_BIRTH 21-28 MMDDYY8.;

As programmers begin using the SAS® software,               This type of error, while common, is not fatal. The
their programs often contain many of the same               software generates an error message, and the error
types of mistakes. A seemingly innocuous error,             message is somewhat clear. One hybrid exists, to
such as omitting a semicolon, may generate results          indicate the number of implied positions after a
ranging from an obvious error message to an                 decimal point. These two statements are both
impossible to decipher error message to destroying          legal, and are equivalent to one another:
an existing dataset. This paper covers typical
errors made at an introductory level, including those       INPUT AMOUNT 11-18 .2;
which produce no error message but still generate           INPUT @11 AMOUNT 8.2;
the wrong result.
                                                            Let's move on to situations where the error
                                                            message is difficult to decipher.
#1: Step-by-Step Execution
                                                            #3: Missing Semicolons
SAS programs execute one DATA or PROC step at
a time. A program cannot mix and match the
                                                            In this program, the software complains that the
steps. Here is a typical attempt:
                                                            ELSE statement has no matching IF / THEN
                                                            statement. What could be the problem?
DATA PURCHASE;
SET SALES;
TOTAL = PRICE * QUANTITY;                                   DATA NEW;
                                                            INFILE RAWDATA;
PROC MEANS DATA=PURCHASE;                                   INPUT NAME $16. /* First + Last */
VAR TOTAL;                                                        RACE 17   /* 1-digit code */
                                                                  AGE 18-20 /* Some oldies! */
WITHTAX = TOTAL * (1 + TAXRATE);
                                                            IF AGE >= 65 THEN STATUS='SENIOR';
PROC MEANS DATA=PURCHASE;                                   ELSE              STATUS='JUNIOR';
VAR WITHTAX;
                                                            You could stare at this one for a long time if you
The WITHTAX= assignment statement generates                 didn't know that the topic is "missing semicolons."
an error message. The syntax would have been                When the INPUT statement is missing a semicolon,
correct if the statement had appeared in a DATA             the software "thinks" that the IF / THEN statement
step. However, the statement is part of PROC                is part of the INPUT statement. The words within
MEANS. So the software generates an error                   the IF / THEN statement are taken to be additional
message saying that the assignment statement is             variables which the INPUT statement should read.
invalid or used out of proper order. The message            The INPUT statement ends with the semicolon
is only partially clear, since the statement is valid.      before the ELSE statement. With no IF / THEN
The problem is that it is used out of proper order.         statement, the ELSE statement is illegal.

                                                            The lesson here: when a statement appears to be
#2: Informat with Columns                                   perfectly valid but still generates an error message,
                                                            check for a missing semicolon on the PREVIOUS
In an INPUT statement, informats must immediately           statement.
follow a variable name. A legal example:

INPUT     @21 D_BIRTH MMDDYY8.;

An illegal version:
Consider a DATA step which creates three datasets     *****************************
at the same time:                                     **                         **
                                                      ** Program:                **
                                                      **                         **
DATA MALES FEMALES BADDATA;                           ** Date:                   **
SET IN.EVERYONE;                                      **                         **
IF GENDER='M' THEN OUTPUT MALES;                      ** Purpose:                **
ELSE IF GENDER='F' THEN OUTPUT FEMALES;               **                         **
ELSE OUTPUT BADDATA;                                  ** Programmer:             **
                                                      **                         **
A perfectly legitimate program. But what about this   *****************************;
one:
                                                      Despite the number of lines, this is still just a single
DATA MILK                                             SAS comment statement beginning with an asterisk
SET ALL.COWS;                                         and ending with a semicolon.
CUPS = 4 * QUARTS;
                                                      My personal preference is to use the first title line to
Because of the missing semicolon on the DATA          indicate the name of the program which created the
statement, this program is asking that three          output. Many times, the output gets separated from
datasets be created in one DATA step: MILK, SET,      the program. It's nice to be able to look at some
and ALL.COWS. The program contains no SET             output months down the road and to know
statement, and does not read from any source of       immediately which program created that output. As
data. Technically, however, these are legal           long as the client finds the output acceptable in this
statements. So all three datasets on the DATA         form, it makes good sense to reserve the first title
statement get created with one observation and        line for the program name.
two variables.
                                                      Beginning errors of comission fall into a few
The dataset ALL.COWS just got destroyed! This is      categories. First of all, it is perfectly okay to leave
a truly bad result! In addition, there would be no    a blank line in a SAS program. It's even a good
error or warning messages. (The LOG would             idea to use spacing (as well as indentation) to
contain a note about the number of observations in    make your programs easy to read. However, there
each of the datasets.) In my opinion, the names       is no need to comment out that blank line.
SET, MERGE, and UPDATE should be illegal as
dataset names to guard against this possibility.      Secondly, when using a comment statement, make
Version 7 of the SAS software will provide such an    sure it ends with a semicolon. While this might be
option. A global option will allow the words SET,     overkill, it is acceptable:
MERGE, UPDATE, and RETAIN (or other DATA
step key words as well) to trigger an error           * PROGRAM:       PROJECT/LIB/PROG1.SAS;
message.                                              *                                     ;
                                                      * PURPOSE:       ILLUSTRATE COMMENTS ;

#4: Comments                                          DATA SENIORS;
                                                      INFILE RAWDATA;
                                                      INPUT NAME $ 1-20;
Commenting errors fall into two categories: using
comments improperly and failing to use comments       Of course, it would work equally well if the final two
where they should be used.                            asterisks and the first two semicolons were
                                                      removed. Omitting a final semicolon is a common
As a general rule, programs should contain a          error:
comment at the beginning, showing (at a minimum)
the name and purpose of the program. Other            * PROGRAM:       PROJECT/LIB/PROG1.SAS
information, such as creation date, name of the
programmer, and relationship of the program to          PURPOSE:       ILLUSTRATE COMMENTS
other programs, can be added if appropriate.
                                                      DATA SENIORS;
Different styles exist, ranging from a single         INFILE RAWDATA;
comment statement to a formal block that looks        INPUT NAME $ 1-20;
more like this:
Now the comment statement ends with the first           98 = Refused to answer
semicolon, after the word SENIORS. The software         99 = Want to check with spouse
complains that the INFILE statement is invalid or
used out of proper order, because the program           These values cause havoc when processing the
does not contain a DATA statement.                      data later. For example, to avoid averaging in the
                                                        nonresponsive answers with legitimate answers,
Finally, when using embedded comments, avoid            PROC MEANS would have to process just one
placing /* in columns 1 and 2 of the program.           variable at a time:

DATA JUNIORS;                                           PROC MEANS DATA=SURVEY;
INFILE RAWDATA                                          VAR Q1;
/* SOMETIMES USE RAWDATA2 */;                           WHERE Q1 NOT IN (97, 98, 99);
INPUT NAME $ 1-20;
                                                        A better solution would be to recode these values
Under the MVS operating system, /* in columns 1         to missing values in the permanent dataset. The
and 2 is considered to be job control language, not     SAS software supports 28 different missing values
part of SAS language. This program would                (., .A through .Z, and ._ ), so you won't lose the
immediately end with an incomplete INFILE               distinction between 97, 98, and 99:
statement. Even if you're not working under MVS,
it is usually a good idea to design your programs to    IF      Q1=97 THEN Q1=.A;
be portable from one machine to the next. So            ELSE IF Q1=98 THEN Q1=.B;
avoid /* in columns 1 and 2.                            ELSE IF Q1=99 THEN Q1=.C;


#5: Lengths of Variables                                #7: $8 vs. $8.

The DATA step assigns a length to variables based       These INPUT statements produce quite different
on the first mention of the variable. For example,      results:
this code assigns GENDER a length of 4:
                                                        INPUT @21 LASTNAME $8.;
                                                        INPUT @21 LASTNAME $8;
IF TYPE=1 THEN GENDER='MALE';
ELSE GENDER='FEMALE';
                                                        With a dot, 8. means "read 8 characters beginning
The order of the data has no effect on the length of    at the current location on the line (column 21)."
GENDER. The first value of TYPE could be 1 or 2.        LASTNAME will contain the contents of columns 21
For that matter, all values of TYPE could be 2.         through 28. Without a dot, 8 means "read the
Still, the variable GENDER has a length of 4.           contents of column 8." Even when the INPUT
                                                        statement first moves to column 21 (@21),
This situation produces no error or warning             LASTNAME will be whatever is in column 8. The
message. GENDER takes on values of MALE and             second INPUT statement really says, "First, move
FEMA.                                                   to column 21. Then take the value of LASTNAME
                                                        from column 8." That's exactly the result, with no
A straightforward solution exists. Assign a length to   error or warning message.
GENDER BEFORE the IF / THEN statements:

LENGTH GENDER $ 6;                                      #8: Using ELSE

                                                        ELSE can cause trouble in a few ways. Failing to
#6: Missing versus 99                                   use ELSE makes a program take slightly longer to
                                                        run. For example, the second set of statements
Often, data contain a special code to indicate a        runs faster than the first:
missing value. For example, a survey might ask,
"On a scale of 1 to 10, please rate the importance      IF GENDER='F' THEN TYPE='FEMALE';
of these items." The survey might record answers        IF GENDER='M' THEN TYPE='MALE';
of 1 to 10, and use the following codes for missing     IF GENDER='F' THEN TYPE='FEMALE';
answers:                                                ELSE IF GENDER='M' THEN TYPE='MALE';

97 = Don't know                                         Failing to use ELSE above is a minor error. The
only bad thing that happens is that the program         It turns out that the first 100 observations contain
takes longer to run. When beginning to use ELSE,        zero observations having RANK='GENERAL'. So
programmers frequently forget to consider the           the programmer tries again using:
possibility of bad data:
                                                        OPTIONS OBS=1000;
IF GENDER='F' THEN TYPE='FEMALE';
ELSE TYPE='MALE';                                       Again, there are no generals in the first 1000
                                                        observations. So the programmer tries:
GENDER may have contained values other than M
or F. Finally, in allowing for bad data, programmers    OPTIONS OBS=5000;
may use poor logic:
                                                        All of a sudden the program runs for a while,
IF GENDER='F' THEN TYPE='FEMALE';                       because there were 2000 generals in the first 5000
IF GENDER='M' THEN TYPE='MALE';                         observations. To get around this trial and error
ELSE TYPE='??';
                                                        approach, count the number of generals in the
Since the program contains only one ELSE                DATA step:
statement, TYPE will never be FEMALE. Any
values of FEMALE get changed to ?? by the final         DATA SAMPLE;
                                                        INFILE RAWDATA;
statement. Again, the solution is straightforward:      INPUT NAME $ RANK $ SERIAL $;
                                                        IF RANK='GENERAL';
IF GENDER='F' THEN TYPE='FEMALE';                       N + 1;
ELSE IF GENDER='M' THEN TYPE='MALE';                    OUTPUT;
ELSE TYPE='??';                                         IF N=100 THEN STOP;

One final note. While the SAS software provides         Without an OUTPUT statement, SAMPLE would
extreme flexibility with respect to spacing and         contain only 99 observations. The STOP statement
indentation, there must be a blank before the word      means "right here, right now, without doing any
THEN. Many letters (T, D, X, B) trigger a different     futher work." The DATA step would have ended
interpretation of your expression if they immediately   before outputting the final observation.
follow a closing quote.

                                                        #10: Counting with PROC MEANS
#9: Testing Programs and Subsetting
                                                        Use each procedure for its intended purpose.
In the testing stage, programmers often run on a        PROC FREQ counts, while PROC MEANS
subset of the data until the bugs are worked out.       computes statistics. The most common error in this
Beginning programmers run through a few                 category is trying to count with PROC MEANS:
variations on subsetting technique, such as (from
worst to best):                                         PROC MEANS DATA=SALES N;
                                                        VAR AMOUNT;
IF _N_ <= 100;                                          BY STATE;
IF _N_ > 100 THEN STOP;
OPTIONS OBS=100;                                        versus

Still, all of these techniques can run into trouble     PROC FREQ DATA=SALES;
when trying to produce such a sample while also         TABLES STATE;
using a subsetting IF. For example, the objective
here is to test on a sample of 10 observations:         Given that SALES contains the numeric variable
                                                        AMOUNT, and given that AMOUNT never has a
OPTIONS OBS=100;                                        missing value, and given that SALES is in sorted
                                                        order by STATE, the N statistic in PROC MEANS
DATA SAMPLE;                                            counts how many times each STATE appears in
INFILE RAWDATA;                                         the dataset.
INPUT NAME $ RANK $ SERIAL $;
IF RANK='GENERAL';
Use PROC FREQ to count. Look at the complexity             on how the systems programmers have set up the
of the programs as well as all the conditions that         operating system:
have to be right for PROC MEANS to do the job
correctly. In addition, what happens if AMOUNT                1.   The system may bounce out the job with a
takes on a missing value? PROC MEANS still                         JCL error. That's the preferable result.
works. However, the N statistic no longer counts
the number of times each STATE appears. It                    2.   The job may run, but the system would
actually counts the number of nonmissing                           create a second, uncataloged version of the
AMOUNTs for each STATE. While there is no error                    dataset. The original version, containing
message, the output is incorrect.                                  the garbage, remains cataloged by name.
                                                                   So when a program refers to the dataset
                                                                   later, it always finds the dataset containing
#11: DISP=(NEW,CATLG,DELETE)                                       the garbage.

This is JCL, not SAS language. Although this error         If the programmer notices the situation, it's easy
applies to one operating system only, I have               enough to correct. Either delete the dataset before
included it here because its effects are quite difficult   rerunning the job, or else change the JCL to read
to trace.                                                  DISP=OLD instead of DISP=NEW. However, this
                                                           error is extremely difficult to notice. The only
DISP=(NEW,CATLG,DELETE) is part of the                     message is buried in the JCL log (not the SAS log)
standard JCL DD statement to create a new SAS              with a small note next to the dataset name saying
data library. The three parameters mean:                   NOT CATLGD 2. Even experienced programmers
                                                           rarely check that portion of the output. In fact,
NEW: the current status of the dataset. It doesn't         many programmers inspect the output on the
exist, so create one by that name                          terminal screen, discover there is no longer an error
                                                           message, and delete the job so it isn't even
CATLG: when the job successfully completes,                available to inspect later.
catalog the dataset by name.
                                                           On a related note, most MVS systems periodically
DELETE: if the job ends abnormally instead, delete         examine the disk packs for uncataloged datasets,
the dataset.                                               either deleting them or reporting them to the
                                                           owners. Usually (perhaps not if you are a
What happens when the program contains a SAS               timesharing customer) you would not rack up a bill
error, such as:                                            forever for storing an uncataloged dataset.

DATAIN.LARGE;
                                                           #12: Failing to Check the LOG
Despite the SAS error, the MVS operating system
determines that the job completes successfully.            Just because you get output, it doesn't mean the
The third parameter (DELETE) only kicks in if their        program is correct. Check the SAS log to confirm
was a SYSTEM error, not a SAS error. (For                  that the program ran as expected. Here are a few
example, the job requested 1 minute of CPU time            situations where you will find output, but the
but actually hit that limit without completing.) So        program is incorrect.
the operating system creates the dataset, and
catalogs it by name.                                       When a procedure uses a BY statement, the
                                                           software works with one value of the BY variable at
Next, the programmer fixes the SAS error, changing         a time. It does not check the entire dataset to
the statement to:                                          make sure that the data are in sorted order.
                                                           Instead, output gets generated for values of the BY
DATA IN.LARGE;                                             variable until the software discovers an observation
                                                           is not in sorted order. At that point, the software
When the new job gets submitted (using the original        generates an error message on the SAS log. Net
JCL), the operating system has a problem. The job          result: there is output (albeit
says DISP=NEW (create a new dataset under that
name), but a dataset already exists with that name.
The dataset may contain garbage, but it does exist.
Two things could happen at this point, depending
incomplete) but the SAS log contains an error             All comments and questions are welcome! Contact
message.                                                  information:

Some methods of execution involve submitting a                    Bob Virgile
program.SAS file and automatically generating a                   Robert Virgile Associates, Inc.
program.LOG and program.LIS (or LST) file. When                   3 Rock Street
running jobs in this fashion, the previous execution              Woburn, MA 01801
of the program already created a .LIS file. If a                  (781) 938-0307
revised job runs, it may contain errors and not                   virgile@mediaone.net
generate a .LIS file. However, the .LIS file still
exists with reasonable looking output, since it is left   SAS is a registered trademark of SAS Institute Inc.
over from the previous execution of a working
program.

Finally, the SAS log may contain notes about logic
errors which make the output incorrect. Consider
this program:

DATA INCOMES;
MERGE MOM POP;
BY ID;
TOTALINC = MOMINC + POPINC;
PROC MEANS DATA=INCOMES;
VAR TOTALINC;

When you inspect the output, it might look very
reasonable. But consider what would happen if
POPINC were actually a character variable that
usually took on numerals as a value, but could
occasionally be "N/A" or "> 100000." For most
observations, the software would convert the
numerals to the proper numeric value, and compute
TOTALINC. But when POPINC could not be
converted to numeric, TOTALINC would be missing.
PROC MEANS would automatically discard those
observations and print the average value of the
nonmissing TOTALINCs. How could you know this
was happening? Check the SAS log!

What should you check for? At a minimum, check
for error and warning messages. Verify that each
dataset contains approximately the expected
number of variables and observations. Check for
any notes indicating an unexpected action took
place, such as numeric to character conversion, or
missing values being generated as a result of
performing mathematical operations on missing
values. Finally, verify that each procedure
generated approximately the expected number of
pages of output.

Más contenido relacionado

Similar a Twelve Common SAS Programming Mistakes

A c program of Phonebook application
A c program of Phonebook applicationA c program of Phonebook application
A c program of Phonebook applicationsvrohith 9
 
CCv1.0 ppt02 - operators
CCv1.0 ppt02 - operatorsCCv1.0 ppt02 - operators
CCv1.0 ppt02 - operatorsappAcademy
 
Exception
ExceptionException
Exceptionwork
 
Exception
ExceptionException
Exceptionwork
 
Introduction to computer science
Introduction to computer scienceIntroduction to computer science
Introduction to computer scienceumardanjumamaiwada
 
Studying the impact of Social Structures on Software Quality
Studying the impact of Social Structures on Software QualityStudying the impact of Social Structures on Software Quality
Studying the impact of Social Structures on Software QualityNicolas Bettenburg
 
Java script the weird part (PART I )
Java script the weird part (PART I )Java script the weird part (PART I )
Java script the weird part (PART I )AndryRajohnson
 
Understanding SAS Data Step Processing
Understanding SAS Data Step ProcessingUnderstanding SAS Data Step Processing
Understanding SAS Data Step Processingguest2160992
 
Keywords used in javascript
Keywords used in javascriptKeywords used in javascript
Keywords used in javascriptMax Friel
 
The Sieve of Eratosthenes - Part 1
The Sieve of Eratosthenes - Part 1The Sieve of Eratosthenes - Part 1
The Sieve of Eratosthenes - Part 1Philip Schwarz
 
Measuring Code Quality in WTF/min.
Measuring Code Quality in WTF/min. Measuring Code Quality in WTF/min.
Measuring Code Quality in WTF/min. David Gómez García
 
Measuring code quality:WTF/min by @dgomezg
Measuring code quality:WTF/min by @dgomezgMeasuring code quality:WTF/min by @dgomezg
Measuring code quality:WTF/min by @dgomezgAutentia
 
COMMON ABEND CODES
COMMON ABEND CODESCOMMON ABEND CODES
COMMON ABEND CODESNirmal Pati
 
The Sieve of Eratosthenes - Part 1 - with minor corrections
The Sieve of Eratosthenes - Part 1 - with minor correctionsThe Sieve of Eratosthenes - Part 1 - with minor corrections
The Sieve of Eratosthenes - Part 1 - with minor correctionsPhilip Schwarz
 
C Condition and Loop part 2.pdf
C Condition and Loop part 2.pdfC Condition and Loop part 2.pdf
C Condition and Loop part 2.pdfvenkataprashanth3
 
Save time by applying clean code principles
Save time by applying clean code principlesSave time by applying clean code principles
Save time by applying clean code principlesEdorian
 

Similar a Twelve Common SAS Programming Mistakes (20)

A c program of Phonebook application
A c program of Phonebook applicationA c program of Phonebook application
A c program of Phonebook application
 
CCv1.0 ppt02 - operators
CCv1.0 ppt02 - operatorsCCv1.0 ppt02 - operators
CCv1.0 ppt02 - operators
 
Exception
ExceptionException
Exception
 
Exception
ExceptionException
Exception
 
lecture 6
 lecture 6 lecture 6
lecture 6
 
Introduction to computer science
Introduction to computer scienceIntroduction to computer science
Introduction to computer science
 
Studying the impact of Social Structures on Software Quality
Studying the impact of Social Structures on Software QualityStudying the impact of Social Structures on Software Quality
Studying the impact of Social Structures on Software Quality
 
Java script the weird part (PART I )
Java script the weird part (PART I )Java script the weird part (PART I )
Java script the weird part (PART I )
 
nullcon 2011 - Fuzzing with Complexities
nullcon 2011 - Fuzzing with Complexitiesnullcon 2011 - Fuzzing with Complexities
nullcon 2011 - Fuzzing with Complexities
 
Vb introduction.
Vb introduction.Vb introduction.
Vb introduction.
 
Understanding SAS Data Step Processing
Understanding SAS Data Step ProcessingUnderstanding SAS Data Step Processing
Understanding SAS Data Step Processing
 
Keywords used in javascript
Keywords used in javascriptKeywords used in javascript
Keywords used in javascript
 
The Sieve of Eratosthenes - Part 1
The Sieve of Eratosthenes - Part 1The Sieve of Eratosthenes - Part 1
The Sieve of Eratosthenes - Part 1
 
Measuring Code Quality in WTF/min.
Measuring Code Quality in WTF/min. Measuring Code Quality in WTF/min.
Measuring Code Quality in WTF/min.
 
Measuring code quality:WTF/min by @dgomezg
Measuring code quality:WTF/min by @dgomezgMeasuring code quality:WTF/min by @dgomezg
Measuring code quality:WTF/min by @dgomezg
 
COMMON ABEND CODES
COMMON ABEND CODESCOMMON ABEND CODES
COMMON ABEND CODES
 
The Sieve of Eratosthenes - Part 1 - with minor corrections
The Sieve of Eratosthenes - Part 1 - with minor correctionsThe Sieve of Eratosthenes - Part 1 - with minor corrections
The Sieve of Eratosthenes - Part 1 - with minor corrections
 
C Condition and Loop part 2.pdf
C Condition and Loop part 2.pdfC Condition and Loop part 2.pdf
C Condition and Loop part 2.pdf
 
Save time by applying clean code principles
Save time by applying clean code principlesSave time by applying clean code principles
Save time by applying clean code principles
 
appledoc_style
appledoc_styleappledoc_style
appledoc_style
 

Último

8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,dollysharma2066
 
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdfBreath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdfJess Walker
 
Dhule Call Girls #9907093804 Contact Number Escorts Service Dhule
Dhule Call Girls #9907093804 Contact Number Escorts Service DhuleDhule Call Girls #9907093804 Contact Number Escorts Service Dhule
Dhule Call Girls #9907093804 Contact Number Escorts Service Dhulesrsj9000
 
Postal Ballot procedure for employees to utilise
Postal Ballot procedure for employees to utilisePostal Ballot procedure for employees to utilise
Postal Ballot procedure for employees to utiliseccsubcollector
 
call girls in candolim beach 9870370636] NORTH GOA ..
call girls in candolim beach 9870370636] NORTH GOA ..call girls in candolim beach 9870370636] NORTH GOA ..
call girls in candolim beach 9870370636] NORTH GOA ..nishakur201
 
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot AndCall Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot AndPooja Nehwal
 
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...ur8mqw8e
 
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girlsPooja Nehwal
 
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改atducpo
 
Cheap Rate ➥8448380779 ▻Call Girls In Mg Road Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Mg Road GurgaonCheap Rate ➥8448380779 ▻Call Girls In Mg Road Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Mg Road GurgaonDelhi Call girls
 
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual serviceanilsa9823
 
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual serviceanilsa9823
 
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...anilsa9823
 
Top Rated Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...Leko Durda
 
Lilac Illustrated Social Psychology Presentation.pptx
Lilac Illustrated Social Psychology Presentation.pptxLilac Illustrated Social Psychology Presentation.pptx
Lilac Illustrated Social Psychology Presentation.pptxABMWeaklings
 
Call Girls Anjuna beach Mariott Resort ₰8588052666
Call Girls Anjuna beach Mariott Resort ₰8588052666Call Girls Anjuna beach Mariott Resort ₰8588052666
Call Girls Anjuna beach Mariott Resort ₰8588052666nishakur201
 

Último (20)

8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
 
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdfBreath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
 
Dhule Call Girls #9907093804 Contact Number Escorts Service Dhule
Dhule Call Girls #9907093804 Contact Number Escorts Service DhuleDhule Call Girls #9907093804 Contact Number Escorts Service Dhule
Dhule Call Girls #9907093804 Contact Number Escorts Service Dhule
 
Postal Ballot procedure for employees to utilise
Postal Ballot procedure for employees to utilisePostal Ballot procedure for employees to utilise
Postal Ballot procedure for employees to utilise
 
call girls in candolim beach 9870370636] NORTH GOA ..
call girls in candolim beach 9870370636] NORTH GOA ..call girls in candolim beach 9870370636] NORTH GOA ..
call girls in candolim beach 9870370636] NORTH GOA ..
 
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot AndCall Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
 
Model Call Girl in Lado Sarai Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Lado Sarai Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Lado Sarai Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Lado Sarai Delhi reach out to us at 🔝9953056974🔝
 
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
 
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
 
escort service sasti (*~Call Girls in Paschim Vihar Metro❤️9953056974
escort service  sasti (*~Call Girls in Paschim Vihar Metro❤️9953056974escort service  sasti (*~Call Girls in Paschim Vihar Metro❤️9953056974
escort service sasti (*~Call Girls in Paschim Vihar Metro❤️9953056974
 
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
 
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
 
Cheap Rate ➥8448380779 ▻Call Girls In Mg Road Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Mg Road GurgaonCheap Rate ➥8448380779 ▻Call Girls In Mg Road Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Mg Road Gurgaon
 
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
 
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
 
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
 
Top Rated Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
 
Lilac Illustrated Social Psychology Presentation.pptx
Lilac Illustrated Social Psychology Presentation.pptxLilac Illustrated Social Psychology Presentation.pptx
Lilac Illustrated Social Psychology Presentation.pptx
 
Call Girls Anjuna beach Mariott Resort ₰8588052666
Call Girls Anjuna beach Mariott Resort ₰8588052666Call Girls Anjuna beach Mariott Resort ₰8588052666
Call Girls Anjuna beach Mariott Resort ₰8588052666
 

Twelve Common SAS Programming Mistakes

  • 1. The Dirty Dozen: Twelve Common Programming Mistakes Bob Virgile Robert Virgile Associates, Inc. Overview INPUT D_BIRTH 21-28 MMDDYY8.; As programmers begin using the SAS® software, This type of error, while common, is not fatal. The their programs often contain many of the same software generates an error message, and the error types of mistakes. A seemingly innocuous error, message is somewhat clear. One hybrid exists, to such as omitting a semicolon, may generate results indicate the number of implied positions after a ranging from an obvious error message to an decimal point. These two statements are both impossible to decipher error message to destroying legal, and are equivalent to one another: an existing dataset. This paper covers typical errors made at an introductory level, including those INPUT AMOUNT 11-18 .2; which produce no error message but still generate INPUT @11 AMOUNT 8.2; the wrong result. Let's move on to situations where the error message is difficult to decipher. #1: Step-by-Step Execution #3: Missing Semicolons SAS programs execute one DATA or PROC step at a time. A program cannot mix and match the In this program, the software complains that the steps. Here is a typical attempt: ELSE statement has no matching IF / THEN statement. What could be the problem? DATA PURCHASE; SET SALES; TOTAL = PRICE * QUANTITY; DATA NEW; INFILE RAWDATA; PROC MEANS DATA=PURCHASE; INPUT NAME $16. /* First + Last */ VAR TOTAL; RACE 17 /* 1-digit code */ AGE 18-20 /* Some oldies! */ WITHTAX = TOTAL * (1 + TAXRATE); IF AGE >= 65 THEN STATUS='SENIOR'; PROC MEANS DATA=PURCHASE; ELSE STATUS='JUNIOR'; VAR WITHTAX; You could stare at this one for a long time if you The WITHTAX= assignment statement generates didn't know that the topic is "missing semicolons." an error message. The syntax would have been When the INPUT statement is missing a semicolon, correct if the statement had appeared in a DATA the software "thinks" that the IF / THEN statement step. However, the statement is part of PROC is part of the INPUT statement. The words within MEANS. So the software generates an error the IF / THEN statement are taken to be additional message saying that the assignment statement is variables which the INPUT statement should read. invalid or used out of proper order. The message The INPUT statement ends with the semicolon is only partially clear, since the statement is valid. before the ELSE statement. With no IF / THEN The problem is that it is used out of proper order. statement, the ELSE statement is illegal. The lesson here: when a statement appears to be #2: Informat with Columns perfectly valid but still generates an error message, check for a missing semicolon on the PREVIOUS In an INPUT statement, informats must immediately statement. follow a variable name. A legal example: INPUT @21 D_BIRTH MMDDYY8.; An illegal version:
  • 2. Consider a DATA step which creates three datasets ***************************** at the same time: ** ** ** Program: ** ** ** DATA MALES FEMALES BADDATA; ** Date: ** SET IN.EVERYONE; ** ** IF GENDER='M' THEN OUTPUT MALES; ** Purpose: ** ELSE IF GENDER='F' THEN OUTPUT FEMALES; ** ** ELSE OUTPUT BADDATA; ** Programmer: ** ** ** A perfectly legitimate program. But what about this *****************************; one: Despite the number of lines, this is still just a single DATA MILK SAS comment statement beginning with an asterisk SET ALL.COWS; and ending with a semicolon. CUPS = 4 * QUARTS; My personal preference is to use the first title line to Because of the missing semicolon on the DATA indicate the name of the program which created the statement, this program is asking that three output. Many times, the output gets separated from datasets be created in one DATA step: MILK, SET, the program. It's nice to be able to look at some and ALL.COWS. The program contains no SET output months down the road and to know statement, and does not read from any source of immediately which program created that output. As data. Technically, however, these are legal long as the client finds the output acceptable in this statements. So all three datasets on the DATA form, it makes good sense to reserve the first title statement get created with one observation and line for the program name. two variables. Beginning errors of comission fall into a few The dataset ALL.COWS just got destroyed! This is categories. First of all, it is perfectly okay to leave a truly bad result! In addition, there would be no a blank line in a SAS program. It's even a good error or warning messages. (The LOG would idea to use spacing (as well as indentation) to contain a note about the number of observations in make your programs easy to read. However, there each of the datasets.) In my opinion, the names is no need to comment out that blank line. SET, MERGE, and UPDATE should be illegal as dataset names to guard against this possibility. Secondly, when using a comment statement, make Version 7 of the SAS software will provide such an sure it ends with a semicolon. While this might be option. A global option will allow the words SET, overkill, it is acceptable: MERGE, UPDATE, and RETAIN (or other DATA step key words as well) to trigger an error * PROGRAM: PROJECT/LIB/PROG1.SAS; message. * ; * PURPOSE: ILLUSTRATE COMMENTS ; #4: Comments DATA SENIORS; INFILE RAWDATA; INPUT NAME $ 1-20; Commenting errors fall into two categories: using comments improperly and failing to use comments Of course, it would work equally well if the final two where they should be used. asterisks and the first two semicolons were removed. Omitting a final semicolon is a common As a general rule, programs should contain a error: comment at the beginning, showing (at a minimum) the name and purpose of the program. Other * PROGRAM: PROJECT/LIB/PROG1.SAS information, such as creation date, name of the programmer, and relationship of the program to PURPOSE: ILLUSTRATE COMMENTS other programs, can be added if appropriate. DATA SENIORS; Different styles exist, ranging from a single INFILE RAWDATA; comment statement to a formal block that looks INPUT NAME $ 1-20; more like this:
  • 3. Now the comment statement ends with the first 98 = Refused to answer semicolon, after the word SENIORS. The software 99 = Want to check with spouse complains that the INFILE statement is invalid or used out of proper order, because the program These values cause havoc when processing the does not contain a DATA statement. data later. For example, to avoid averaging in the nonresponsive answers with legitimate answers, Finally, when using embedded comments, avoid PROC MEANS would have to process just one placing /* in columns 1 and 2 of the program. variable at a time: DATA JUNIORS; PROC MEANS DATA=SURVEY; INFILE RAWDATA VAR Q1; /* SOMETIMES USE RAWDATA2 */; WHERE Q1 NOT IN (97, 98, 99); INPUT NAME $ 1-20; A better solution would be to recode these values Under the MVS operating system, /* in columns 1 to missing values in the permanent dataset. The and 2 is considered to be job control language, not SAS software supports 28 different missing values part of SAS language. This program would (., .A through .Z, and ._ ), so you won't lose the immediately end with an incomplete INFILE distinction between 97, 98, and 99: statement. Even if you're not working under MVS, it is usually a good idea to design your programs to IF Q1=97 THEN Q1=.A; be portable from one machine to the next. So ELSE IF Q1=98 THEN Q1=.B; avoid /* in columns 1 and 2. ELSE IF Q1=99 THEN Q1=.C; #5: Lengths of Variables #7: $8 vs. $8. The DATA step assigns a length to variables based These INPUT statements produce quite different on the first mention of the variable. For example, results: this code assigns GENDER a length of 4: INPUT @21 LASTNAME $8.; INPUT @21 LASTNAME $8; IF TYPE=1 THEN GENDER='MALE'; ELSE GENDER='FEMALE'; With a dot, 8. means "read 8 characters beginning The order of the data has no effect on the length of at the current location on the line (column 21)." GENDER. The first value of TYPE could be 1 or 2. LASTNAME will contain the contents of columns 21 For that matter, all values of TYPE could be 2. through 28. Without a dot, 8 means "read the Still, the variable GENDER has a length of 4. contents of column 8." Even when the INPUT statement first moves to column 21 (@21), This situation produces no error or warning LASTNAME will be whatever is in column 8. The message. GENDER takes on values of MALE and second INPUT statement really says, "First, move FEMA. to column 21. Then take the value of LASTNAME from column 8." That's exactly the result, with no A straightforward solution exists. Assign a length to error or warning message. GENDER BEFORE the IF / THEN statements: LENGTH GENDER $ 6; #8: Using ELSE ELSE can cause trouble in a few ways. Failing to #6: Missing versus 99 use ELSE makes a program take slightly longer to run. For example, the second set of statements Often, data contain a special code to indicate a runs faster than the first: missing value. For example, a survey might ask, "On a scale of 1 to 10, please rate the importance IF GENDER='F' THEN TYPE='FEMALE'; of these items." The survey might record answers IF GENDER='M' THEN TYPE='MALE'; of 1 to 10, and use the following codes for missing IF GENDER='F' THEN TYPE='FEMALE'; answers: ELSE IF GENDER='M' THEN TYPE='MALE'; 97 = Don't know Failing to use ELSE above is a minor error. The
  • 4. only bad thing that happens is that the program It turns out that the first 100 observations contain takes longer to run. When beginning to use ELSE, zero observations having RANK='GENERAL'. So programmers frequently forget to consider the the programmer tries again using: possibility of bad data: OPTIONS OBS=1000; IF GENDER='F' THEN TYPE='FEMALE'; ELSE TYPE='MALE'; Again, there are no generals in the first 1000 observations. So the programmer tries: GENDER may have contained values other than M or F. Finally, in allowing for bad data, programmers OPTIONS OBS=5000; may use poor logic: All of a sudden the program runs for a while, IF GENDER='F' THEN TYPE='FEMALE'; because there were 2000 generals in the first 5000 IF GENDER='M' THEN TYPE='MALE'; observations. To get around this trial and error ELSE TYPE='??'; approach, count the number of generals in the Since the program contains only one ELSE DATA step: statement, TYPE will never be FEMALE. Any values of FEMALE get changed to ?? by the final DATA SAMPLE; INFILE RAWDATA; statement. Again, the solution is straightforward: INPUT NAME $ RANK $ SERIAL $; IF RANK='GENERAL'; IF GENDER='F' THEN TYPE='FEMALE'; N + 1; ELSE IF GENDER='M' THEN TYPE='MALE'; OUTPUT; ELSE TYPE='??'; IF N=100 THEN STOP; One final note. While the SAS software provides Without an OUTPUT statement, SAMPLE would extreme flexibility with respect to spacing and contain only 99 observations. The STOP statement indentation, there must be a blank before the word means "right here, right now, without doing any THEN. Many letters (T, D, X, B) trigger a different futher work." The DATA step would have ended interpretation of your expression if they immediately before outputting the final observation. follow a closing quote. #10: Counting with PROC MEANS #9: Testing Programs and Subsetting Use each procedure for its intended purpose. In the testing stage, programmers often run on a PROC FREQ counts, while PROC MEANS subset of the data until the bugs are worked out. computes statistics. The most common error in this Beginning programmers run through a few category is trying to count with PROC MEANS: variations on subsetting technique, such as (from worst to best): PROC MEANS DATA=SALES N; VAR AMOUNT; IF _N_ <= 100; BY STATE; IF _N_ > 100 THEN STOP; OPTIONS OBS=100; versus Still, all of these techniques can run into trouble PROC FREQ DATA=SALES; when trying to produce such a sample while also TABLES STATE; using a subsetting IF. For example, the objective here is to test on a sample of 10 observations: Given that SALES contains the numeric variable AMOUNT, and given that AMOUNT never has a OPTIONS OBS=100; missing value, and given that SALES is in sorted order by STATE, the N statistic in PROC MEANS DATA SAMPLE; counts how many times each STATE appears in INFILE RAWDATA; the dataset. INPUT NAME $ RANK $ SERIAL $; IF RANK='GENERAL';
  • 5. Use PROC FREQ to count. Look at the complexity on how the systems programmers have set up the of the programs as well as all the conditions that operating system: have to be right for PROC MEANS to do the job correctly. In addition, what happens if AMOUNT 1. The system may bounce out the job with a takes on a missing value? PROC MEANS still JCL error. That's the preferable result. works. However, the N statistic no longer counts the number of times each STATE appears. It 2. The job may run, but the system would actually counts the number of nonmissing create a second, uncataloged version of the AMOUNTs for each STATE. While there is no error dataset. The original version, containing message, the output is incorrect. the garbage, remains cataloged by name. So when a program refers to the dataset later, it always finds the dataset containing #11: DISP=(NEW,CATLG,DELETE) the garbage. This is JCL, not SAS language. Although this error If the programmer notices the situation, it's easy applies to one operating system only, I have enough to correct. Either delete the dataset before included it here because its effects are quite difficult rerunning the job, or else change the JCL to read to trace. DISP=OLD instead of DISP=NEW. However, this error is extremely difficult to notice. The only DISP=(NEW,CATLG,DELETE) is part of the message is buried in the JCL log (not the SAS log) standard JCL DD statement to create a new SAS with a small note next to the dataset name saying data library. The three parameters mean: NOT CATLGD 2. Even experienced programmers rarely check that portion of the output. In fact, NEW: the current status of the dataset. It doesn't many programmers inspect the output on the exist, so create one by that name terminal screen, discover there is no longer an error message, and delete the job so it isn't even CATLG: when the job successfully completes, available to inspect later. catalog the dataset by name. On a related note, most MVS systems periodically DELETE: if the job ends abnormally instead, delete examine the disk packs for uncataloged datasets, the dataset. either deleting them or reporting them to the owners. Usually (perhaps not if you are a What happens when the program contains a SAS timesharing customer) you would not rack up a bill error, such as: forever for storing an uncataloged dataset. DATAIN.LARGE; #12: Failing to Check the LOG Despite the SAS error, the MVS operating system determines that the job completes successfully. Just because you get output, it doesn't mean the The third parameter (DELETE) only kicks in if their program is correct. Check the SAS log to confirm was a SYSTEM error, not a SAS error. (For that the program ran as expected. Here are a few example, the job requested 1 minute of CPU time situations where you will find output, but the but actually hit that limit without completing.) So program is incorrect. the operating system creates the dataset, and catalogs it by name. When a procedure uses a BY statement, the software works with one value of the BY variable at Next, the programmer fixes the SAS error, changing a time. It does not check the entire dataset to the statement to: make sure that the data are in sorted order. Instead, output gets generated for values of the BY DATA IN.LARGE; variable until the software discovers an observation is not in sorted order. At that point, the software When the new job gets submitted (using the original generates an error message on the SAS log. Net JCL), the operating system has a problem. The job result: there is output (albeit says DISP=NEW (create a new dataset under that name), but a dataset already exists with that name. The dataset may contain garbage, but it does exist. Two things could happen at this point, depending
  • 6. incomplete) but the SAS log contains an error All comments and questions are welcome! Contact message. information: Some methods of execution involve submitting a Bob Virgile program.SAS file and automatically generating a Robert Virgile Associates, Inc. program.LOG and program.LIS (or LST) file. When 3 Rock Street running jobs in this fashion, the previous execution Woburn, MA 01801 of the program already created a .LIS file. If a (781) 938-0307 revised job runs, it may contain errors and not virgile@mediaone.net generate a .LIS file. However, the .LIS file still exists with reasonable looking output, since it is left SAS is a registered trademark of SAS Institute Inc. over from the previous execution of a working program. Finally, the SAS log may contain notes about logic errors which make the output incorrect. Consider this program: DATA INCOMES; MERGE MOM POP; BY ID; TOTALINC = MOMINC + POPINC; PROC MEANS DATA=INCOMES; VAR TOTALINC; When you inspect the output, it might look very reasonable. But consider what would happen if POPINC were actually a character variable that usually took on numerals as a value, but could occasionally be "N/A" or "> 100000." For most observations, the software would convert the numerals to the proper numeric value, and compute TOTALINC. But when POPINC could not be converted to numeric, TOTALINC would be missing. PROC MEANS would automatically discard those observations and print the average value of the nonmissing TOTALINCs. How could you know this was happening? Check the SAS log! What should you check for? At a minimum, check for error and warning messages. Verify that each dataset contains approximately the expected number of variables and observations. Check for any notes indicating an unexpected action took place, such as numeric to character conversion, or missing values being generated as a result of performing mathematical operations on missing values. Finally, verify that each procedure generated approximately the expected number of pages of output.