1. 2-DAY WORKSHOP ON SPSS SYNTAX
(28th and 29th October, 2010)
Organized by: Indian Institute of Psychometry,
Kolkata
Dr. Debdulal Dutta Roy, Ph.D.
Psychology Research Unit
Indian Statistical Institute, Kolkata
Dr. D. Dutta Roy, ISI., Kolkata
2. What is SPSS?
SPSS originally stood for Statistical Package for the Social
Sciences. Because many researchers outside the social sciences
also use it, it is now regarded as general software for
statistical data analysis. SPSS is now managed by IBM.
ICONS OF SPSS
3. SPSS facilities
The software includes several facilities:
File management
creating a new file, opening an SPSS-format file, importing a non-SPSS file,
merging files, splitting files, transposing data
Variable management
creating new variables, recoding variables
Case management
adding cases, selecting cases, sorting cases
Text data analysis or text analytics
text categorization, text clustering, concept/entity extraction, document
summarization, and entity relation modeling (i.e., learning relations between
named entities).
Numeric data analysis
describing the data, checking data quality, fitting the data to statistical
models, data association, data clustering, and data reliability and validity,
using different statistical tools.
4. SPSS WORKSHEET
Variable view
Data view
Create variables:
Name: code name of the variable
Type: String, Numeric, Comma and others
Width: number of digits or characters
Decimals: number of decimal places
Label: meaning of the variable code name
Values: m=male, f=female or 1=male and 2=female
Missing: codes that mark missing data, e.g. np, 9, 99 or extreme values
Columns: display width of the column
Align: left, right, center
Measure: nominal, ordinal, scale
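The same variable properties can also be set with syntax instead of the Variable view. The lines below are a minimal sketch, assuming a numeric variable named gender already exists; the label text and the missing-value code 9 are illustrative choices, not taken from the workshop data.

```spss
* Hypothetical coding for gender: 1 = male, 2 = female, 9 = not answered.
VARIABLE LABELS gender 'Gender of respondent'.
VALUE LABELS gender 1 'Male' 2 'Female'.
MISSING VALUES gender (9).
VARIABLE LEVEL gender (NOMINAL).
FORMATS gender (F1.0).
```

Each command corresponds to one column of the Variable view: Label, Values, Missing, Measure, and Width/Decimals.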
5. Assignment
In the SPSS worksheet:
Prepare a worksheet with five variables: gender, first
name, middle name, surname and age.
Prepare a list of names.
Examine their distribution using graphs and tables.
Retrieving data from Excel
Retrieving data from Notepad
Write each record in this way <Ms., Ratna, kumari, Roy, 25> in
Notepad. Then retrieve the list using an SPSS command.
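One possible way to retrieve such files is the GET DATA command. This is a sketch only: the folder C:\data, the file names, and the variable names prefix, fname, mname and sname are assumptions made for illustration, and the text file is assumed to hold one comma-separated record per line, typed without the angle brackets.

```spss
* Hypothetical file names and variable names; adjust them to your own files.
* Read a comma-delimited Notepad file with one person per line.
GET DATA
  /TYPE=TXT
  /FILE='C:\data\names.txt'
  /DELIMITERS=","
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=1
  /VARIABLES=prefix A5 fname A20 mname A20 sname A20 age F3.0.
EXECUTE.

* Read the first worksheet of an Excel file whose first row holds variable names.
GET DATA
  /TYPE=XLS
  /FILE='C:\data\names.xls'
  /SHEET=INDEX 1
  /READNAMES=ON.
EXECUTE.
```

Run the two GET DATA commands separately, because each one replaces the active data file.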
6. Assignment
Cross tabulation is useful for determining the
association between two categorical variables.
Prepare an SPSS worksheet to compute the cross
tabulation between gender and anxiety.
Use both text and numeric data.
Compute chi-square.
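For readers who want to check their point-and-click result against syntax, the assignment could be sketched as follows. The variable names gender and anxiety and their codes are assumptions; the worksheet you prepare may use other names.

```spss
* Hypothetical variables: gender (1 = male, 2 = female) and anxiety (1 = low, 2 = high).
CROSSTABS
  /TABLES=gender BY anxiety
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED ROW.
```

The CELLS subcommand adds expected counts and row percentages next to the observed counts, which makes the chi-square result easier to interpret.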
8. Summary -1
SPSS is useful software for the analysis of both text
and numeric data.
The SPSS worksheet has two views: Data view and
Variable view. The latter is used to define and
customize the variables.
Data saved in an SPSS file can be exported
to Excel or text.
Likewise, data saved in Excel or in text format
can be retrieved into the SPSS worksheet.
10. What is SPSS-Syntax?
Syntax is the set of rules associated with a
language or command. SPSS syntax is useful for data
management and for archiving the procedure of data
analysis. In a dissertation, the presence of syntax helps
the examiner to understand the procedure followed by the
researcher.
Syntax can be written in Notepad or in a Word
document. SPSS syntax is the alternative to the point
and click mode.
It is more user friendly for repetitive tasks, as the user can
rerun the same syntax and can see which procedures were
followed in the data analysis.
11. Problems of point and click
The point and click procedure produces a great deal of output.
Some of it is not relevant to the researcher.
With syntax, the researcher can restrict the output according
to need.
The point and click procedure varies across interfaces
and versions of SPSS, but syntax works well in almost all
versions.
A statistical tool that is not available in the SPSS menus can
be built with syntax if the author knows how to write it, for
example moderated regression analysis.
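As an illustration of the last point, a moderated regression can be assembled from COMPUTE and REGRESSION. This is a minimal sketch under assumed variable names (y, x, z and xz are hypothetical); it is one common way to set up the analysis, not necessarily the procedure used in the workshop.

```spss
* Hypothetical variables: y is the outcome, x the predictor and z the moderator.
COMPUTE xz = x * z.
EXECUTE.
REGRESSION
  /STATISTICS COEFF R ANOVA CHANGE
  /DEPENDENT y
  /METHOD=ENTER x z
  /METHOD=ENTER xz.
```

The product term xz is entered in a second block, so the R square change for that block indicates whether z moderates the effect of x on y.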
12. Syntax error
A syntax error occurs when the researcher
or other person who wrote the code has not
followed the rules of the language or the
flow chart, causing the program to fail.
The most common errors are a missing command
terminator (the period) and wrong starting columns.
As a general rule, the first line of a command
starts in the first column, and each continuation
line starts in the second column or later.
14. ASSIGNMENT
Write the below in syntax window and run the
program.
DESCRIPTIVES VARIABLES = ABANY ABDEFECT
ABHLTH ABNOMORE ABPOOR ABRAPE
ABSINGLE ADULTS AGE
/STATISTICS=MEAN STDDEV.
Observation:
Do you get your results? If not, what is missing?
Put terminators at the end of both lines and run the program.
What is your observation?
Can you identify the continuation lines?
15. Summary -2
Syntax rules guide the program in analysing the
data according to the user's needs.
Statements are written systematically in the
syntax window, following the syntax rules.
One can suppress unnecessary output by
using syntax.
17. What is a flow chart?
A flowchart is a means of visually presenting the flow of
data through an information processing system, the
operations performed within the system, and the sequence
in which they are performed.
18. Standard symbols
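Terminal (oval): start or end of the program
Process (rectangle): computational steps or processing function of a program
Input/output (parallelogram): input or output operation
Decision (diamond): decision making and branching
Connector (circle): connector or joining of two parts of a program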
19. Guidelines of flow charting
In drawing a proper flowchart, all necessary requirements
should be listed out in logical order.
The flowchart should be clear, neat and easy to follow. There
should not be any room for ambiguity in understanding
the flowchart.
The usual direction of the flow of a procedure or system
is from left to right or top to bottom.
Only one flow line should come out from a process
symbol.
Only one flow line should enter a decision symbol, but
two or three flow lines, one for each possible answer,
should leave the decision symbol.
Only one flow line is used in conjunction with a terminal
symbol.
Write briefly within the standard symbols. Where necessary, you
can use the annotation symbol to describe data or
computational steps more clearly.
If the flowchart becomes complex, it is better to use
connector symbols to reduce the number of flow lines.
Avoid intersecting flow lines; crossings make the flowchart
harder to read.
Ensure that the flowchart has a logical start and finish.
It is useful to test the validity of the flowchart by
passing through it with a simple test data.
Reference: http://www.nos.org/htm/basic2.htm
20. Flow chart of correlations
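[Flow chart, reconstructed from the slide labels]
INPUT TWO SETS OF METRIC DATA
IS THERE MISSING DATA?  Y -> DELETE;  N -> next check
IS THERE OUTLIER?  Y -> DELETE;  N -> next check
IS STANDARD DEVIATION = 0?  Y -> correlation is not computed;  N -> DO CORRELATIONS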
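The checks in the flow chart above could be carried out in syntax roughly as follows. This is a sketch, assuming two hypothetical scale variables named x and y; the decisions themselves (whether to delete a case or an outlier) still have to be made by the researcher after reading each output.

```spss
* Hypothetical variables x and y.
* Step 1: the N column shows missing data and Std. Deviation shows whether SD is 0.
DESCRIPTIVES VARIABLES=x y
  /STATISTICS=MEAN STDDEV MIN MAX.
* Step 2: box plots and extreme values help to locate outliers.
EXAMINE VARIABLES=x y
  /PLOT=BOXPLOT
  /STATISTICS=EXTREME
  /NOTOTAL.
* Step 3: compute the correlation after the checks are passed.
CORRELATIONS
  /VARIABLES=x y
  /PRINT=TWOTAIL NOSIG
  /MISSING=LISTWISE.
```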
21. Summary - 3
The use of any statistical tool requires a set of
specific assumptions. A flow chart helps us
to incorporate all the assumptions
systematically. This reduces errors in
data analysis.
Therefore, the syntax writer should thoroughly
study all the assumptions and their
systematic use before selecting a
statistical tool for analysis.
23. Command
In batch (production) mode:
Each command must begin in the first column of a
new line.
Continuation lines must be indented at least one
space.
The period at the end of the command is
optional.
In interactive mode, each command should end with a period.
If you generate command syntax by pasting dialog
box choices into a syntax window, the format of
the commands is suitable for any mode of
operation.
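A short command laid out according to these rules might look like the sketch below. The variable AGE is the one used in the earlier assignment; any numeric variable in your file would do.

```spss
* The command name starts in column 1 and each continuation line is indented.
DESCRIPTIVES VARIABLES=age
  /STATISTICS=MEAN STDDEV.
```

Written this way, with indented continuation lines and a final period, the command follows the rules listed above.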
24. Variable names
Variable names ending in a period can cause errors in commands
created by the dialog boxes. You cannot create such variable names
in the dialog boxes, and you should generally avoid them.
SPSS command syntax is case insensitive, and three-letter
abbreviations can be used for many command specifications. You
can use as many lines as you want to specify a single command.
You can add space or break lines at almost any point where a single
blank is allowed, such as around slashes, parentheses, arithmetic
operators, or between variable names. For example,
FREQUENCIES
  VARIABLES=JOBCAT GENDER
  /PERCENTILES=25 50 75
  /BARCHART.
and
freq var=jobcat gender /percent=25 50 75 /bar.
are both acceptable alternatives that generate the same results.
25. Creating new variable
In some research situations a new variable has to be created.
For example, you may want to add a constant to a variable,
multiply it by a weight, or multiply two variables.
Use the COMPUTE command.
EXERCISE
* age2 is new variable.
COMPUTE age2=Age - 5.
EXECUTE.
DESCRIPTIVES
  VARIABLES=age, age2
  /STATISTICS=MEAN STDDEV MIN MAX.
Descriptive Statistics
                     N     Minimum   Maximum   Mean     Std. Deviation
Age                  542   7         15        9.54     1.117
age2                 542   2         10        4.5406   1.11667
Valid N (listwise)   542
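The other two uses of COMPUTE mentioned on this slide, weighting a variable and multiplying two variables, can be sketched in the same way. The variable names score, hours and rate and the weight 1.5 are hypothetical and serve only as an illustration.

```spss
* Hypothetical variables: score, hours and rate.
* Apply a weight of 1.5 to score.
COMPUTE score_w = score * 1.5.
* Multiply two variables.
COMPUTE pay = hours * rate.
EXECUTE.
DESCRIPTIVES VARIABLES=score score_w pay
  /STATISTICS=MEAN STDDEV MIN MAX.
```

Note that each comment line ends with a period; without it, SPSS treats the following line as part of the comment.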
26. Finding out lost file
A researcher sometimes forgets the location
of a file opened through the click menu. The file
can be retrieved with the 'GET FILE' syntax, which
states the full path of the file.
Get the file
File > New > Syntax
Write the syntax below:
GET FILE='c:\windows\desktop\ddr.sav'.
27. Check your file
You can check the retrieved file using the DISPLAY
command. This lists the variable names.
GET FILE='E:\ses_data_final.sav'.
* Display all variables.
DISPLAY.
* Display data of all variables.
LIST.
* Display data of a single variable.
LIST VARIABLES = <var1>. /* replace <var1> with a variable name
Here * begins a comment on its own line, which must end with
a period, and /* begins a comment placed within a command line.
28. Data checking by total score
Data can be checked using the IF command. Box 8.5 shows
syntax for checking the data. The assumption here is that the
total score should not be more than 10. Therefore the command
'if(total>10) t2=9' is used: it sets t2 to 9 for every case whose
total exceeds 10. After the IF command, an EXECUTE command
ending with a period (.) is necessary. Finally, the output file is
saved in the specified location.
Exercise
GET FILE='E:\ses_data_final.sav'.
IF (total>10) t2=9.
EXECUTE.
LIST VARIABLES=name, total, t2.
SAVE OUTFILE='e:\sesout.sav'.
Output
NAME             total   t2
TANIA PARVIN     8       .00
BACCHU MONDAL    9       .00
HABIBUL ISLAM    9       .00
KARIM RAHAMAN    10      .00
AKTAR HUSSAIN    10      .00
LALTU MONDAL     10      .00
RAHIM RAHAMAN    10      .00
NOOR ALAM        10      .00
*****            11      9.00
SADIK JAMAL      12      9.00
TAJMIR KHATUN    8       .00
FIROJ MONDAL     .       .
29. Is your data good for analysis ?
Data entry error is a serious concern in the analysis of data.
Extreme values, or outliers, are often suspected to be errors.
An outlier can distort the mean and the standard deviation; in
extreme cases the SD becomes higher than the mean. It is not
necessary to delete an outlier at once, as outliers sometimes
provide valid information, for example about inequality in the
distribution of the data. But finding the outliers is important.
A box-and-whisker plot is useful for finding them.
Write this in the syntax window:
EXAMINE VARIABLES=abany abdefect
  /COMPARE VARIABLE
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL
  /MISSING=LISTWISE.
Another way is to study the
frequencies of the variables.
FREQUENCIES VARIABLES=abany.
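To see which cases produce the outlying values, EXAMINE can also list the most extreme cases. The sketch below assumes the variables total and name from the earlier data-checking slide; substitute your own score and identifier variables.

```spss
* Hypothetical variables: total is the score and name identifies each case.
EXAMINE VARIABLES=total
  /ID=name
  /PLOT=BOXPLOT
  /STATISTICS=EXTREME(5)
  /NOTOTAL.
```

EXTREME(5) requests the five highest and five lowest values, labelled with the ID variable, so each suspicious value can be traced back to a case.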
30. How can you find out case error?
A box-and-whisker plot sometimes cannot find the cases
that have made a systematic error. Suppose you have
collected job satisfaction data using a five-point rating
scale of 20 items, of which 10 items are reverse-scored,
and one case assigns 3 to every item. The box plot cannot
locate this case.
Under such conditions, you can transpose the data and
compute the mean and SD for each case. A case error can be
identified if the SD is 0.00 or is higher than the mean.
The FLIP command transposes the data.
EXERCISE
FLIP VARIABLES=
DESCRIPTIVES
VARIABLES=
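As a cross-check that does not require transposing, the per-case mean and SD can also be computed directly with the MEAN and SD functions. This is an alternative sketch, not the FLIP exercise itself, and the item names item1 TO item20 are hypothetical.

```spss
* Hypothetical item names: item1 TO item20 must be adjacent in the data file.
COMPUTE case_mean = MEAN(item1 TO item20).
COMPUTE case_sd = SD(item1 TO item20).
EXECUTE.
* List only the cases that gave the same rating to every item.
TEMPORARY.
SELECT IF (case_sd = 0).
LIST VARIABLES=case_mean case_sd.
```

A case with case_sd equal to 0 gave an identical rating to all 20 items, which is the pattern described on this slide.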
31. Relational operator
A relational operator is used to compare values. It is
used with the IF command.
A relation is a logical expression that compares two values
using a relational operator. In the command
IF (X EQ 0) Y=1
the variable X and 0 are expressions that yield the values to
be compared by the EQ relational operator. The following are
the relational operators:
Symbol                   Definition
EQ or =                  Equal to
NE or ~= or ¬= or <>     Not equal to
LT or <                  Less than
LE or <=                 Less than or equal to
GT or >                  Greater than
GE or >=                 Greater than or equal to
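A small worked example may make the operators concrete. It assumes a numeric variable named age; the new variable adult and the cut-off 18 are hypothetical.

```spss
* Hypothetical variable age; adult is a new indicator variable.
IF (age GE 18) adult = 1.
IF (age LT 18) adult = 0.
EXECUTE.
FREQUENCIES VARIABLES=adult.
```

Cases with a missing age satisfy neither condition, so adult stays missing for them.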
32. Select case
When the researcher wants to compute
statistics for specific cases only, the
SELECT IF command is useful.
SELECT IF (AGE=8).
DESCRIPTIVES VARIABLES=ACH.
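SELECT IF on its own removes the unselected cases from the active file for the rest of the session. If the selection is needed for one procedure only, a commonly used variant is to place TEMPORARY in front of it, as in this sketch with the same variables:

```spss
* AGE and ACH are the variables used on the slide.
TEMPORARY.
SELECT IF (AGE = 8).
DESCRIPTIVES VARIABLES=ACH.
* All cases are available again for the next procedure.
DESCRIPTIVES VARIABLES=ACH.
```

The first DESCRIPTIVES uses only the cases with AGE equal to 8; the second uses the full file.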
33. Command to filter variable
The researcher can analyze the data of a specific group. Box 8.2 shows
syntax for descriptive statistics of age for the cases living in a
specific block of the district (code=1).
USE ALL.
COMPUTE filter_$=(Block_code=1).
VARIABLE LABEL filter_$ 'Block_code=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
DATASET ACTIVATE DataSet1.
DESCRIPTIVES variables=age.
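The filter stays active until it is switched off. When the analysis of the selected block is finished, the full file can be restored, for example:

```spss
* Switch the filter off so that later procedures use all cases.
FILTER OFF.
USE ALL.
EXECUTE.
DESCRIPTIVES VARIABLES=age.
```

Unlike SELECT IF, FILTER does not delete the unselected cases, so nothing is lost by turning it off.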
34. Summary -4
Syntax rules are important to write the
programs in syntax window.
By writing the programs, one can import
and export file, check file, list variables,
evaluate data entry error, create new
variable, select case and filter variable.
36. Item-item correlation of
five point rating scale
GET
  FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
CORRELATIONS
  /VARIABLES=AW1 AW2 AW6 AW10 AW18 AW19
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
Six items measure awareness of the environment. It is
assumed that the six items are related to each other.
AW1 TO AW19 can also be used if these six items are
adjacent in the data file.
This program assesses the intercorrelations among the six
items. Missing data are deleted pairwise, and the level of
significance is shown.
A two-tailed test is applicable when the direction of the
relationship is not assumed in advance.
NOSIG is used to flag significant values.
37. Item total correlations
GET
  FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
COMPUTE total=AW1 + AW2 + AW6 + AW10 + AW18 + AW19.
CORRELATIONS
  /VARIABLES=AW1 to AW19, total
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
The COMPUTE command is used to obtain the total
score. The total is then used for the item-total
correlation.
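An item-total correlation computed this way includes each item in its own total. As an alternative sketch, not part of the original exercise, the RELIABILITY procedure reports corrected item-total correlations (each item against the sum of the remaining items) for the same six items; the scale label 'Awareness' is an arbitrary name chosen here.

```spss
* Corrected item-total correlations and alpha for the six awareness items.
RELIABILITY
  /VARIABLES=AW1 AW2 AW6 AW10 AW18 AW19
  /SCALE('Awareness') ALL
  /MODEL=ALPHA
  /SUMMARY=TOTAL.
```

The Item-Total Statistics table in the output also shows Cronbach's alpha if each item were deleted.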
38. Multiple regression
GET
  FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
COMPUTE total=AW1 + AW2 + AW6 + AW10 + AW18 + AW19.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT total
  /METHOD=ENTER AW1 AW2 AW6 AW10 AW18.
Select all the commands before running them; otherwise the
total score will not be computed and cannot be used.
In this model the total score is predicted by the items
entered.
39. Mean differences
When data are collected from two different groups, the
command for the independent t-test is
T-TEST GROUPS=IC3(3)
  /MISSING=LISTWISE
  /VARIABLES=total
  /CRITERIA=CI(.9500).
Here IC3 is the independent variable and total is the
dependent variable.
IC3(3) uses 3 as the cut-off point to form two groups:
values below 3 form one group and values of 3 or more
form the other.
IC3(1 2) forms the two groups from the values 1 and 2.
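The second form mentioned above, with two explicit group codes, would look like this sketch. It assumes that IC3 actually contains the codes 1 and 2; cases with other codes are left out of the test.

```spss
* Compare the group coded 1 with the group coded 2 on the total score.
T-TEST GROUPS=IC3(1 2)
  /MISSING=ANALYSIS
  /VARIABLES=total
  /CRITERIA=CI(.95).
```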
40. Chi-square statistics
CROSSTABS
  /TABLES=AW1 BY AW2
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT
  /COUNT ROUND CELL.
This examines the association between items. For multiple
items the subcommand is
  /TABLES=AW1 BY AW2 AW10 AW18 AW19 AW6
Here AW1 is the row variable and the others are column
variables.
41. One-WAY ANOVA
ONEWAY total BY EXP
  /MISSING ANALYSIS.
Here total is the dependent variable and EXP is the
independent variable.
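In line with the earlier advice to check assumptions before using a statistical tool, the same command can be extended to request group descriptives, a homogeneity-of-variance test and a post hoc comparison. This is a sketch; the choice of the Tukey test is an assumption made for illustration, not part of the original slide.

```spss
* One-way ANOVA with descriptives, Levene test and Tukey post hoc comparisons.
ONEWAY total BY EXP
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /MISSING ANALYSIS
  /POSTHOC=TUKEY ALPHA(0.05).
```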
42. COMPUTE SIZE OF SAMPLE
* ----- GETTING INPUT FILE -----.
GET
  FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
* ----- SIZE OF SAMPLE -----.
compute n=0.
compute n=n+1.
descriptives n, AW1.
n=0 indicates initialization.
n=n+1 adds 1 to n as SPSS loops through the cases, so every
case receives n = 1 and the N reported for n in the
DESCRIPTIVES output is the size of the sample.
DESCRIPTIVES <n, AW1> allows comparison between the
computed n and AW1.
Here AW1 (numeric type, scale measure) is used to verify
the computed N, or size of sample.
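Another way to obtain the sample size, offered here as a sketch rather than as the workshop's method, uses the system variable $CASENUM, which numbers the cases in sequence.

```spss
* The system variable $CASENUM holds the sequence number of each case.
COMPUTE n = $CASENUM.
EXECUTE.
* The maximum of n equals the number of cases in the active file.
DESCRIPTIVES VARIABLES=n
  /STATISTICS=MAX.
```

With this version n actually accumulates from 1 to the last case, so the maximum, rather than the N column, gives the size of the sample.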
43. Summary - 5
SPSS syntax makes the researcher more
systematic in the analysis of data. By writing
programs, the researcher can check all the
assumptions of a statistical tool
systematically.
The COMPUTE command is very powerful,
as it helps the researcher to write their own
programs for the analysis of data.