Using SPSS for Statistical Analysis
A course for Beginners
by Leo Fernandez
Session II: Describing Data
1 Types of Variables
In statistics, variables describe attributes of the objects being studied. The value of the variable can
'vary' from one entity or sample element to another.
For example, a person's nationality could be a variable if we are studying people. One person
could be "Mexican" and another "Sudanese". Further, if we consider the two entities described
above (a Mexican and a Sudanese), we might also observe some other attributes of these entities.
For example, the Mexican's height could be 5ft 2in and that of the Sudanese, 5ft 10in.
Variables can be grouped under two broad categories: qualitative and quantitative.
• Qualitative: Qualitative variables are also known as "categorical" variables. They describe
attributes of objects by names or labels. A person's religion (e.g., Hindu, Muslim, Christian)
or the colour of the person's eyes (e.g., black, brown, blue) are examples of qualitative or
categorical variables.
• Quantitative: Quantitative variables are also known as "numeric" variables. They record a
measurable quantity. For example, when we speak of the population of a city, we are
talking about the number of people in the city, a measurable attribute of the city. Therefore,
population would be a quantitative variable.
In statistical data analysis, variables are classified by level of measurement into the following types:
Table - 01
Type | Category | Description | Example
Nominal | Categorical | Indicates membership of a collection or category. There is no implied ordering. | Nationality: 1 = Australian, 2 = British, 3 = Canadian, 4 = Danish, 5 = Other
Ordinal | Categorical | Indicates a difference and the direction of the difference. The items in the category can be arranged from low to high. Differences between items are not in equal intervals. | Education: 1 = No education, 2 = Primary School, 3 = High School, 4 = Graduate, 5 = Postgraduate
Interval | Numeric | Indicates a difference with direction. Differences are in equal intervals, but there is no true zero point. | Temperature in degrees Celsius
Ratio | Numeric | Indicates a difference with direction. Differences are in equal intervals. A true zero point is defined, so ratios are meaningful. | Income; Age recorded in whole years
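The practical consequence of the level of measurement is the kind of summary that makes sense for a variable. The following is a minimal Python sketch of that idea, not an SPSS procedure; the coded responses are hypothetical and are not taken from the course data.

```python
from statistics import mean, median, mode

# Hypothetical coded responses for five people (illustration only).
nationality = [1, 2, 2, 4, 5]          # nominal: codes are labels only
education = [1, 3, 3, 4, 5]            # ordinal: codes can be ranked
income = [0, 1200, 1500, 1500, 3000]   # ratio: equal intervals, true zero

# Nominal: only counting is meaningful, so the mode is the natural summary.
most_common_nationality = mode(nationality)

# Ordinal: ranking is meaningful, so the median can be used as well.
middle_education = median(education)

# Ratio: arithmetic is meaningful, so the mean can be used too.
average_income = mean(income)

print(most_common_nationality, middle_education, average_income)
```

Computing a mean of the nationality codes would run without error, but the result would have no interpretation, which is why the level of measurement has to be decided by the analyst rather than by the software.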
2 SPSS: Reading data into SPSS
The SPSS program has an interface for data entry. We were introduced to that interface in Session
I. When a researcher decides to use SPSS for data analysis, it is more than likely that the data has
already been collected and stored using an office productivity tool like a spreadsheet program.
Data from external sources can be read into SPSS through the following steps:
1. In the SPSS program, navigate to File → Open → Data
A dialogue box will open.
2. In the dialogue box, click on the down arrow against the field named Files of Type.
Choose “Excel (*.xls, *.xlsx, *.xlsm)”
3. Navigate to the folder containing the Excel file that holds your data and select that file. [Use
the file titanic_ex_II.xlsx that was sent to you.]
4. Click “Open”
A dialogue box appears.
5. Make sure the check-box is ticked against the label “Read Variable names from the first
row of data”.
6. Click “OK”
The Excel file is loaded into SPSS.
Click on the Data View tab at the bottom of the screen.
Voilà! You see the data just as you did in your spreadsheet program.
Click on the Variable View tab at the bottom of the screen.
This screen displays the names of the columns from the imported Excel file and the properties
associated with each column.
You have successfully read an external data source into SPSS.
SPSS can recognize and read data directly from a select list of formats (as can be seen in the
drop-down for the Files of Type field of the File → Open → Data dialogue box).
Now that the data has been imported into SPSS, you can view it in the Data View and
Variable View screens.
The Data View screen displays the data in rows and columns (like a spreadsheet). You can scroll
down the screen to verify that all the data has been correctly imported into the appropriate
columns.
The Variable View screen displays the column names and properties of the data contained in each
column.
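The effect of reading variable names from the first row can be illustrated outside SPSS as well. The sketch below uses Python's standard csv module on a small inline sample; the column names and values are hypothetical stand-ins, loosely modelled on the Titanic example, and are not the contents of the course file.

```python
import csv
import io

# Stand-in for a spreadsheet exported as CSV (hypothetical names and values).
raw = io.StringIO(
    "pclass,survived,age,fare\n"
    "1,1,29,211.34\n"
    "3,0,,7.25\n"
    "2,1,34,13.00\n"
)

# DictReader treats the first row as variable names, similar in spirit to
# the option that reads variable names from the first row of data.
reader = csv.DictReader(raw)
rows = list(reader)

variable_names = reader.fieldnames   # roughly what Variable View lists
n_cases = len(rows)                  # roughly the number of rows in Data View

print(variable_names)
print(n_cases, "cases; first case:", rows[0])
```

Note that the empty age in the second case arrives as an empty string, which is one reason to verify the imported columns before analysis.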
3 SPSS: Defining Variables
When you examine the imported data closely, you may notice that the column names are cryptic
(or, if you had spaces in the column names of the spreadsheet, that the spaces have been removed
and the column name is a string of concatenated words). SPSS column names cannot contain
spaces or certain other special characters.
In SPSS, columns are called 'variables' and column names are 'variable names'.
It is considered good practice to assign descriptive labels to these variables and define their
properties before proceeding with the analysis of the data.
Defining a variable involves giving it a name and specifying its type, the values the variable can
take (e.g., 1, 2, 3), the scale of measurement and so on.
Variables can be defined in SPSS on either of the following two screens:
1. The Variable View screen
2. Data → Define Variable Properties screen
1. The Variable View screen
The Variable View screen lists the variables (columns) in the data file and the properties
associated with each of those variables:
Table - 02
Property | Description
Name | The name of the variable. Variable names cannot contain spaces. To change a variable's name, double-click on the name you wish to change and type the new variable name.
Type | The type of the variable. This column describes how the data is stored and formatted. This is not to be confused with the Types of Variables discussed at the beginning of Session II. SPSS recognizes the following types: Numeric, Comma, Dot, Scientific notation, Date, Dollar, Custom currency, String and Restricted Numeric (integer with leading zeros). To change a variable's type, click inside the cell in the “Type” column for that variable. A square "..." button will appear; click on it to open the Variable Type window. Click the option that best matches the type of variable. Click OK.
Width | The number of digits displayed for numerical values or the number of characters for a string variable.
Decimals | The number of digits after the decimal point for each value of the variable (applicable to numeric variables).
Label | A descriptive definition or display name for the variable. The variable label appears in the output in place of the variable name, which is often cryptic. Example: The variable sibsp might be described by the label “Number of Siblings or Spouse on board”.
Values | For coded categorical variables, the value labels that should be associated with each category code. Value labels are useful primarily for categorical (i.e., nominal or ordinal) variables, especially if they have been recorded as codes (e.g., 1, 2, 3). It is good practice to give each value a label so that you (and anyone looking at your data or results) understand what each value represents. Example: In the sample dataset, the variable pclass represents the Passenger Class. The values 1, 2, 3 represent the categories “1st Class”, “2nd Class” and “3rd Class”, respectively.
Missing | The user-defined values that indicate data are missing for a variable (e.g., -99). Note that this does not affect or eliminate SPSS's default missing value code ("."). This column merely allows the user to specify alternative codes for missing values.
Columns | The width of each column in the Data View spreadsheet.
Align | The alignment of content in the cells of the Data View spreadsheet.
Measure | The level of measurement for the variable (nominal, ordinal, or scale).
Role | The role that a variable will play in your analyses (e.g., independent variable, dependent variable, or both). Some options in SPSS allow you to pre-select variables for particular analyses based on their defined roles. Any variable that meets the role requirements will be available for use in such analyses. The roles include: Input (the variable will be used as a predictor, i.e., an independent variable; this is the default assignment), Target (the variable will be used as an outcome, i.e., a dependent variable) and Both (the variable will be used as both a predictor and an outcome).
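To make the Values and Missing properties concrete, here is a small Python sketch of what value labels and user-defined missing codes do conceptually. It is an illustration rather than SPSS behaviour; the pclass labels and the -99 code come from the table above, while the function name label_value and the sample codes are hypothetical.

```python
# Value labels for pclass as described in the text; -99 as a user-missing
# code follows the example in the "Missing" row. Sample codes are made up.
PCLASS_LABELS = {1: "1st Class", 2: "2nd Class", 3: "3rd Class"}
USER_MISSING = {-99}

def label_value(code, labels, missing=USER_MISSING):
    """Return the display label for a code, or None if it counts as missing."""
    if code is None or code in missing:
        return None
    # Fall back to the raw code when no label has been defined for it.
    return labels.get(code, str(code))

pclass_codes = [1, 3, -99, 2, 3]
labelled = [label_value(c, PCLASS_LABELS) for c in pclass_codes]
print(labelled)
```

The stored data still contains the codes 1, 2 and 3; the labels only change how the values are displayed in output, and the missing code only changes which cases are counted as valid.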
2. Data → Define Variable Properties screen
The Define Variable Properties window is an efficient way of defining many variables at once, or
defining many variables that share the same formatting. Click Data → Define Variable Properties.
Figure - 01
The Define Variable Properties window will open.
Figure - 02
Select the variables you wish to define in the box on the left and click on the blue arrow button.
The selected variables will be moved to the box on the right under the heading “Variables to
Scan”. The Continue button is now enabled.
Click on Continue.
SPSS will scan the selected variables and identify the existing properties associated with those
variables and display them in a screen where you can view and change the properties for each
variable as shown in the following screen.
Figure - 03
On the screen in Figure - 03 you select each variable in turn from the scanned variables list and
enter the properties as described in Table - 02.
When you are done describing all the variables, click OK.
ADVANCED:
When you have completed defining the properties of all the variables, instead of clicking on the
OK button, you can click on the Paste button. This will open the SPSS Syntax Editor screen,
into which all the SPSS commands used to define the variable properties will be pasted.
You can save this syntax into a file for future use. The next time you have to import your file
into SPSS, you will not need to go through all the steps shown above to define the
variable properties. You can open the syntax file you saved and execute all the commands in it,
and the variable properties will be defined.
Concept Check:
1) Give 3 examples of Nominal variables in the Titanic dataset.
ANSWER:
3) What is the difference between Nominal and Ordinal variables?
ANSWER:
4) List the variables in the Titanic dataset that:
a) Can be placed on a scale of measurement.
ANSWER:
b) Can be considered Ordinal Variables.
ANSWER:
c) Are strings.
ANSWER:
5) Can .docx files be read into SPSS?
ANSWER:
4 Inspecting the data: Frequency Distributions
Before we get on with the analysis of the data, we need to inspect the data in order to:
• spot abnormalities and data entry errors
• observe extreme values (for example, Age could have been entered as 250 in a particular
case)
• check whether the data for each variable is within the defined range
• check for missing values
• identify variables that can be recoded into groups (e.g., Fare could be recoded into Low,
Medium and High)
• get a general feel for the integrity and suitability of the data for further analysis
A useful first step is to use the SPSS Frequencies command, available from the menu.
1. Click on Analyze → Descriptive Statistics → Frequencies
2. Select all the variables in the list, except those whose values are unique to each case (such
as a serial number of cases or, in the example data set, the “Name of Passenger” variable).
3. Click on the Statistics button
4. In the Frequencies: Statistics window, place a check mark against Mean, Median, Mode and
any other optional statistic that you may be interested in examining.
5. Click on Continue
6. Click on OK
SPSS opens an Output Window and displays pages of summary statistics and frequency tables
for all the selected variables.
The summary statistics table gives the mean, median and mode for each variable. The mean is
meaningful only for numeric scale variables like Age and Fare. The table also shows the number
of missing cases for each variable.
Inspect the frequency distribution table of each variable.
From the frequency tables, it is easy to:
• spot abnormal and extreme values (for example, Age could have been entered as 250 in a
particular case)
• spot data that is outside the defined range for a variable
• count the cases with missing values (i.e., cases which have no data recorded for the
variable)
• identify variables that can be recoded into groups (e.g., Fare could be recoded into Low,
Medium and High)
• get a general feel for the integrity and suitability of the data for further analysis
As you will have observed, for variables measured on a scale (like Age and Fare), the
frequency table can be very long because such variables take many distinct values.
For scale variables, it is more informative to generate descriptive statistics.
1. Go to Analyze → Descriptive Statistics → Descriptives
2. Select the variables Age and Fare
3. Click on the Options button and select the statistics you wish to see
4. Click OK.
We have used the frequency distribution here to detect wrongly coded variables and to spot
abnormalities and extreme values in the data.
However, the frequency distribution plays a greater role in statistics. It provides a useful summary
of the data being studied. It is part of a collection of statistics known as Descriptive Statistics,
which are used to describe the data. In particular, the Frequencies output gives measures of
central tendency (the mean, median and mode) and of dispersion (the spread of the data) for
each variable.
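What a frequency table and its summary statistics contain can be sketched in a few lines of Python. This is only an illustration of the idea and not the SPSS Frequencies command itself; the ages are hypothetical, with None standing in for a case where no age was recorded and 250 standing in for a data entry error.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical ages; None marks a case with no age recorded.
ages = [22, 38, None, 35, 22, 250, None, 4]

valid = [a for a in ages if a is not None]
n_missing = len(ages) - len(valid)

# Frequency table: each distinct value with its count, in ascending order.
freq = sorted(Counter(valid).items())
for value, count in freq:
    print(f"{value:>5} {count:>3}")

# Measures of central tendency, computed on valid cases only.
summary = {"mean": mean(valid), "median": median(valid), "mode": mode(valid)}
print(summary, "missing:", n_missing)
```

The implausible value 250 appears as the last row of the sorted table and pulls the mean well above the median, which is the kind of pattern to look for when inspecting real output.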
Test - 1
Look at the outputs of the Descriptives and Frequencies commands and answer the
following:
1) What is the mean Fare paid by passengers on the Titanic?
ANSWER:
2) What is the mode of the Fares paid by passengers on the Titanic?
ANSWER:
3) How many cases in the Titanic dataset do not have Age entered?
ANSWER:
4) What is the mean Age of passengers on the Titanic?
ANSWER:
5) What is the median Age of passengers on the Titanic?
ANSWER:
6) What is the proportion of passengers on the Titanic who survived?
ANSWER:
7) How many passengers on the Titanic did not pay any fare?
ANSWER:
5 SPSS: Histograms
While the Frequency distribution displays a table of numbers that summarizes the distribution of
values of each variable, showing how the values are spread from minimum to maximum, the
Histogram provides a graphical representation of the distribution.
In SPSS, histograms are produced from the same menu option that produces frequency tables.
1. Click on Analyze → Descriptive Statistics → Frequencies
2. Select the variables for which you want to produce histograms (select Age and Pclass as
an example)
3. At the bottom of the variable selection dialogue box, uncheck the check-box against the label
“Display Frequency Tables”
4. Click on the Charts button
5. Select the radio button Histograms
6. Click on Continue
7. Click on OK
The histograms will be displayed in the currently open SPSS output window.
Figure - 04
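A histogram is a frequency distribution in which values are first grouped into equal-width bins. The Python sketch below shows that grouping step with a simple text chart. It is an illustration only; SPSS chooses its own bin widths, and the ages, the function name text_histogram and the bin width of 10 are assumptions made for the example.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Group values into equal-width bins and return {bin_start: count}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical ages, not taken from the course dataset.
ages = [2, 4, 18, 22, 22, 25, 31, 35, 38, 47, 54, 71]
width = 10
bins = text_histogram(ages, bin_width=width)

# One row per non-empty bin, with one '#' per case in that bin.
for start, count in bins.items():
    print(f"{start:>3}-{start + width - 1:<3} {'#' * count}")
```

Empty bins (here, ages 60 to 69) are simply absent from this sketch, whereas a drawn histogram would show them as a gap between bars.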
6 Correcting and Cleaning Data
The process of inspecting the data through frequency distributions and histograms often reveals
input errors and other problems with the data. The errors identified in the previous sections need
to be corrected before proceeding with the analysis.
What are these errors, and how do we correct them when we find them?
Typical examples of data errors are:
• incorrect coding of values
• typing mistakes
• data shifted from one column into the neighboring column
• outliers or extreme values
Data cleaning typically takes a large share of the time spent on data analysis. It is nevertheless a
very important step, because erroneous data can lead to erroneous conclusions.
This session will be conducted as a hands-on exercise under supervision, according to the
following instructions.
Lab Exercise: Correcting and Cleaning Data
1. Read the supplied data file: titanic_ex_II.csv
2. Re-run the commands used in Section 4 - Inspecting the data
3. Inspect the outputs produced.
4. Make a list of the errors identified in the outputs.
5. Identify the cases which have these errors.
6. Correct the errors using the data editor.
7. Re-run the commands used in Section 4 to confirm that the errors have been rectified.
8. Save the data file.
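Steps 4 and 5 of the exercise amount to checking every value against the range defined for its variable. As a rough illustration of that check (in Python, not SPSS), the sketch below lists the cases that fall outside their range. The ranges, the case records and the function name find_range_errors are all hypothetical; real ranges would come from the documentation of the dataset.

```python
# Hypothetical valid ranges and cases, for illustration only.
VALID_RANGES = {"age": (0, 120), "pclass": (1, 3)}

cases = [
    {"id": 1, "age": 29, "pclass": 1},
    {"id": 2, "age": 250, "pclass": 3},   # likely typing mistake
    {"id": 3, "age": 41, "pclass": 7},    # code outside the defined range
    {"id": 4, "age": None, "pclass": 2},  # missing, not out of range
]

def find_range_errors(cases, valid_ranges):
    """Return (case id, variable, value) for every out-of-range value."""
    errors = []
    for case in cases:
        for var, (low, high) in valid_ranges.items():
            value = case.get(var)
            if value is not None and not (low <= value <= high):
                errors.append((case["id"], var, value))
    return errors

errors = find_range_errors(cases, VALID_RANGES)
print(errors)
```

Missing values are deliberately not reported as range errors here, since they call for a different decision (leave as missing or look up the source record) than a mistyped value does.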
Session II: Homework Exercise:
1. Read the data from the file “body.csv” into SPSS. Study the accompanying file “body.txt”
which provides information about the dataset.
2. The article associated with this data set appears in the Journal of Statistics Education,
Volume 11, Number 2 (July 2003). Read this article here:
http://www.amstat.org/publications/jse/v11n2/datasets.heinz.html
3. Once the data has been read into SPSS, assign meaningful variable labels and value
labels, using the information provided in the file “body.txt”.
4. Produce frequency tables, histograms and box plots from this dataset.
OR
1. Read the data from the file “cafedata.xls” into SPSS. Study the accompanying file
“cafedata_documentation.txt” which provides information about this dataset.
2. The article associated with this data set appears in the Journal of Statistics Education,
Volume 19, Number 1 (March 2011) issue. Read this article here:
http://www.amstat.org/publications/jse/v19n1/depaolo.pdf
3. Once the data has been read into SPSS, assign meaningful variable labels and value
labels.
4. Produce some frequency tables and histograms.
Online Resources:
1. https://statistics.laerd.com/statistical-guides/types-of-variable.php
2. https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php

Más contenido relacionado

La actualidad más candente

introduction to spss
introduction to spssintroduction to spss
introduction to spss
Omid Minooee
 
Access presentation
Access presentationAccess presentation
Access presentation
DUSPviz
 
Topic 4 intro spss_stata 30032012 sy_srini
Topic 4 intro spss_stata 30032012 sy_sriniTopic 4 intro spss_stata 30032012 sy_srini
Topic 4 intro spss_stata 30032012 sy_srini
SM Lalon
 
Basic introduction to ms access
Basic introduction to ms accessBasic introduction to ms access
Basic introduction to ms access
jigeno
 

La actualidad más candente (19)

Spss tutorial 1
Spss tutorial 1Spss tutorial 1
Spss tutorial 1
 
SPSS introduction Presentation
SPSS introduction Presentation SPSS introduction Presentation
SPSS introduction Presentation
 
Statistical Package for Social Science (SPSS)
Statistical Package for Social Science (SPSS)Statistical Package for Social Science (SPSS)
Statistical Package for Social Science (SPSS)
 
Spss training notes
Spss training notesSpss training notes
Spss training notes
 
Data processing & Analysis: SPSS an overview
Data processing & Analysis: SPSS an overviewData processing & Analysis: SPSS an overview
Data processing & Analysis: SPSS an overview
 
Spps training presentation 1
Spps training presentation 1Spps training presentation 1
Spps training presentation 1
 
DATA HANDLING FOR SPSS
DATA HANDLING FOR SPSSDATA HANDLING FOR SPSS
DATA HANDLING FOR SPSS
 
Spss
SpssSpss
Spss
 
SPS intro
SPS introSPS intro
SPS intro
 
introduction to spss
introduction to spssintroduction to spss
introduction to spss
 
(Manual spss)
(Manual spss)(Manual spss)
(Manual spss)
 
SPSS :Introduction for beginners
SPSS :Introduction for beginners SPSS :Introduction for beginners
SPSS :Introduction for beginners
 
An introduction to spss
An introduction to spssAn introduction to spss
An introduction to spss
 
Spss data capturing training
Spss data capturing trainingSpss data capturing training
Spss data capturing training
 
SPSS statistics - how to use SPSS
SPSS statistics - how to use SPSSSPSS statistics - how to use SPSS
SPSS statistics - how to use SPSS
 
Access presentation
Access presentationAccess presentation
Access presentation
 
Topic 4 intro spss_stata 30032012 sy_srini
Topic 4 intro spss_stata 30032012 sy_sriniTopic 4 intro spss_stata 30032012 sy_srini
Topic 4 intro spss_stata 30032012 sy_srini
 
Tutorial of SPSS
Tutorial of SPSSTutorial of SPSS
Tutorial of SPSS
 
Basic introduction to ms access
Basic introduction to ms accessBasic introduction to ms access
Basic introduction to ms access
 

Destacado (7)

Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examples
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Intro to Logistic Regression
Intro to Logistic RegressionIntro to Logistic Regression
Intro to Logistic Regression
 
Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics a...
Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics a...Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics a...
Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics a...
 
Fault prediction using logistic regression (Python)
Fault prediction using logistic regression (Python)Fault prediction using logistic regression (Python)
Fault prediction using logistic regression (Python)
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 

Similar a Spss course session-II

Print CopyExport Output InstructionsSPSS output can be selectiv.docx
Print CopyExport Output InstructionsSPSS output can be selectiv.docxPrint CopyExport Output InstructionsSPSS output can be selectiv.docx
Print CopyExport Output InstructionsSPSS output can be selectiv.docx
stilliegeorgiana
 
Day1, session ii&iii- spss
Day1, session ii&iii- spssDay1, session ii&iii- spss
Day1, session ii&iii- spss
abir hossain
 
Introduction to Statistical package of social sciences
Introduction to Statistical package of social sciencesIntroduction to Statistical package of social sciences
Introduction to Statistical package of social sciences
prachisachdev4
 

Similar a Spss course session-II (20)

Data Coding and Data Management using SPSS
Data Coding and Data Management using SPSSData Coding and Data Management using SPSS
Data Coding and Data Management using SPSS
 
SPSS software
SPSS software SPSS software
SPSS software
 
Spss guidelines
Spss guidelinesSpss guidelines
Spss guidelines
 
SPSS PRESENTATION.PPT.pptx
SPSS PRESENTATION.PPT.pptxSPSS PRESENTATION.PPT.pptx
SPSS PRESENTATION.PPT.pptx
 
Print CopyExport Output InstructionsSPSS output can be selectiv.docx
Print CopyExport Output InstructionsSPSS output can be selectiv.docxPrint CopyExport Output InstructionsSPSS output can be selectiv.docx
Print CopyExport Output InstructionsSPSS output can be selectiv.docx
 
111249-140817070204-phpapp02.pdf
111249-140817070204-phpapp02.pdf111249-140817070204-phpapp02.pdf
111249-140817070204-phpapp02.pdf
 
Week 7 spss
Week 7 spssWeek 7 spss
Week 7 spss
 
Introduction to spss
Introduction to spssIntroduction to spss
Introduction to spss
 
Spss intro for engineering
Spss intro for engineeringSpss intro for engineering
Spss intro for engineering
 
6967176.ppt
6967176.ppt6967176.ppt
6967176.ppt
 
SPSS GUIDE
SPSS GUIDESPSS GUIDE
SPSS GUIDE
 
Day1, session ii&iii- spss
Day1, session ii&iii- spssDay1, session ii&iii- spss
Day1, session ii&iii- spss
 
SPSS FINAL.pdf
SPSS FINAL.pdfSPSS FINAL.pdf
SPSS FINAL.pdf
 
Beginners SPSS.ppt
Beginners SPSS.pptBeginners SPSS.ppt
Beginners SPSS.ppt
 
Software packages for statistical analysis - SPSS
Software packages for statistical analysis - SPSSSoftware packages for statistical analysis - SPSS
Software packages for statistical analysis - SPSS
 
Statrting spss
Statrting spssStatrting spss
Statrting spss
 
extra material for practicals in spss.pptx
extra material for practicals in spss.pptxextra material for practicals in spss.pptx
extra material for practicals in spss.pptx
 
Introduction to Statistical package of social sciences
Introduction to Statistical package of social sciencesIntroduction to Statistical package of social sciences
Introduction to Statistical package of social sciences
 
BRM_Data Analysis, Interpretation and Reporting Part I.ppt
BRM_Data Analysis, Interpretation and Reporting Part I.pptBRM_Data Analysis, Interpretation and Reporting Part I.ppt
BRM_Data Analysis, Interpretation and Reporting Part I.ppt
 
Introduction To SPSS
Introduction To SPSSIntroduction To SPSS
Introduction To SPSS
 

Último

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 

Último (20)

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

Spss course session-II

  • 1. Using SPSS for Statistical Analysis
    A course for Beginners
    by Leo Fernandez
    Session II: Describing Data

    1 Types of Variables

    In statistics, variables describe attributes of the objects being studied. The value of a variable can 'vary' from one entity or sample element to another.

    For example, a person's nationality could be a variable if we are studying people. One person could be "Mexican" and another "Sudanese". Further, if we consider the two entities described above (a Mexican and a Sudanese), we might also observe some other attributes of these entities. For example, the Mexican's height could be 5ft 2in and that of the Sudanese, 5ft 10in.

    Variables can be grouped under two broad categories: Qualitative vs. Quantitative Variables.

    - Qualitative: Qualitative variables are also known as "categorical" variables. They describe attributes of objects by names or labels. A person's religion (e.g., Hindu, Muslim, Christian) or the colour of the person's eyes (e.g., black, brown, blue) are examples of qualitative or categorical variables.
    - Quantitative: Quantitative variables are also known as "numeric" variables. They record a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, which is a measurable attribute of the city. Therefore, population would be a quantitative variable.
  • 2. In statistical data analysis, variables are of the following types:

    Table - 01

    | Type | Category | Description | Example |
    |---|---|---|---|
    | Nominal | Categorical | Indicates membership of a collection or category. There is no implied ordering. | Nationality: 1 = Australian, 2 = British, 3 = Canadian, 4 = Dane, 5 = Other |
    | Ordinal | Categorical | Indicates a difference, and indicates the direction of the difference. The items in the category can be arranged from low to high. Differences between items are not in equal intervals. | Education: 1 = No education, 2 = Primary School, 3 = High School, 4 = Graduate, 5 = Postgraduate |
    | Interval | Numeric | Indicates a difference with direction. Differences are in equal intervals. | Age, recorded in whole years |
    | Ratio | Numeric | Indicates a difference with direction. Differences are in equal intervals. A zero point is defined. | Income |

    2 SPSS: Reading data into SPSS

    The SPSS program has an interface for data entry. We were introduced to that interface in Session I. When a researcher decides to use SPSS for data analysis, it is more than likely that the data has already been collected and stored using an office productivity tool like a spreadsheet program.

    Data from external sources can be read into SPSS through the following steps:

    1. In the SPSS program, navigate to File → Open → Data. A dialogue box will open.
    2. In the dialogue box, click on the down arrow against the field named Files of Type. Choose “Excel (*.xls, *.xlsx, *.xlsm)”.
    3. Navigate to the folder containing the Excel file that holds your data and select that file. [Use the file titanic_ex_II.xlsx that was sent to you.]
    4. Click “Open”. A dialogue box appears.
    5. Make sure the check-box is ticked against the label “Read Variable names from the first row of data”.
    6. Click “OK”.

    The Excel file is loaded into SPSS. Click on the Data View tab at the bottom of the screen. Voilà! You see the data just as you did in your spreadsheet program.
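For readers who want to see what “Read Variable names from the first row of data” amounts to, here is a minimal sketch in Python (not SPSS) using only the standard library. It is an illustration under assumptions: the sample rows are made up, the text is comma-separated rather than Excel, and the function name `read_with_header` is hypothetical. Only the column names `pclass` and `sibsp` come from this session.

```python
import csv
import io

# Made-up rows standing in for a spreadsheet export (not the real course file).
SAMPLE = """pclass,survived,age,sibsp,fare
1,1,29,0,211.34
3,0,,1,7.25
2,1,34,0,13.00
"""

def read_with_header(text):
    """Read delimited text, taking the variable names from the first row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)          # one dict per case, keyed by variable name
    return reader.fieldnames, rows

names, rows = read_with_header(SAMPLE)
print(names)                     # the variable (column) names
print(len(rows), "cases read")
print(rows[1]["age"] == "")      # an empty cell arrives as an empty string
```

The first row supplies the names and every later row becomes one case, which is the same arrangement you see in the Data View screen.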
  • 3. Click on the Variable View tab at the bottom of the screen. This screen displays the names of the columns from the imported Excel file and the properties associated with each column.

    You have successfully read an external data source into SPSS. SPSS can recognize and read data directly from a select list of formats (as can be seen in the drop-down for the Files of Type field of the File → Open → Data dialogue box).

    Now that we have imported the data into SPSS, you can view the imported data in the Data View and Variable View screens. The Data View screen displays the data in rows and columns (like a spreadsheet). You can scroll down the screen to verify that all the data has been correctly imported into the appropriate columns. The Variable View screen displays the column names and the properties of the data contained in each column.

    3 SPSS: Defining Variables

    When you examine the imported data closely, you may notice that the column names are cryptic (or, if you had spaces in the column names of the spreadsheet, the spaces are removed and the column name is a string of concatenated words). SPSS column names cannot contain spaces or certain other special characters.

    In SPSS, columns are called 'variables'. It is considered good practice to assign descriptive labels to these variables and define their properties before proceeding with the analysis of the data. Defining a variable involves giving it a name and specifying its type, the values the variable can take (e.g., 1, 2, 3), the scale of measurement and so on.

    Variable definitions can be made in SPSS in either of the following two screens:

    1. The Variable View screen
    2. Data → Define Variable Properties screen

    1. The Variable View screen

    The Variable View screen lists the variables (columns) in the data file and the properties associated with each of those variables:
  • 4. Table - 02

    | Property | Description |
    |---|---|
    | Name | The name of the variable. Variable names cannot contain spaces. To change a variable's name, double-click on the variable that you wish to rename and type the new variable name. |
    | Type | The type of variable. This column refers to how the data is stored, including the number of characters it can contain and other formatting information. It is not to be confused with the Types of Variables discussed at the beginning of Session II. SPSS recognizes the following types: Numeric, Comma, Dot, Scientific notation, Date, Dollar, Custom currency, String and Restricted Numeric (integer with leading zeros). To change a variable's type, click inside the cell corresponding to the “Type” column for that variable. A square "..." button will appear; click on it to open the Variable Type window. Click the option that best matches the type of variable. Click OK. |
    | Width | The number of digits displayed for numerical values or the number of characters for a string variable. |
    | Decimals | The number of digits after the decimal point for each value of the variable (applicable to numeric variables). |
    | Label | A descriptive definition or display name for the variable. The variable label appears in the output in place of the variable's name (which is often cryptic). Example: The variable sibsp might be described by the label “Number of Siblings or Spouse on board”. |
    | Values | For coded categorical variables, the value label(s) that should be associated with each category code. Value labels are useful primarily for categorical (i.e., nominal or ordinal) variables, especially if they have been recorded as codes (e.g., 1, 2, 3). It is good practice to give each value a label so that you (and anyone looking at your data or results) understand what each value represents. Example: In the sample dataset, the variable pclass represents the Passenger Class. The values 1, 2, 3 represent the categories “1st Class”, “2nd Class” and “3rd Class”, respectively. |
    | Missing | The user-defined values that indicate data are missing for a variable (e.g., -99). Note that this does not affect or eliminate SPSS's default missing value code ("."). This column merely allows the user to specify alternative codes for missing values. |
    | Columns | The width of each column in the Data View spreadsheet. |
    | Align | The alignment of content in the cells of the Data View spreadsheet. |
    | Measure | The level of measurement for the variable (e.g., nominal, ordinal, or scale). |
    | Role | The role that a variable will play in your analyses (i.e., independent variable, dependent variable, both independent and dependent). Some options in SPSS allow you to pre-select variables for particular analyses based on their defined roles. Any variable that meets the role requirements will be available for use in such analyses. You can choose from the following roles for each variable. Input: the variable will be used as a predictor (independent variable); this is the default assignment for variables. Target: the variable will be used as an outcome (dependent variable). Both: the variable will be used as both a predictor and an outcome (independent and dependent variable). |

  • 5. 2. Data → Define Variable Properties screen

    The Define Variable Properties window is an efficient way of defining many variables at once, or defining many variables that share the same formatting.

    Click Data → Define Variable Properties.

    Figure - 01

    The Define Variable Properties window will open.

    Figure - 02

    Select the variables you wish to define in the box on the left and click on the blue arrow button. The selected variables will be moved to the box on the right under the heading “Variables to Scan”. The Continue button is now enabled.
  • 6. Click on Continue. SPSS will scan the selected variables, identify the existing properties associated with them and display those properties in a screen where you can view and change them for each variable, as shown in the following screen.

    Figure - 03

    On the screen in Figure - 03, select each variable in turn from the scanned variables list and enter the properties as described in Table - 02. When you are done describing all the variables, click OK.

    ADVANCED: When you have completed defining the properties of all the variables, instead of clicking on the OK button, you can click on the Paste button. This will open the SPSS Syntax Editor screen, into which all the SPSS commands used to define the variable properties will be pasted. You can save this syntax into a file for future use. The next time you have to import your file into SPSS, you will not need to go through all the steps shown above to define the variable properties. You can open the syntax file you saved and execute all the commands in it. The variable properties will be defined.
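The idea behind value labels and user-defined missing codes (Table - 02) can be sketched outside SPSS as a simple lookup. The following Python fragment is an illustration only and is not how SPSS stores these properties. The `pclass` labels and the missing code -99 come from Table - 02; the function name `describe` and the list of codes are hypothetical.

```python
# Value labels for pclass, as given in Table - 02.
PCLASS_LABELS = {1: "1st Class", 2: "2nd Class", 3: "3rd Class"}

# Example user-defined missing code from Table - 02.
USER_MISSING = {-99}

def describe(code, labels, missing=USER_MISSING):
    """Return the label for a coded value, or flag it as missing or unlabelled."""
    if code is None or code in missing:
        return "(missing)"
    return labels.get(code, f"(no label for {code})")

codes = [1, 3, -99, 2, 4]        # made-up codes; 4 is not a defined category
print([describe(c, PCLASS_LABELS) for c in codes])
```

A code with no label, such as the 4 above, is exactly the kind of value that the frequency tables in the next section help you spot.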
  • 7. 4 Inspecting the data: Frequency Distributions

    Before we get on with the analysis of the data, we need to inspect the data in order to:

    - spot abnormalities and data entry errors
    - observe extreme values (for example, Age could have been entered as 250 in a particular case)
    - check that the data for each variable is within the defined range
    - check for missing values
    - identify variables that can be recoded into groups (e.g., Fare could be recoded into Low, Medium and High)
    - get a general feel for the integrity and suitability of the data for further analysis

    A useful first step is to use the SPSS Frequencies command, found in the menu.

    1. Click on Analyze → Descriptive Statistics → Frequencies.
    2. Select all the variables in the list, except those that represent the serial number of cases or, in the example data set, the “Name of Passenger” variable (one would expect a name to be unique to a passenger).
    3. Click on the Statistics button.
    4. In the Frequency statistics window, place a check mark against Mean, Median, Mode and any other optional statistic that you may be interested in examining.
    5. Click on Continue.
    6. Click on OK.

    SPSS opens an Output Window and displays pages of summary statistics and frequency tables for all the selected variables.

    Concept Check:
    1) Give 3 examples of Nominal variables in the Titanic dataset. ANSWER:
    3) What is the difference between Nominal and Ordinal variables? ANSWER:
    4) List the variables in the Titanic dataset that:
       a) Can be placed on a scale of measurement. ANSWER:
       b) Can be considered Ordinal Variables. ANSWER:
       c) Are strings. ANSWER:
    5) Can .docx files be read into SPSS? ANSWER:

  • 8. The summary statistics table gives the mean, median and mode for each variable. The mean is meaningful only for numeric scale variables like Age and Fare. The table also shows the number of missing cases for each variable.

    Inspect the frequency distribution table of each variable. From the frequency tables, it is easy to:

    - spot abnormal and extreme values (for example, Age could have been entered as 250 in a particular case)
    - spot data that is outside the defined range for a variable
    - see the number of cases with missing values (i.e., cases which have no data recorded for the variable)
    - identify variables that can be recoded into groups (e.g., Fare could be recoded into Low, Medium and High)
    - get a general feel for the integrity and suitability of the data for further analysis

    As you will have observed, for variables measured on a scale (like Age and Fare) the frequency table can be very long, because many cases have distinct values. For scale variables, it is more informative to generate descriptive statistics.

    1. Go to Analyze → Descriptive Statistics → Descriptives.
    2. Select the variables Age and Fare.
    3. Set the Options for the statistics you wish to see.
    4. Click OK.

    We have used the Frequency distribution here to detect wrongly coded variables and to spot abnormalities and extreme values in the data. However, the Frequency distribution plays a greater role in statistics. It provides a useful summary of the data being studied. It is part of a collection of statistics known as Descriptive Statistics, which are used to describe the data. In particular, the Frequencies output gives measures of central tendency (the mean, median and mode) and of dispersion (the spread of the data) for each variable.
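To make the inspection ideas above concrete, here is a small Python sketch (not SPSS output) that builds a frequency table and computes the mean, median and mode with the standard library. The ages are made up for illustration, `None` stands for a case with no age recorded, and the plausible range of 0 to 110 is an assumption chosen for this example.

```python
from collections import Counter
import statistics

# Made-up ages; None marks a missing value, 250 is a deliberate entry error.
ages = [22, 38, None, 26, 35, 35, 250, None, 54, 2]

valid = [a for a in ages if a is not None]
n_missing = len(ages) - len(valid)

# Frequency table: each distinct value and how often it occurs.
freq = Counter(valid)
for value in sorted(freq):
    print(f"{value:>5} {freq[value]:>3}")

print("N valid:", len(valid), "N missing:", n_missing)
print("mean:", statistics.mean(valid))
print("median:", statistics.median(valid))
print("mode:", statistics.mode(valid))

# Range check (assumed plausible range for age: 0 to 110).
suspect = [a for a in valid if not 0 <= a <= 110]
print("out of range:", suspect)
```

With these made-up values the single entry of 250 pulls the mean up to 57.75 while the median stays at 35, which is one reason to look at more than one measure of central tendency when checking for entry errors.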
  • 9. Test - 1

    Look at the outputs of the Descriptive Statistics and Frequencies commands and answer the following:

    1) What is the mean Fare paid by passengers on the Titanic? ANSWER:
    2) What is the mode of the Fares paid by passengers on the Titanic? ANSWER:
    3) How many cases in the Titanic dataset do not have Age entered? ANSWER:
    4) What is the mean Age of passengers on the Titanic? ANSWER:
    5) What is the median Age of passengers on the Titanic? ANSWER:
    6) What is the proportion of passengers on the Titanic who survived? ANSWER:
    7) How many passengers on the Titanic did not pay any fare? ANSWER:
  • 10. 5 SPSS: Histograms

    The Frequency distribution displays a table of numbers that summarizes the distribution of values of each variable, showing how the values are spread from minimum to maximum. The Histogram provides a graphical representation of the same distribution.

    In SPSS, histograms are produced from the same menu option that produced the frequency tables.

    1. Click on Analyze → Descriptive Statistics → Frequencies.
    2. Select the variables for which you want to produce histograms (select Age and Pclass as an example).
    3. At the bottom of the variable selection screen, uncheck the check-box against the label “Display Frequency Tables”.
    4. Click on the Charts button.
    5. Select the radio button Histograms.
    6. Click on Continue.
    7. Click on OK.

    The histogram will be displayed in the currently open SPSS output window.

    Figure - 04
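A histogram is simply a frequency count over equal-width intervals (bins). The Python sketch below illustrates that idea with text bars; it is not how SPSS draws its charts. The ages are made up, and the bin width of 10 years and the function name `histogram` are assumptions chosen for the example.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values into equal-width bins keyed by each bin's lower edge."""
    return Counter((v // bin_width) * bin_width for v in values)

ages = [2, 22, 26, 35, 35, 38, 54, 61]   # made-up ages in whole years
bins = histogram(ages, 10)

# Print one row per bin, including empty bins between the lowest and highest.
for lower in range(min(bins), max(bins) + 10, 10):
    print(f"{lower:>3}-{lower + 9:<3} {'*' * bins[lower]}")
```

Changing the bin width changes the shape of the picture, so it is worth trying more than one width before drawing conclusions from a histogram.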
  • 11. 6 Correcting and Cleaning Data

    The process of inspecting the data through frequency distributions and histograms often reveals input errors and other problems with the data. The errors identified in the previous sections need to be corrected before proceeding with the analysis.

    What are these errors, and how do we correct them when we find them? Typical examples of data errors are:

    - incorrect coding of values
    - typing mistakes
    - shifting of data from one column into the neighboring column
    - outliers or extreme values

    Data cleaning typically takes a large share of the time spent on data analysis. It is nevertheless a very important step, because erroneous data can lead to erroneous conclusions.

    This session will be conducted as a hands-on exercise under supervision, according to the following instructions.

    Lab Exercise: Correcting and Cleaning Data

    1. Read the supplied data file: titanic_ex_II.csv
    2. Re-run the commands used in Section 4 - Inspecting the data.
    3. Inspect the outputs produced.
    4. Make a list of the errors identified in the outputs.
    5. Identify the cases which have these errors.
    6. Correct the errors using the data editor.
    7. Re-run the commands used in Section 4 to confirm that the errors have been rectified.
    8. Save the data file.

    Session II: Homework Exercise:

    1. Read the data from the file “body.csv” into SPSS. Study the accompanying file “body.txt”, which provides information about the dataset.
    2. The article associated with this data set appears in the Journal of Statistics Education, Volume 11, Number 2 (July 2003). Read this article here: http://www.amstat.org/publications/jse/v11n2/datasets.heinz.html
    3. Once the data has been read into SPSS, assign meaningful variable labels and value labels, using the information provided in the file “body.txt”.
    4. Produce frequency tables, histograms and box plots from this dataset.
  • 12. OR

    1. Read the data from the file “cafedata.xls” into SPSS. Study the accompanying file “cafedata_documentation.txt”, which provides information about this dataset.
    2. The article associated with this data set appears in the Journal of Statistics Education, Volume 19, Number 1 (March 2011). Read this article here: http://www.amstat.org/publications/jse/v19n1/depaolo.pdf
    3. Once the data has been read into SPSS, assign meaningful variable labels and value labels.
    4. Produce some frequency tables and histograms.

    Online Resources:

    1. https://statistics.laerd.com/statistical-guides/types-of-variable.php
    2. https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php