PSPP overview and Introduction to R & R Commander
1. Application Software
Statistical Software
MSc. (Medical Administration) Program 2014
Post Graduate Institute of Medicine,
University of Colombo, Sri Lanka
Dr. B.D.W. Jayamanne
M.B.B.S., MSc. (Biostatistics), MSc. (Biomedical Informatics)
17-02-2014
2. Outline - 1
• Statistics - overview
• Data processing
• Data types in computing
• Data representation in computers
• Data analysis with computers
  o Variable types
  o Choice of test
• Statistical Software/Package - overview
  o Stand alone - FOSS / Proprietary
  o Online resources
• Data entering options
  o Spreadsheet
  o Database
  o Statistical software
© bdwjayamanne@gmail.com/djayamanne@yahoo.com
3. Outline - 2
• PSPP software
• How to construct PSPP data file
  o Text variables
  o Numeric variables
• How to import other format files
• How to recode variables
• Processing data
• How to analyse - Parametric / Non-parametric
  o Frequency
  o Bivariate analysis
    Cross tab
    Correlation
    t test
  o Sub group selection
• Introduction of R software
6. Statistics
What is Statistics?
The science of collecting and analysing data and of making inferences / conclusions from them.
• Collection
• Analysis
• Making inference
(* The word "statistic" has a different meaning.)
7. Variable
Variable:
A quantity that varies from one unit to another is referred to as a variable.
Eg: Height, Weight, Blood Pressure, Crop yield - one value is not sufficient
Discrete - Fixed number of possibilities (Blood Group)
Continuous - Infinite number of possibilities (BP) - even within a finite interval
8. Constant
Constant:
The opposite of a variable. If a quantity does not vary from one unit to another, that quantity is referred to as a constant.
Eg. Density of an element - one value is sufficient
9. Data Processing - Steps
• Raw Data
  o Interviews
  o Questionnaires
  o Observations
  o Interview guides
  o Secondary sources
• Editing
• Coding
  o <Codebook>
  o Coding the data
  o Verifying the coded data
• Analysis
  o Develop frame of analysis
  o Analysis
10. Data editing
• Scrutinizing the completed research instruments (identify and minimize)
  o Errors
  o Incompleteness
  o Misclassification
  o Information gaps
• Two ways
  o One variable at a time
  o One questionnaire at a time
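The two ways of editing listed above can be illustrated with a short Python sketch. The records, the variable names (`age`, `sex`), and the use of `None` for an unanswered item are assumptions made for illustration only; they are not taken from the slides.

```python
# Hypothetical questionnaire records; None marks an unanswered item (assumption).
records = [
    {"id": 1, "age": 34, "sex": 1},
    {"id": 2, "age": None, "sex": 2},
    {"id": 3, "age": 51, "sex": None},
    {"id": 4, "age": None, "sex": 1},
]
variables = ["age", "sex"]

# One variable at a time: count the incomplete answers for each variable.
missing_by_variable = {
    v: sum(1 for r in records if r[v] is None) for v in variables
}

# One questionnaire at a time: list the incomplete items for each respondent.
missing_by_questionnaire = {
    r["id"]: [v for v in variables if r[v] is None] for r in records
}

print(missing_by_variable)
print(missing_by_questionnaire)
```

Scanning by variable shows which question is most often left blank, while scanning by questionnaire shows which forms may need to be returned for completion.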
11. Data Types with computers
• Boolean
• Text/Character/String
  o Single
  o Multiple
• Numeric
  o Integer
  o Decimal
• Date / time
12. Levels of measurement in statistics
1. Nominal scale
  a. Only indicates category
  b. Eg. Religion - Buddhism, Christianity, Hindu
2. Ordinal scale
  a. In addition to the category, allows cases to be ordered by degree according to the measurement
  b. Eg: Very poor, Poor, OK, Good, Excellent
3. Interval scale
  a. Has units measuring intervals of equal distance between values measured in linear scale
  b. No true zero
  c. Eg: Temperature in Celsius, Date, Latitude
4. Ratio scale
  a. Has true zero
  b. Ratios between values are meaningful (Eg: Height, Weight)
13. Data type & Scale of measurement
Data type
• Boolean
• Text
• Numeric
• Date/Time
Measurement
• Nominal
• Ordinal
• Interval & Ratio scale
14. Data type & Scale of measurement
• Identification of the correct data type for the scale of measurement is very important before data entry
• If wrongly applied
  o Can't do appropriate analysis
  o Wrong conclusions
  o Can recode and correct the issues
  o Or can re-enter
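As a small illustration of a wrongly applied data type, the Python sketch below stores ages as text instead of numbers. The values are made up for the example. Text is compared character by character, so sorting and finding the maximum give misleading answers until the values are converted (recoded) to a numeric type.

```python
# Ages entered as text by mistake (illustrative values).
ages_as_text = ["9", "10", "25", "100"]

# Text is ordered character by character, not by magnitude.
text_order = sorted(ages_as_text)
text_max = max(ages_as_text)

# Correcting the data type restores the expected behaviour.
ages = [int(a) for a in ages_as_text]
numeric_order = sorted(ages)
numeric_max = max(ages)

print(text_order, text_max)
print(numeric_order, numeric_max)
```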
15. Coding of Questions
• Open ended
  o Text eg. Name
  o Number eg. Age
• Structured (close ended)
  o Single answer
    Yes / No - True / False
    Likert scale (Agree -> Strongly disagree)
    Multiple options / list - one answer
  o More than one answer
    Multiple options / list
• Combined
16. Coding of Questions….
1. Age :
2. Have you obtained any postgraduate qualifications?
  1. Yes
  2. No
3. We do not have to worry because Sri Lanka is not much affected by climate change.
  1. Strongly agree 2. Agree 3. No opinion 4. Disagree 5. Strongly disagree
4. You obtain information from
  1. TV 2. Radio 3. Newspapers 4. Internet 5. Journals 6. Books
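One possible way to code the four example questions is sketched below in Python. The variable names (`age`, `postgrad`, `climate`, `src_tv` and so on) and the choice to store the multiple-answer question as one 0/1 variable per option are assumptions for illustration; the slides do not prescribe a particular scheme.

```python
# Hypothetical codebook for the single-answer questions.
CODEBOOK = {
    "age": None,  # open-ended number, no value labels
    "postgrad": {1: "Yes", 2: "No"},
    "climate": {
        1: "Strongly agree",
        2: "Agree",
        3: "No opinion",
        4: "Disagree",
        5: "Strongly disagree",
    },
}

# The multiple-answer question: one 0/1 variable per option (assumed coding).
INFO_SOURCES = ["TV", "Radio", "Newspapers", "Internet", "Journals", "Books"]


def code_sources(chosen):
    """Return a 0/1 value for every information source."""
    return {f"src_{s.lower()}": int(s in chosen) for s in INFO_SOURCES}


# One coded questionnaire.
respondent = {"age": 42, "postgrad": 1, "climate": 4}
respondent.update(code_sources({"TV", "Internet"}))

print(respondent)
print(CODEBOOK["climate"][respondent["climate"]])
```

Storing codes in the data and labels in the codebook keeps data entry short while still allowing readable output.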
17. Good Data File Should...
• Correct coding of questions
• Correct data type
• Correct scale of measurement
Good Data File -> Good Analysis
20. Proprietary Software list (familiar) - 2014
• SPSS
• MiniTAB
• SAS
• STATA
• LISREL
• MedCalc
• STATISTICA
• etc
US $ 5,500
US $ 1,400
US $ 1,440
US $ 620
21. Free Software list (?? unfamiliar)
• PSPP - Analogue of SPSS
• R and supportive packages
  o R Commander
  o Red R
• Epi Info (7)
• Epi Data
• Win PEPI
• OpenEpi - online, free
22. Data entering options
• Need not be statistical software
• Spreadsheet
  o OpenOffice Calc
  o MS Excel
  o Google Spreadsheet
• Databases
  o MS Access
  o etc
• Statistical package
  o Epi Data
  o Epi Info
  o PSPP / SPSS
  o etc
24. Working with PSPP
Ver 0.8.x
• Perfect Statistics Professionally Presented!
• Probabilities Sometimes Prevent Problems!
• People Should Prefer PSPP!!
25. Introduction to PSPP
• How to download and install?
  o Free download
  o Easy to install
  o Lightweight (small in size)
  o http://pspp.awardspace.com/ or a simple Google search for "download PSPP"
• Similar features to SPSS
  o Layout
  o Menu commands
  o Scripts
  o Data file & script compatibility with SPSS
26. Advantages of PSPP
• Free download / No subscription fees
• Compatible with SPSS data files (similar)
• Compatible with SPSS scripts
• Multiplatform compatible - Has Linux versions (inter-platform portability)
• Faster than SPSS
• > 1 billion variables (SPSS 2.15 billion, Excel 16,000)
• > 1 billion cases (SPSS 2.15 billion, Excel 1 million)
27. Windows in PSPP
1. Data Editor (default)
  a. Data view
  b. Variable view
2. Output Window
3. Syntax editor
28. Data Editor
• Provides a convenient, spreadsheet-like method for creating and editing data files.
• This window opens automatically when you start a session.
Switch windows
29. Toolbar - Data View
Save File
Jump to case
Jump to variable
Open File (Data / Syntax (Script), etc)
30. Data view
•Data View. This view displays the actual data values or
defined value labels.
31. Data view
• Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a questionnaire is a case.
• Columns are variables. Each column represents a variable or characteristic that is being measured. For example, each item on a questionnaire is a variable.
• Cells contain values. Each cell contains a single value of a variable for a case. The cell is where the case and the variable intersect. Cells contain only data values.
**Unlike spreadsheet programs, cells in the Data Editor cannot contain formulas.
33. Variable view
Variable View. This view displays variable definition
information, including defined variable and value
labels, data type (for example, string, date, or
numeric), measurement level (nominal, ordinal, or
scale), and user-defined missing values.
34. Variable view
•Variable View contains descriptions of the attributes of each variable in the data file. In Variable View:
•Rows are variables.
•Columns are variable attributes.
•You can add or delete variables and modify attributes of variables, including the following attributes:
•Variable name
•Data type
•Number of digits or characters
•Number of decimal places
•Descriptive variable and value labels
•User-defined missing values
•Column width
•Measurement level
•All of these attributes are saved when you save the data file.
36. Variable Name
• Each variable name must be unique; duplication is not allowed.
• Variable names can be up to 64 bytes long, and the first character must be a letter or one of the characters @, #, or $. Subsequent characters can be any combination of letters and numbers.
• Variable names cannot contain spaces. Use underscores in place of spaces.
• Reserved keywords cannot be used as variable names. Reserved keywords are ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, and WITH.
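The naming rules above can be expressed as a small Python check. This is only a sketch of the rules as stated on the slide, not the parser PSPP actually uses. The function name `is_valid_variable_name` is hypothetical, and treating keywords and duplicates as case-insensitive is an assumption.

```python
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}


def is_valid_variable_name(name, existing=()):
    """Check a candidate name against the rules listed on the slide."""
    # Must be non-empty and at most 64 bytes long.
    if not name or len(name.encode("utf-8")) > 64:
        return False
    # Reserved keywords are not allowed (case-insensitive: assumption).
    if name.upper() in RESERVED:
        return False
    # Names must be unique within the data file (case-insensitive: assumption).
    if name.upper() in {n.upper() for n in existing}:
        return False
    # First character: a letter or one of @, #, $.
    first, rest = name[0], name[1:]
    if not (first.isalpha() or first in "@#$"):
        return False
    # Remaining characters: letters, numbers, or underscores (so no spaces).
    return all(ch.isalnum() or ch == "_" for ch in rest)


for candidate in ["blood_pressure", "blood pressure", "1age", "with"]:
    print(candidate, is_valid_variable_name(candidate))
```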
37. Variable Type
• Variable Type specifies the data type for each variable. By default, all new variables are assumed to be numeric. You can use Variable Type to change the data type.
38. Variable Labels
• You can assign descriptive variable labels up to 256 characters (128 characters in double-byte languages). Variable labels can contain spaces and reserved characters that are not allowed in variable names.
Missing Values
• Missing Values defines specified data values as user-missing. For example, you might want to distinguish between data that are missing because a respondent refused to answer and data that are missing because the question didn't apply to that respondent. Data values that are specified as user-missing are flagged for special treatment and are excluded from most calculations.
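The effect of declaring user-missing values can be sketched in Python. The codes 98 ("refused to answer") and 99 ("not applicable") and the age values are hypothetical. The point is only that the flagged codes are left out of the calculation.

```python
from statistics import mean

# Hypothetical user-missing codes for an age variable.
USER_MISSING = {98: "Refused to answer", 99: "Not applicable"}

ages = [34, 41, 98, 29, 99, 52]

# Exclude user-missing codes before calculating, as a statistical package would.
valid_ages = [a for a in ages if a not in USER_MISSING]
mean_valid = mean(valid_ages)

# Forgetting to declare the codes as missing inflates the mean.
mean_naive = mean(ages)

print(mean_valid, round(mean_naive, 2))
```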
39. Value Labels
• You can assign descriptive value labels for each value of a variable. This process is particularly useful if your data file uses numeric codes to represent non-numeric categories (for example, codes of 1 and 2 for male and female).
• Value labels are saved with the data file. You do not need to redefine value labels each time you open a data file.
40. Variable Measurement Level
• Nominal. A variable can be treated as nominal when its values represent categories with no intrinsic ranking (for example, the department of the company in which an employee works). Examples of nominal variables include region, zip code, and religious affiliation.
• Ordinal. A variable can be treated as ordinal when its values represent categories with some intrinsic ranking (for example, levels of service satisfaction from highly dissatisfied to highly satisfied). Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating scores.
• Scale. A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. Examples of scale variables include age in years and income in thousands of dollars.
42. Importing Data Files - Spreadsheets
• The spreadsheet should be compatible with the PSPP data structure (rows as cases, columns as variables)
43. Importing Data Files - Spreadsheets
• From the menus choose
  – File
    » Open
    » Import Data
    » Select "All spreadsheets" as the file type you want to view
    » Open *.xls file
Access
dBase
Excel
Other
47. Recode into Same Variables
•The Recode into Same Variables dialog box allows you to reassign the
values of existing variables or collapse ranges of existing values into new
values. For example, you could collapse salaries into salary range
categories.
•You can recode numeric and string variables. If you select multiple
variables, they must all be the same type. You cannot recode numeric and
string variables together.
48. Recode into Different Variables
•The Recode into Different Variables dialog box allows you to reassign the values of existing variables or collapse ranges of existing values into new values for a new variable. For example, you could collapse salaries into a new variable containing salary-range categories.
•You can recode numeric and string variables.
•You can recode numeric variables into string variables and vice versa.
•If you select multiple variables, they must all be the same type. You cannot recode numeric and string variables together.
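The salary example can be sketched in Python as a recode into a different variable. The cut-off points, the labels, and the function name `recode_salary` are hypothetical; in PSPP the same step is done through the dialog box described above.

```python
def recode_salary(salary):
    """Collapse a salary into a range category code (hypothetical cut-offs)."""
    if salary < 25000:
        return 1
    if salary < 50000:
        return 2
    return 3


# Value labels for the new variable.
SALARY_RANGE_LABELS = {
    1: "Below 25,000",
    2: "25,000 - 49,999",
    3: "50,000 and above",
}

salaries = [18000, 32000, 75000, 49999, 50000]

# Recode into a different variable: the original salaries are kept unchanged.
salary_range = [recode_salary(s) for s in salaries]

for s, code in zip(salaries, salary_range):
    print(s, code, SALARY_RANGE_LABELS[code])
```

Recoding into the same variable would instead overwrite `salaries` with the category codes, so the original values could no longer be recovered from the data file.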
50. Univariate : Frequency
• The first thing to do when all the data are collected is to count how many people gave particular answers to each question.
• We look at how the sample is spread or distributed in the various categories of each variable.
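A frequency table of this kind can be sketched in Python with the standard library. The answers below are invented for illustration.

```python
from collections import Counter

# Invented answers to a Yes / No question.
answers = ["Yes", "No", "Yes", "Yes", "No", "Yes"]

# Count how many respondents gave each answer.
counts = Counter(answers)
total = len(answers)

# Print each category with its count and percentage of the sample.
for value, n in counts.most_common():
    print(f"{value:<5}{n:>3}{100 * n / total:>7.1f}%")
```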
52. Measuring Central Tendency
• One of the most important ways of summarizing a distribution of values for a variable is to establish its central tendency.
• Central tendency : The typical value in a distribution.
  o The arithmetic mean
  o The median
  o The mode
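The three measures can be computed with Python's standard `statistics` module. The values are invented for illustration.

```python
from statistics import mean, median, mode

# Invented values for a numeric variable.
values = [2, 3, 3, 5, 7, 10]

average = mean(values)       # sum of the values divided by their number
middle = median(values)      # middle value; average of the two middle values here
most_common = mode(values)   # most frequently occurring value

print(average, middle, most_common)
```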
55. Measuring Dispersion
• The amount of variation shown by a distribution is called dispersion.
  o Range
  o Variance
  o Standard Deviation
• Range : Difference between the highest and lowest values in a distribution.
• Variance : Average squared deviation from the mean.
• Standard Deviation : Square root of the variance.
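The three measures of dispersion can be computed in Python as follows. The values are invented, and the population forms (`pvariance`, `pstdev`) are used because they divide by the number of values, which matches the "average" wording of the definition. Sample estimates (`statistics.variance`, `statistics.stdev`) divide by n - 1 instead, so which form a given package reports should be checked rather than assumed.

```python
from statistics import pstdev, pvariance

# Invented values for a numeric variable; their mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)   # highest minus lowest value
variance = pvariance(values)              # average squared deviation from the mean
std_dev = pstdev(values)                  # square root of the variance

print(value_range, variance, std_dev)
```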
59. Group Selection (Select Cases)
• Run a specified analysis for one category / selected group only
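Selecting a subgroup before analysis can be sketched in Python. The cases are invented, and the codes 1 and 2 for `sex` are assumed to stand for male and female.

```python
from statistics import mean

# Invented cases; sex is coded 1 = male, 2 = female (assumption).
cases = [
    {"sex": 1, "age": 30},
    {"sex": 2, "age": 40},
    {"sex": 2, "age": 50},
    {"sex": 1, "age": 20},
]

# Select only the cases in the group of interest.
selected = [c for c in cases if c["sex"] == 2]

# Run the analysis on the selected group alone.
mean_age_selected = mean(c["age"] for c in selected)

print(len(selected), mean_age_selected)
```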
60. Bivariate Analysis
• The aim of bivariate analysis is to see whether two variables are related.
  o Cross Tabulation
  o Bivariate Correlation
  o t test
61. Crosstabulation
• Crosstabulations are a way of displaying data so that we can fairly readily detect an association between two variables.
Steps of Crosstabulation
• Determine which variable is to be treated as independent.
• The independent variable is usually placed across the top of the table, and a column is drawn for each category of that variable.
• The dependent variable is usually placed on the side of the table, and a row is drawn for each category of that variable.
• Compare percentages for each subgroup of the independent variable within one category of the dependent variable at a time.
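The steps above can be sketched in Python with invented data, treating sex as the independent variable (columns) and a Yes / No answer as the dependent variable (rows). The function name `column_percent` is hypothetical.

```python
from collections import Counter

# Invented cases as (independent, dependent) pairs.
cases = [
    ("Male", "Yes"), ("Male", "No"), ("Male", "Yes"), ("Male", "Yes"),
    ("Female", "No"), ("Female", "No"), ("Female", "Yes"), ("Female", "No"),
]

cell_counts = Counter(cases)                       # count for each table cell
column_totals = Counter(sex for sex, _ in cases)   # total for each column


def column_percent(independent, dependent):
    """Percentage of one independent-variable subgroup giving this answer."""
    return 100 * cell_counts[(independent, dependent)] / column_totals[independent]


# Compare subgroups within one category of the dependent variable at a time.
for answer in ["Yes", "No"]:
    row = [f"{column_percent(sex, answer):6.1f}%" for sex in ["Male", "Female"]]
    print(f"{answer:<4}", *row)
```

Percentages are taken within each column so that subgroups of different sizes can be compared directly.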
70. R commander - Import data files
71. R commander - Menu commands