1. http://www.slideshare.net/statcave/week8-finalexamlivelecture-2010june
http://www.facebook.com/statcave
Data Types
Generally speaking, the appropriate statistical technique is determined by the type of data, so a basic
understanding of data types helps in choosing statistical procedures. In SPSS,
each column holds a variable and each row holds a case. There are two major
types of data:
Qualitative variables: The data values are non-numeric categories.
Examples: Blood type, Gender.
Quantitative variables: The data values are counts or numerical measurements. A
quantitative variable can be either discrete, such as the number of students receiving an 'A' in a
class, or continuous, such as GPA or salary.
Another way of classifying data is by the measurement scales. In statistics, there are four
generally used measurement scales:
Nominal data: data values are group labels with no inherent order. For example, a Gender variable
can be coded as male = 0 and female = 1; the numbers serve only as labels.
Ordinal data (sometimes called 'discrete data'): data values are categorical and can
be ranked in a meaningful order. For example, responses from strongly disagree to
strongly agree may be coded as 1 to 5.
Continuous data:
Interval data: data values lie in a real interval, which can be as large as from
negative infinity to positive infinity. The difference between two values is
meaningful; however, the ratio of two interval values is not. Examples are
temperature and IQ. A statement such as "today is 1.2 times hotter than yesterday" is neither useful nor
meaningful.
Ratio data: Both difference and ratio of two values are meaningful. For example,
salary, weight.
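The link between measurement scale and usable statistics can be sketched as a small lookup. The grouping below is the conventional textbook summary, not something stated in these notes, and the names SCALES, NEW_OPERATIONS and meaningful_operations are made up for this illustration.

```python
# Conventional summary (an assumption of this sketch, not taken from the notes)
# of which operations first become meaningful at each measurement scale.
SCALES = ["nominal", "ordinal", "interval", "ratio"]  # weakest to strongest

NEW_OPERATIONS = {
    "nominal": ["count", "mode"],
    "ordinal": ["rank", "median"],
    "interval": ["difference", "mean"],
    "ratio": ["ratio"],
}


def meaningful_operations(scale):
    """Return the operations meaningful at `scale`, including those
    inherited from every weaker scale."""
    position = SCALES.index(scale)
    operations = []
    for weaker_or_equal in SCALES[: position + 1]:
        operations.extend(NEW_OPERATIONS[weaker_or_equal])
    return operations


print(meaningful_operations("ordinal"))  # ['count', 'mode', 'rank', 'median']
```

Each scale keeps everything the weaker scales allow, which is why ratio data supports the widest set of procedures.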
NOTE: The statistical procedures mentioned below are demonstrated using movie clips in
the Statistical Procedures Page.
http://www.cst.cmich.edu/users/lee1c/spss/datatype.htm
2. Nominal
Nominal data are items which are differentiated by a simple naming system. The only
thing a nominal scale does is to say that items being measured have something in
common, although this may not be described.
Nominal items may have numbers assigned to them. This may appear ordinal but is not --
these are used to simplify capture and referencing.
Nominal items are usually categorical, in that they belong to a definable category, such as
'employees'.
Example: The number pinned on a sports person. A set of countries.
Ordinal
Items on an ordinal scale are set into some kind of order by their position on the scale.
This may indicate, for example, temporal position or superiority.
The order of items is often defined by assigning numbers to them to show their relative
position. Letters or other sequential symbols may also be used as appropriate.
Ordinal items are usually categorical, in that they belong to a definable category, such as
'1956 marathon runners'.
You cannot do arithmetic with ordinal numbers -- they show sequence only.
Example
The first, third and fifth person in a race.
Pay bands in an organization, as denoted by A, B, C and D.
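As a rough sketch of what "sequence only" means in practice, ordinal labels can be sorted and a middle category found without doing any arithmetic on them. The ordering of the pay bands (A lowest, D highest) and the staff list are assumptions made for this example.

```python
# Hypothetical ordering: band A is taken as the lowest and D as the highest.
BAND_ORDER = ["A", "B", "C", "D"]
RANK = {band: position for position, band in enumerate(BAND_ORDER)}


def sort_bands(bands):
    """Sort pay bands by their rank, not alphabetically by accident."""
    return sorted(bands, key=lambda band: RANK[band])


def median_band(bands):
    """Middle band in rank order (the lower middle for an even count)."""
    ordered = sort_bands(bands)
    return ordered[(len(ordered) - 1) // 2]


staff = ["C", "A", "D", "B", "B"]  # made-up data
print(sort_bands(staff))   # ['A', 'B', 'B', 'C', 'D']
print(median_band(staff))  # B
```

The median is found purely by position; no band is ever added to or averaged with another.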
Interval
Interval data is measured along a scale on which each
position is equidistant from the next. This allows the distance between one pair of values to
be compared with the distance between another pair.
This is often used in psychological experiments that measure attributes along an arbitrary
scale between two extremes.
Interval data cannot be meaningfully multiplied or divided, because the zero point of the scale is arbitrary.
Example
My level of happiness, rated from 1 to 10.
Temperature, in degrees Fahrenheit.
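A small sketch may help show why ratios of interval data are not meaningful: the same two temperatures give different ratios in Celsius and in Fahrenheit, while equal differences stay equal. The temperature values are made up for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: multiply by 9/5 and add 32."""
    return celsius * 9 / 5 + 32


# Hypothetical readings for three consecutive days, in degrees Celsius.
day1_c, day2_c, day3_c = 10.0, 20.0, 30.0
day1_f, day2_f, day3_f = (celsius_to_fahrenheit(t) for t in (day1_c, day2_c, day3_c))

# The ratio depends on the unit, so "twice as hot" has no fixed meaning.
ratio_c = day2_c / day1_c  # 20 / 10
ratio_f = day2_f / day1_f  # 68 / 50
print(ratio_c, round(ratio_f, 2))  # 2.0 1.36

# Equal differences remain equal in either unit.
print(day2_c - day1_c == day3_c - day2_c)  # True
print(day2_f - day1_f == day3_f - day2_f)  # True
```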
3. Ratio
In a ratio scale, numbers can be compared as multiples of one another. Thus one person
can be twice as tall as another person. Importantly, the number zero also has meaning.
As on an interval scale, differences are comparable: the difference between a person aged 35 and a person aged 38 is the same as the difference
between people who are 12 and 15. A person can also have an age of zero.
Ratio data can be multiplied and divided because not only is the difference between 1 and
2 the same as between 3 and 4, but also that 4 is twice as much as 2.
Interval and ratio data measure quantities and hence are quantitative. Because they can
be measured on a scale, they are also called scale data.
Example
A person's weight
The number of pizzas I can eat before fainting
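For ratio data, a ratio survives a change of unit because zero means "none" in every unit. The sketch below checks this for weight; the two weights are made up, and the conversion factor is approximate.

```python
KG_TO_LB = 2.20462  # approximate kilograms-to-pounds factor


def kg_to_lb(kilograms):
    """Convert kilograms to pounds; zero stays zero."""
    return kilograms * KG_TO_LB


child_kg, adult_kg = 40.0, 80.0  # hypothetical weights

ratio_kg = adult_kg / child_kg
ratio_lb = kg_to_lb(adult_kg) / kg_to_lb(child_kg)
print(ratio_kg, round(ratio_lb, 6))  # 2.0 2.0
```

"Twice as heavy" is therefore a statement about the people, not about the unit chosen.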
Mode: The mode of a set of data is the value in the set that occurs most often.
Problem: The number of points scored in a series of
football games is listed below. Which score
occurred most often?
7, 13, 18, 24, 9, 3, 18
Solution: Ordering the scores from least to greatest,
we get:
3, 7, 9, 13, 18, 18, 24
Answer: The score which occurs most often is 18.
This problem really asked us to find the mode of a set of 7 numbers.
Bimodal: A set of data is bimodal if it has two modes (i.e., two numbers that occur most often, and the
same number of times).
No Mode: When each value occurs only once in the data set, there is no mode for this set of data.
Zero Mode: When 0 occurs most often in the set, the mode is 0.
A mode of 0 and no mode are two different things: in the first case a mode exists and its value is 0.
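The cases above can be sketched with a short function. This is a minimal illustration that follows the definitions in these notes (a set in which every value occurs once has no mode); the name find_modes and the extra data sets are made up for the example.

```python
from collections import Counter


def find_modes(values):
    """Return the most frequent value(s) in ascending order.

    An empty list means "no mode": the data set is empty or every
    value occurs exactly once.
    """
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(find_modes([7, 13, 18, 24, 9, 3, 18]))  # [18]    football scores above
print(find_modes([2, 2, 5, 5, 9]))            # [2, 5]  bimodal
print(find_modes([1, 2, 3]))                  # []      no mode
print(find_modes([0, 0, 4, 7]))               # [0]     the mode is zero
```

Returning a list keeps "no mode" (an empty list) distinct from "the mode is 0" (a list containing 0).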
4. In a blind experiment, the subjects do not know whether they are in the treatment group or the control
group. In order to have a blind experiment with human subjects, it is usually necessary to administer a
placebo to the control group.
Blinding is a basic tool to prevent conscious as well as subconscious bias in research. For example, in
open taste tests comparing different product brands, consumers usually choose their regular brand.
However, in blind taste tests, where the brand identities are concealed, consumers may favor a different
brand.
In a double-blind experiment, neither the subjects nor the people evaluating the subjects know who is
in the treatment group and who is in the control group. This moderates the placebo effect and guards
against conscious and unconscious prejudice for or against the treatment on the part of the evaluators.
Double-blind methods can be applied to any experimental situation where there is the possibility that
the results will be affected by conscious or unconscious bias on the part of the experimenter. Random
assignment of the subject to the experimental or control group is a critical part of double-blind research
design. The key that identifies the subjects and which group they belonged to is kept by a third party
and not given to the researchers until the study is over.
Computer-controlled experiments are sometimes also referred to as double-blind experiments, since
software should not cause any bias. By analogy with the above, the part of the software that provides
interaction with the human is the blinded researcher, while the part of the software that defines the key
is the third party. An example is the ABX test, where the human subject has to identify an unknown
stimulus X as being either A or B.
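The ABX idea can be sketched in a few lines. This is a simplified illustration, not a description of any real ABX tool: the function names, the string stimuli and the simulated listener are all assumptions. The variable key plays the role of the third party, and the respond callback never sees it.

```python
import random


def run_abx_session(stimulus_a, stimulus_b, respond, trials, seed=None):
    """Run `trials` ABX trials and return the number of correct answers.

    `respond(a, b, x)` must return "A" or "B". The key (which of the two
    X really is) is held here and never passed to `respond`.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        key = rng.choice(["A", "B"])  # hidden identity of X
        x = stimulus_a if key == "A" else stimulus_b
        answer = respond(stimulus_a, stimulus_b, x)
        if answer == key:
            correct += 1
    return correct


def perfect_listener(a, b, x):
    """Simulated subject who can always tell the two stimuli apart."""
    return "A" if x == a else "B"


print(run_abx_session("tone-1", "tone-2", perfect_listener, trials=10, seed=1))  # 10
```

A subject who cannot tell A from B would be expected to score near half the trials, which is what makes the count informative.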
http://www.worldlingo.com/ma/enwiki/en/Blind_experiment
Random sampling ensures that every member of the population has an equal chance of being selected.
Simple random sampling is a sampling method.
Randomization is the process of making something random.
Randomization-based inference is especially important in experimental design and in survey sampling.
Randomization involves randomly allocating the experimental units across the treatment groups. For
example, if an experiment compares a new drug against a standard drug, then the patients should be
allocated to either the new drug or the standard drug control using randomization.
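One simple way to carry out such an allocation is to shuffle the units and deal them out to the groups in turn. The sketch below assumes equal group sizes are wanted; the patient identifiers and the function name randomize are made up.

```python
import random


def randomize(units, treatments, seed=None):
    """Randomly allocate `units` across `treatments` in near-equal groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # random order removes any systematic pattern
    allocation = {treatment: [] for treatment in treatments}
    for index, unit in enumerate(shuffled):
        # Deal the shuffled units out to the treatments in rotation.
        allocation[treatments[index % len(treatments)]].append(unit)
    return allocation


patients = [f"patient-{number}" for number in range(1, 9)]  # hypothetical IDs
allocation = randomize(patients, ["new drug", "standard drug"], seed=42)
for treatment, group in allocation.items():
    print(treatment, sorted(group))
```

Passing a seed makes the allocation reproducible for auditing; leaving it as None gives a fresh allocation on each run.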
5. Simple random sampling refers to a sampling method that has the following properties.
The population consists of N objects.
The sample consists of n objects.
All possible samples of n objects are equally likely to occur.
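These properties can be illustrated on a tiny population. The sketch lists every possible sample of size n and then draws one with Python's random.sample, which selects without replacement; the four-member population is made up for the example.

```python
import itertools
import random

population = ["a", "b", "c", "d"]  # hypothetical population, N = 4
n = 2                              # sample size

# Every possible sample of n objects; under simple random sampling each
# of these is equally likely to be the one drawn.
all_samples = list(itertools.combinations(population, n))
print(len(all_samples))  # 6

# Draw one simple random sample (without replacement).
rng = random.Random(0)
sample = rng.sample(population, n)
print(sample)
```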
The main benefit of simple random sampling is that it tends to produce a sample that is
representative of the population, which supports the validity of the statistical conclusions.
There are many ways to obtain a simple random sample. One way would be the lottery method. Each of
the N population members is assigned a unique number. The numbers are placed in a bowl