> x=11
> print(x)
[1] 11
> x
[1] 11
> X
Error: object 'X' not found
> y<-7
> y
[1] 7
> y<-9
> y
[1] 9

> ls()
[1] "x" "y"
> rm(y)
> y
Error: object 'y' not found
> y<-9
> x.1<-14
> x.1
[1] 14
> 1x<-22
Error: unexpected symbol in "1x"
Entering data with c
• c function for small datasets – combines or concatenates
terms together
Example: we have a count of the number of typing mistakes in a
Word document:
0 2 1 3 2 0 1 1
To enter this into an R session we go like this:
> typo=c(0,2,1,3,2,0,1,1)
> typo
[1] 0 2 1 3 2 0 1 1
Learning Objectives
• What is statistics?
• Become aware of the varied applications of statistics in
business.
• Differentiate between descriptive and inferential statistics.
• Identify types of variables.
Statistics in Business
• Accounting — auditing and cost estimation
• Economics — local, regional, national, and international
economic performance
• Finance — investments and portfolio management
• Management — human resources, compensation, and quality
management
• Management Information Systems — performance of systems
which gather, summarize, and disseminate information to
various managerial levels
• Marketing — market analysis and consumer research
• International Business — market and demographic analysis
What is Statistics?
• Science dealing with
collection, analysis, interpretation and
presentation of data (with a view to making
inferences)
• Branches of statistics:
– Descriptive – graphical or numerical summaries of
data
– Inferential – making a decision based on data
What is Statistics?

Statistics in business is the study of VARIATIONS
Population Versus Sample
• Population — the whole
– a collection of all persons, objects, or items under
study
• Census — gathering data from the entire population
• Sample — gathering data on a subset of the population
– Use information about the sample to infer about the
population
Population Versus Sample
Population and Census Data
Identifier   Color   MPG
RD1          Red     12
RD2          Red     10
RD3          Red     13
RD4          Red     10
RD5          Red     13
BL1          Blue    27
BL2          Blue    24
GR1          Green   35
GR2          Green   35
GY1          Gray    15
GY2          Gray    18
GY3          Gray    17
Sample and Sample Data
Identifier   Color   MPG
RD2          Red     10
RD5          Red     13
GR1          Green   35
GY2          Gray    18
Population Versus Sample

Select a
random sample
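Drawing a random sample from the population table can be sketched in R as follows. This is only an illustration: the data frame layout, the variable names, the seed and the sample size of 4 are assumptions chosen to mirror the slides, and a different seed will pick different cars.

```r
# Population from the slides: identifier, colour and MPG for all 12 cars
population <- data.frame(
  id    = c("RD1", "RD2", "RD3", "RD4", "RD5", "BL1", "BL2",
            "GR1", "GR2", "GY1", "GY2", "GY3"),
  color = c(rep("Red", 5), rep("Blue", 2), rep("Green", 2), rep("Gray", 3)),
  mpg   = c(12, 10, 13, 10, 13, 27, 24, 35, 35, 15, 18, 17)
)

set.seed(1)                                   # arbitrary seed (assumption) for repeatability
picked <- sample(nrow(population), size = 4)  # 4 row numbers, drawn without replacement
car_sample <- population[picked, ]
print(car_sample)
```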
Parameter vs. Statistic
• Parameter — descriptive measure of the population
– Usually represented by Greek letters
μ denotes population mean
σ² denotes population variance
σ denotes population standard deviation

• Statistic — descriptive measure of a sample
– Usually represented by Roman letters
x̄ denotes sample mean
s² denotes sample variance
s denotes sample standard deviation
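As a rough sketch of the distinction, the car MPG values can be treated as the whole population and the four sampled cars as the sample. The variable names are assumptions. Note that R's var() and sd() use the sample divisor n - 1, so the population versions are computed by hand here.

```r
pop_mpg  <- c(12, 10, 13, 10, 13, 27, 24, 35, 35, 15, 18, 17)  # all 12 cars
samp_mpg <- c(10, 13, 35, 18)                                   # RD2, RD5, GR1, GY2

# Parameters: computed from the whole population (divisor N)
N      <- length(pop_mpg)
mu     <- mean(pop_mpg)
sigma2 <- sum((pop_mpg - mu)^2) / N
sigma  <- sqrt(sigma2)

# Statistics: computed from the sample (var() and sd() divide by n - 1)
x_bar <- mean(samp_mpg)
s2    <- var(samp_mpg)
s     <- sd(samp_mpg)

print(c(mu = mu, sigma2 = sigma2, sigma = sigma))
print(c(x_bar = x_bar, s2 = s2, s = s))
```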
Statistics in Business
• Inferences about parameters made under
conditions of uncertainty (which are always
present in statistics)
– Uncertainty can be caused by
• Randomness in selection of a sample
• Lack of knowledge about the source of the
inferences
• Change in conditions not accounted for
Variables and Data
Variable : a characteristic of any entity being studied – is
capable of taking on different values that can be used for
analysis
e.g. stock price, ROI, market share, age of worker, income of a
family, total sales, advertising cost etc

Measurement : is done when a standard process is used
to assign numbers to particular characteristics of a
variable – may be obvious or defined
e.g. age is obvious but ROI or Labour productivity is defined

The source of each measurement is called a Sampling unit
Data : recorded measurements
Levels of Data Measurement
What are 40 and 80? They may represent:
• Weights of two objects being shipped
• Ratings received in a consumer test by two
different products
• Football jersey numbers of a fullback and a centre forward
The appropriateness of a data analysis depends
on the level of measurement of the data gathered
Levels of Data Measurement
• Nominal — Qualitative data, typically numbers
are used only to classify or categorize the
attribute, however it is useful to retain original
verbal descriptions of categories
– 1 for “male” and 2 for “female”
– Employee identification number
– Religion, Geographic location, PIN code, Place of
birth
– Demographic questions in survey etc
Levels of Data Measurement
• Ordinal - A variable is measured on an ordinal scale if
ranking or ordering is possible for values of
the variable.
– For example, a gold medal reflects superior
performance to a silver or bronze medal in the
Olympics. But can you say a gold and a bronze
medal average out to a silver medal?
– Preference scales are typically ordinal – how much
do you like this cereal? Like it a lot, somewhat like
it, neutral, somewhat dislike it, dislike it a lot.
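In R, ordinal data of this kind is usually stored as an ordered factor. The sketch below uses five made-up survey answers (an assumption, not data from the slides) to show that order comparisons and a middle response make sense, while arithmetic on the labels does not.

```r
# Scale from lowest to highest preference, as listed on the slide
scale_levels <- c("dislike it a lot", "somewhat dislike it", "neutral",
                  "somewhat like it", "like it a lot")

# Hypothetical answers from five respondents
answers <- c("like it a lot", "neutral", "somewhat like it",
             "neutral", "dislike it a lot")

pref <- factor(answers, levels = scale_levels, ordered = TRUE)

print(pref[1] > pref[2])   # order comparisons are meaningful for ordered factors
print(table(pref))         # frequency of each category

middle <- sort(pref)[3]    # middle response of the five, in rank order
print(middle)
```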
Levels of Data Measurement
• Interval - In interval measurement the
distance between attributes does have
meaning.
– Numerical data typically fall into this category
– For example, when measuring temperature (in
Fahrenheit), the distance from 30-40 is same as
the distance from 70-80. The interval between
values is interpretable.
Levels of Data Measurement
• Ratio — in ratio measurement there is always
a reference point that is meaningful (either 0
for rates or 1 for ratios)
– This means that you can construct a meaningful
fraction (or ratio) with a ratio variable.
– In applied social research most "count" variables
are ratio, for example, the number of clients in
past six months.
Visualizing the data
• Construct a frequency distribution
– For both grouped and ungrouped data

• Construct graphical summaries of qualitative
data
• Construct graphical summaries of quantitative
data
• Construct graphical summaries of two
variables
Ungrouped vs. Grouped Data
• Ungrouped data
– have not been summarized in any way
– are also called raw data

• Grouped data
– logical groupings of data exist
• e.g. age ranges (20-29, 30-39, etc.)

– have been organized into a frequency distribution
Example of Ungrouped Data
Ages of a sample of managers from urban child care centres in the US

42 26 32 34 57 30 58 37 50 30
53 40 30 47 49 50 40 32 31 40
52 28 23 35 25 30 36 32 26 50
55 30 58 64 52 49 33 43 46 32
61 31 30 40 60 74 37 29 43 54
Frequency Distribution
• Frequency Distribution – summary of data
presented in the form of class intervals and
frequencies
– Vary in shape and design
– Constructed according to the individual
researcher's preferences
Frequency Distribution
• Steps in Frequency Distribution
– Step 1 – Determine range of frequency distribution
• Range is the difference between the highest and the lowest
numbers

– Step 2 – Determine the number of classes
• Do not use too many, or too few classes

– Step 3 – Determine the width of the class interval
• Approx. class width can be calculated by dividing the range
by the number of classes
• Values fit into only one class
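The three steps above can be sketched in R using the ages of the child care managers. The choice of 6 classes, the rounding of the width up to 10 and the starting point of 20 are judgement calls, stated here as assumptions. With right = FALSE, cut() builds intervals of the form "20-under 30", so each value falls into exactly one class.

```r
ages <- c(42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
          53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
          52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
          55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
          61, 31, 30, 40, 60, 74, 37, 29, 43, 54)

# Step 1: range = largest - smallest
data_range <- max(ages) - min(ages)

# Step 2: number of classes (a judgement call; 6 is assumed here)
k <- 6

# Step 3: approximate width, then rounded up to a convenient number
approx_width <- data_range / k
width <- 10

# Classes of the form [20, 30), [30, 40), ... so values fit into only one class
breaks  <- seq(20, 80, by = width)
classes <- cut(ages, breaks = breaks, right = FALSE)
freq    <- table(classes)
print(freq)
```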
Frequency Distribution of Child
Care Managers' Ages
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
Relative Frequency
Relative frequency is the proportion of the total frequency that
is in any given class interval in a frequency distribution.

Class Interval   Frequency   Relative Frequency
20-under 30      6           .12   (6/50)
30-under 40      18          .36   (18/50)
40-under 50      11          .22
50-under 60      11          .22
60-under 70      3           .06
70-under 80      1           .02
Total            50          1.00
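A minimal sketch of the same calculation in R: each class frequency is divided by the total frequency. The vector name and labels are assumptions for illustration.

```r
freq <- c("20-under 30" = 6, "30-under 40" = 18, "40-under 50" = 11,
          "50-under 60" = 11, "60-under 70" = 3, "70-under 80" = 1)

# Relative frequency = class frequency / total frequency
rel_freq <- freq / sum(freq)
print(rel_freq)
print(sum(rel_freq))
```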
Cumulative Frequency
Cumulative frequency is a running total of frequencies through
the classes of a frequency distribution.

Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24   (18 + 6)
40-under 50      11          35   (11 + 24)
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
Total            50
Cumulative Relative Frequencies
Cumulative relative frequency is a running total of the relative
frequencies through the classes of a frequency distribution.

Class Interval   Frequency   Relative    Cumulative   Cumulative Relative
                             Frequency   Frequency    Frequency
20-under 30      6           .12         6            .12
30-under 40      18          .36         24           .48
40-under 50      11          .22         35           .70
50-under 60      11          .22         46           .92
60-under 70      3           .06         49           .98
70-under 80      1           .02         50           1.00
Total            50          1.00
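Both running totals can be sketched with cumsum() in R. The data frame and its column names are assumptions for illustration.

```r
freq <- c("20-under 30" = 6, "30-under 40" = 18, "40-under 50" = 11,
          "50-under 60" = 11, "60-under 70" = 3, "70-under 80" = 1)

cum_freq     <- cumsum(freq)              # running total of frequencies
rel_freq     <- freq / sum(freq)          # proportion in each class
cum_rel_freq <- cumsum(freq) / sum(freq)  # running total of the proportions

dist <- data.frame(
  class        = names(freq),
  freq         = unname(freq),
  rel_freq     = unname(rel_freq),
  cum_freq     = unname(cum_freq),
  cum_rel_freq = unname(cum_rel_freq)
)
print(dist)
```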
Common Statistical Graphs
– Quantitative Data
• Histogram -- vertical bar chart of frequencies
• Frequency Polygon -- line graph of frequencies
• Ogive -- line graph of cumulative frequencies
• Dot Plots – each data value is plotted
• Stem and Leaf Plot -- like a histogram, but
shows individual data values. Useful for small
data sets.
Histogram
• A histogram is a graphical summary of a
frequency distribution
• It is drawn by labeling the x-axis with class endpoints and the
y-axis with frequencies, then drawing a horizontal line
between the two class endpoints at each frequency
value
• The number and location of rectangles (bars)
should be determined based on the sample
size and the range of the data
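As a sketch, R's hist() can compute the bars for the manager ages without drawing them. The class endpoints 20, 30, ..., 80 are an assumption taken from the frequency distribution of these ages, and right = FALSE makes each class include its lower endpoint only ("20-under 30").

```r
ages <- c(42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
          53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
          52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
          55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
          61, 31, 30, 40, 60, 74, 37, 29, 43, 54)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(ages, breaks = seq(20, 80, by = 10), right = FALSE, plot = FALSE)

print(h$breaks)  # class endpoints (x-axis labels)
print(h$counts)  # bar heights (frequencies)
print(h$mids)    # class midpoints
```

Calling plot(h), or hist() without plot = FALSE, would draw the chart itself.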
Data Range
42 26 32 34 57 30 58 37 50 30
53 40 30 47 49 50 40 32 31 40
52 28 23 35 25 30 36 32 26 50
55 30 58 64 52 49 33 43 46 32
61 31 30 40 60 74 37 29 43 54

Smallest = 23, Largest = 74
Range = Largest - Smallest
= 74 - 23
= 51
Number of Classes
and Class Width
• The number of classes should be between 5 and 15.
– Fewer than 5 classes cause excessive summarization.
– More than 15 classes leave too much detail.

• Class Width
– Divide the range by the number of classes for an
approximate class width
– Round up to a convenient number
Class midpoint or Class mark
The midpoint of each class interval is called the
class midpoint or the class mark.
Midpoints for Age Classes

Class Interval   Frequency   Midpoint   Relative    Cumulative
                                        Frequency   Frequency
20-under 30      6           25         .12         6
30-under 40      18          35         .36         24
40-under 50      11          45         .22         35
50-under 60      11          55         .22         46
60-under 70      3           65         .06         49
70-under 80      1           75         .02         50
Total            50                     1.00
Histogram
A graphical display of class frequencies

Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1

[Figure: histogram of the class frequencies; x-axis: Years, y-axis: Frequency]

Frequency Polygon
[Figure: frequency polygon of the same class frequencies; x-axis: Years, y-axis: Frequency]
Relative Frequency Ogive
Class Interval   Cumulative Relative Frequency
20-under 30      .12
30-under 40      .48
40-under 50      .70
50-under 60      .92
60-under 70      .98
70-under 80      1.00
Stem and Leaf plot:
Safety Examination Scores for Plant Trainees

Raw Data
86 77 91 60 55
76 92 47 88 67
23 59 72 75 83
77 68 82 97 89
81 75 74 39 67
79 83 70 78 91
68 49 56 94 81

Stem   Leaf
2      3
3      9
4      79
5      569
6      07788
7      0245567789
8      11233689
9      11247
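A stem and leaf display for these scores can be sketched in R in two ways: the built-in stem() function, and a hand-built version that splits each score into its tens digit (stem) and units digit (leaf). The variable names are assumptions, and the layout printed by stem() may differ from the slide.

```r
scores <- c(86, 77, 91, 60, 55,
            76, 92, 47, 88, 67,
            23, 59, 72, 75, 83,
            77, 68, 82, 97, 89,
            81, 75, 74, 39, 67,
            79, 83, 70, 78, 91,
            68, 49, 56, 94, 81)

# Built-in text display
stem(scores)

# Hand-built version: stem = tens digit, leaf = units digit
s      <- sort(scores)
leaves <- tapply(s %% 10, s %/% 10, paste, collapse = "")
for (st in names(leaves)) {
  cat(st, "|", leaves[st], "\n")
}
```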
Stem and Leaf plot: Construction
[Figure: the raw safety examination scores and their stem and leaf display, with callouts marking the stem (tens digit) and the leaf (units digit) of individual scores]
Histogram vs. Stem and Leaf?
• So, which one should you use?
• A Stem and Leaf plot is useful for small data
sets. It shows the values of the datapoints.
• A histogram foregoes seeing the individual
values of the data for the bigger picture of the
distribution of the data
• The purpose of these graphs is to summarize a
set of data. As long as that need is met, either
one is okay to use.
Common Statistical Graphs
– Qualitative Data
• Pie Chart -- proportional representation for
categories of a whole
• Bar Chart – frequency or relative frequency of
one or more categorical variables
Complaints by Amtrak Passengers
COMPLAINT           NUMBER   PROPORTION   DEGREES
Stations, etc.      28,000   .40          144.0
Train Performance   14,700   .21          75.6
Equipment           10,500   .15          54.0
Personnel           9,800    .14          50.4
Schedules, etc.     7,000    .10          36.0
Total               70,000   1.00         360.0
Complaints by Amtrak Passengers
Second Quarter U.S. Truck Production
Second Quarter Truck Production in the U.S.
(Hypothetical values)

Company   2d Quarter Truck Production
A         357,411
B         354,936
C         160,997
D         34,099
E         12,747
Totals    920,190
Second Quarter U.S. Truck Production
Pie Chart Calculations for Company A

Company   2d Quarter Truck Production   Proportion   Degrees
A         357,411                       .388         140
B         354,936                       .386         139
C         160,997                       .175         63
D         34,099                        .037         13
E         12,747                        .014         5
Totals    920,190                       1.000        360
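The pie chart calculations can be sketched in R: each company's proportion is its production divided by the total, and its slice angle is that proportion times 360 degrees. The vector names are assumptions for illustration.

```r
production <- c(A = 357411, B = 354936, C = 160997, D = 34099, E = 12747)

proportion <- production / sum(production)  # share of total production
degrees    <- proportion * 360              # slice angle in a 360-degree pie

print(sum(production))
print(round(proportion, 3))
print(round(degrees))
```

Passing production to pie() would draw the chart; R works out the angles itself.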
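Vertical Bar Graphs or Column Charts
[Figure: column chart with series Kolkata, Mumbai and Chennai for the years 2010 to 2013; vertical axis from 0 to 6]
Horizontal Bar Chart
[Figure: horizontal bar chart with series Kolkata, Mumbai and Chennai for the years 2010 to 2013; horizontal axis from 0 to 6]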
Pareto Chart
A Pareto chart is a bar chart, sorted from the most frequent to the
least frequent, overlaid with a cumulative line graph (like an ogive).
These data present the most common types of defects.
[Figure: Pareto chart of defect types (Poor Wiring, Short in Coil, Defective Plug, Other); left axis: Frequency (0 to 100), right axis: cumulative percentage (0% to 100%)]
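The two ingredients of a Pareto chart, sorted bars and a cumulative percentage line, can be sketched in R. The defect counts below are hypothetical: the transcript keeps only the category names, not the values behind the chart.

```r
# Hypothetical defect counts (assumed values, entered in no particular order)
defects <- c("Short in Coil" = 30, "Other" = 5,
             "Poor Wiring" = 40, "Defective Plug" = 25)

# Bars: sorted from the most frequent to the least frequent
sorted_defects <- sort(defects, decreasing = TRUE)

# Line: cumulative percentage of the total, like an ogive
cum_pct <- 100 * cumsum(sorted_defects) / sum(sorted_defects)

print(sorted_defects)
print(cum_pct)
```

Feeding sorted_defects to barplot() and overlaying cum_pct with lines() would give the chart itself.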
Scatter Plot
Registered Vehicles (1000's)   Gasoline Sales (1000's of Gallons)
5                              60
15                             120
9                              90
15                             140
7                              60
Common Statistical Graphs –
Comparing Two Variables
• Scatter Plot -- type of display using Cartesian
coordinates to display values for two variables for
a set of data.
– The data is displayed as a collection of points, each
having the value of one variable determining the
position on the horizontal axis and the value of the
other variable determining the position on the vertical
axis.
– A scatter plot is also called a scatter chart, scatter
diagram and scatter graph.
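A minimal sketch of a scatter plot in R, using the registered vehicles and gasoline sales figures from the slides. The variable names and axis labels are assumptions. The correlation at the end is an optional extra that summarizes how closely the points follow a straight line.

```r
vehicles <- c(5, 15, 9, 15, 7)        # registered vehicles (1000's)
gasoline <- c(60, 120, 90, 140, 60)   # gasoline sales (1000's of gallons)

# One point per observation: vehicles on the horizontal axis,
# gasoline sales on the vertical axis
plot(vehicles, gasoline,
     xlab = "Registered Vehicles (1000's)",
     ylab = "Gasoline Sales (1000's of Gallons)",
     main = "Scatter Plot")

r <- cor(vehicles, gasoline)  # strength of the linear association
print(r)
```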
Measures of Central Tendency
& Dispersion:
Learning Objectives

• Distinguish between measures of central
tendency, measures of variability, measures of
shape, and measures of association.
• Understand the meanings of
mean, median, mode, quartile, percentile, and
range.
• Compute mean, median, mode, percentile, quartile, range,
variance, standard deviation, and mean absolute
deviation on ungrouped data.
• Differentiate between sample and population
variance and standard deviation.
Measures of Central Tendency
& Dispersion:
Learning Objectives - continued

• Understand the meaning of standard deviation as
it is applied by using the empirical rule and
Chebyshev’s theorem.
• Compute the mean, median, standard
deviation, and variance on grouped data.
• Understand box and whisker plots, skewness, and
kurtosis.
• Compute a coefficient of correlation and interpret
it.
Measures of Central Tendency:
Ungrouped Data
• Measures of central tendency yield information
about “the centre, or middle part, of a group of
numbers.”
• Measures of central tendency do not focus on the
span of the data set or how far values are from the
middle numbers
• Common Measures of Location
– Mode
– Median
– Mean
– Percentiles
– Quartiles
Mode
• Mode - the most frequently occurring value in a
data set
– Applicable to all levels of data measurement
(nominal, ordinal, interval, and ratio)
– Can be used to determine what categories occur most
frequently
– Sometimes, no mode exists (no duplicates)

• Bimodal – In a tie for the most frequently
occurring value, two modes are listed
• Multimodal -- Data sets that contain more than
two modes
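Base R has no function for the statistical mode (its mode() reports the storage type of an object), so a small helper is sketched here. The name stat_modes and the example vectors are assumptions for illustration. It returns every value tied for the highest count, and an empty result when no value repeats.

```r
# Returns all values tied for the highest frequency;
# returns an empty vector when no value occurs more than once
stat_modes <- function(x) {
  if (length(x) == 0) return(x)
  ux     <- unique(x)
  counts <- tabulate(match(x, ux))      # how often each unique value occurs
  if (max(counts) == 1) return(ux[0])   # no duplicates: no mode
  ux[counts == max(counts)]
}

print(stat_modes(c(1, 2, 2, 3)))            # one mode
print(stat_modes(c(1, 1, 2, 2, 3)))         # bimodal
print(stat_modes(c(1, 2, 3)))               # no mode
print(stat_modes(c("red", "blue", "red")))  # also works for nominal data
```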
Median
• Median - middle value in an ordered array of
numbers.
– Half the data are above it, half the data are below it
– Mathematically, it is the (n+1)/2 th ordered
observation
• For an array with an odd number of terms, the median is
the middle number
– n=11 => (n+1)/2 th = 12/2 th = 6th ordered observation

• For an array with an even number of terms the median is
the average of the middle two numbers
– n=10 => (n+1)/2 th = 11/2 th = 5.5th = average of 5th and 6th
ordered observation
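The (n+1)/2 rule can be sketched in R and compared with the built-in median(). The helper name and the two small vectors are assumptions for illustration.

```r
# Median located at the (n + 1)/2 th ordered observation
median_by_position <- function(x) {
  s   <- sort(x)
  n   <- length(s)
  pos <- (n + 1) / 2
  if (n %% 2 == 1) {
    s[pos]                                  # odd n: the middle value
  } else {
    mean(s[c(floor(pos), ceiling(pos))])    # even n: average of the middle two
  }
}

odd_data  <- c(7, 1, 5, 3, 9)       # n = 5, position 3
even_data <- c(7, 1, 5, 3, 9, 11)   # n = 6, position 3.5

print(median_by_position(odd_data))
print(median_by_position(even_data))
```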
Arithmetic Mean
• Mean is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data
set and dividing the sum by the number of
values in the data set
Demonstration Problem
The number of U.S. cars in service by top car rental
companies in a recent year according to Auto Rental
News follows.
Company / Number of Cars in Service
Enterprise 643,000; Hertz 327,000; National/Alamo
233,000; Avis 204,000; Dollar/Thrifty 167,000; Budget
144,000; Advantage 20,000; U-Save 12,000; Payless
10,000; ACE 9,000; Fox 9,000; Rent-A-Wreck 7,000;
Triangle 6,000
Compute the mode, the median, and the mean.
Demonstration Problem
Solution

Mode: 9,000 (two companies with 9,000 cars in
service)

Median: With 13 different companies in this
group, N = 13. The median is located at the (13
+ 1)/2 = 7th position. Because the data are
already ordered, the median is the 7th term, which is
20,000.
Mean: μ = ∑x/N = 1,791,000/13 = 137,769.23
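The same three answers can be checked in R. This is a sketch: the named vector is an assumption about how one might enter the data, and the mode is found with table() because base R has no statistical mode function.

```r
cars <- c(Enterprise = 643000, Hertz = 327000, "National/Alamo" = 233000,
          Avis = 204000, "Dollar/Thrifty" = 167000, Budget = 144000,
          Advantage = 20000, "U-Save" = 12000, Payless = 10000,
          ACE = 9000, Fox = 9000, "Rent-A-Wreck" = 7000, Triangle = 6000)

# Mode: the value(s) with the highest count
counts     <- table(cars)
mode_value <- as.numeric(names(counts)[counts == max(counts)])

median_value <- median(cars)   # 7th of the 13 ordered values
mean_value   <- mean(cars)     # sum of values / N

print(mode_value)
print(median_value)
print(round(mean_value, 2))
```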
Percentile
• Percentile - measures of central tendency that divide a
group of data into 100 parts
• At least n% of the data lie at or below the nth
percentile, and at most (100 - n)% of the data lie
above the nth percentile
• Example: the 90th percentile indicates that at least 90% of the
data are equal to or less than it, and at most 10% of the data
lie above it
Calculating Percentiles
• To calculate the pth percentile,
– Order the data
– Calculate i = N (p/100)
– Determine the percentile
• If i is a whole number, then use the average of the
ith and (i+1)th ordered observation
• Otherwise, round i up to the next whole
number and use that ordered observation
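The steps above can be sketched as a small R function. The function name, the example data and the guard for p = 100 are assumptions added for illustration.

```r
# p-th percentile by the rule above: i = N * (p / 100)
percentile_by_rule <- function(x, p) {
  s <- sort(x)               # order the data
  n <- length(s)
  i <- n * p / 100           # location index
  if (i == floor(i)) {
    upper <- min(i + 1, n)   # guard for p = 100 (assumption, not in the rule)
    mean(s[c(i, upper)])     # whole number: average of ith and (i+1)th
  } else {
    s[ceiling(i)]            # otherwise: round i up and take that observation
  }
}

x <- c(14, 12, 19, 23, 5, 13, 28, 17)  # hypothetical data, N = 8

print(percentile_by_rule(x, 30))  # i = 2.4, so the 3rd ordered value
print(percentile_by_rule(x, 25))  # i = 2, so the average of the 2nd and 3rd
print(percentile_by_rule(x, 50))  # i = 4, so the average of the 4th and 5th
```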
Quartiles
• Quartile - measures of central tendency that divide a
group of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile

[Figure: ordered data divided by Q1, Q2 and Q3 into four parts, each holding 25% of the data]
Quartiles for Demonstration Problem

For the cars in service data, n=13, so
Q1: i = 13 (25/100) = 3.25, so use the 4th ordered observation
Q1 = 9,000
Q3: i = 13 (75/100) = 9.75, so use the 10th ordered observation
Q3 = 204,000
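These quartiles can be reproduced in R either by indexing the ordered data directly or with quantile(). R offers several quantile definitions; type = 2 appears to follow the same rule as the slides (average at whole-number positions, otherwise round up). The default type interpolates and can give different answers on other data.

```r
cars <- c(643000, 327000, 233000, 204000, 167000, 144000, 20000,
          12000, 10000, 9000, 9000, 7000, 6000)

s <- sort(cars)
n <- length(s)

i_q1 <- n * 25 / 100   # 3.25: not whole, so round up to the 4th observation
i_q3 <- n * 75 / 100   # 9.75: not whole, so round up to the 10th observation
q1 <- s[ceiling(i_q1)]
q3 <- s[ceiling(i_q3)]

# Same quartiles from the built-in function with the type 2 definition
q_type2 <- quantile(cars, probs = c(0.25, 0.75), type = 2)

print(c(Q1 = q1, Q3 = q3))
print(q_type2)
```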
Which Measure Do I Use?
• Which measure of central tendency is most
appropriate?
– In general, the mean is preferred, since it has nice
mathematical properties, which we shall discuss later
– The median and quartiles are resistant to outliers

• Consider the following three datasets
– 1, 2, 3 (median=2, mean=2)
– 1, 2, 6 (median=2, mean=3)
– 1, 2, 30 (median=2, mean=11)
– All have median=2, but the mean is sensitive to the outliers

• In general, if there are outliers, the median is preferred
to the mean
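The comparison of the three datasets can be reproduced in a few lines of R. The list and its element names are assumptions for illustration.

```r
datasets <- list(a = c(1, 2, 3),
                 b = c(1, 2, 6),
                 c = c(1, 2, 30))

medians <- sapply(datasets, median)  # unaffected by the size of the largest value
means   <- sapply(datasets, mean)    # pulled upward as the largest value grows

print(rbind(median = medians, mean = means))
```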
……….. To continue

ARTERIAL BLOOD GAS ANALYSIS........pptxAneriPatwari
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxAneriPatwari
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptxmary850239
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 

Último (20)

Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptx
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 

Statistics with R

  • 1.
  • 2.
    > x=11
    > print(x)
    [1] 11
    > x
    [1] 11
    > X
    Error: object 'X' not found
    > y<-7
    > y
    [1] 7
    > y<-9
    > y
    [1] 9
    > ls()
    [1] "x" "y"
    > rm(y)
    > y
    Error: object 'y' not found
    > y<-9
    > x.1<-14
    > x.1
    [1] 14
    > 1x<-22
    Error: unexpected symbol in "1x"
  • 3.
  • 4. Entering data with c
    • c function for small datasets – combines or concatenates terms together
    Example: we have a count of the number of typing mistakes in a word document:
    0 2 1 3 2 0 1 1
    To enter this into an R session:
    > typo=c(0,2,1,3,2,0,1,1)
    > typo
    [1] 0 2 1 3 2 0 1 1
  • 5. Learning Objectives • What is statistics? • Become aware of the varied applications of statistics in business. • Differentiate between descriptive and inferential statistics. • Identify types of variables.
  • 6. Statistics in Business • Accounting — auditing and cost estimation • Economics — local, regional, national, and international economic performance • Finance — investments and portfolio management • Management — human resources, compensation, and quality management • Management Information Systems — performance of systems which gather, summarize, and disseminate information to various managerial levels • Marketing — market analysis and consumer research • International Business — market and demographic analysis
  • 7. What is Statistics? • Science dealing with collection, analysis, interpretation and presentation of data (with a view to making inferences) • Branches of statistics: – Descriptive – graphical or numerical summaries of data – Inferential – making a decision based on data
  • 8. What is Statistics? Statistics in business is the study of VARIATIONS
  • 9. Population Versus Sample • Population — the whole – a collection of all persons, objects, or items under study • Census — gathering data from the entire population • Sample — gathering data on a subset of the population – Use information about the sample to infer about the population
  • 11. Population and Census Data
    Identifier  Color  MPG
    RD1         Red    12
    RD2         Red    10
    RD3         Red    13
    RD4         Red    10
    RD5         Red    13
    BL1         Blue   27
    BL2         Blue   24
    GR1         Green  35
    GR2         Green  35
    GY1         Gray   15
    GY2         Gray   18
    GY3         Gray   17
  • 12. Sample and Sample Data
    Identifier  Color  MPG
    RD2         Red    10
    RD5         Red    13
    GR1         Green  35
    GY2         Gray   18
  • 14. Parameter vs. Statistic
    • Parameter - descriptive measure of the population
      – Usually represented by Greek letters
        μ denotes population mean
        σ² denotes population variance
        σ denotes population standard deviation
    • Statistic - descriptive measure of a sample
      – Usually represented by Roman letters
        x̄ denotes sample mean
        s² denotes sample variance
        s denotes sample standard deviation
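The distinction can be tried out on the MPG data from slides 11 and 12. This is only a sketch: the variable names are made up for illustration, and the divisors (N for the population variance, n - 1 for the sample variance) are the usual textbook definitions rather than something stated on the slide.

```r
# Population MPG values from the "Population and Census Data" slide
population_mpg <- c(RD1 = 12, RD2 = 10, RD3 = 13, RD4 = 10, RD5 = 13,
                    BL1 = 27, BL2 = 24, GR1 = 35, GR2 = 35,
                    GY1 = 15, GY2 = 18, GY3 = 17)

# The four cars picked on the "Sample and Sample Data" slide
sample_mpg <- population_mpg[c("RD2", "RD5", "GR1", "GY2")]

mu     <- mean(population_mpg)             # parameter: population mean
sigma2 <- mean((population_mpg - mu)^2)    # parameter: population variance (divide by N)
x_bar  <- mean(sample_mpg)                 # statistic: sample mean
s2     <- var(sample_mpg)                  # statistic: sample variance (divide by n - 1)

print(c(mu = mu, x_bar = x_bar, sigma2 = sigma2, s2 = s2))
```

Note that var() in R always divides by n - 1, so it is the sample version; the population variance has to be computed by hand as above.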
  • 15. Statistics in Business • Inferences about parameters made under conditions of uncertainty (which are always present in statistics) – Uncertainty can be caused by • Randomness in selection of a sample • Lack of knowledge about the source of the inferences • Change in conditions not accounted for
  • 16. Variables and Data Variable : a characteristic of any entity being studied – is capable of taking on different values that can be used for analysis e.g. stock price, ROI, market share, age of worker, income of a family, total sales, advertising cost etc Measurement : is done when a standard process is used to assign numbers to particular characteristics of a variable – may be obvious or defined e.g. age is obvious but ROI or Labour productivity is defined The source of each measurement is called a Sampling unit Data : recorded measurements
  • 17. Levels of Data Measurement
    What are 40 and 80? They may represent
    – Weights of two objects being shipped
    – Ratings received in a consumer test by two different products
    – Football jersey numbers of a fullback and a centre forward
    Appropriateness of data analysis depends on the level of measurement of the data gathered
  • 18. Levels of Data Measurement • Nominal — Qualitative data, typically numbers are used only to classify or categorize the attribute, however it is useful to retain original verbal descriptions of categories – 1 for “male” and 2 for “female” – Employee identification number – Religion, Geographic location, PIN code, Place of birth – Demographic questions in survey etc
  • 19. Levels of Data Measurement • Ordinal - A variable is ordinal measurable if ranking or ordering is possible for values of the variable. – For example, a gold medal reflects superior performance to a silver or bronze medal in the Olympics. But can you say a gold and a bronze medal average out to a silver medal? – Preference scales are typically ordinal – how much do you like this cereal? Like it a lot, somewhat like it, neutral, somewhat dislike it, dislike it a lot.
  • 20. Levels of Data Measurement • Interval - In interval measurement the distance between attributes does have meaning. – Numerical data typically fall into this category – For example, when measuring temperature (in Fahrenheit), the distance from 30-40 is same as the distance from 70-80. The interval between values is interpretable.
  • 21. Levels of Data Measurement • Ratio — in ratio measurement there is always a reference point that is meaningful (either 0 for rates or 1 for ratios) – This means that you can construct a meaningful fraction (or ratio) with a ratio variable. – In applied social research most "count" variables are ratio, for example, the number of clients in past six months.
  • 22. Visualizing the data • Construct a frequency distribution – For both grouped and ungrouped data • Construct graphical summaries of qualitative data • Construct graphical summaries of quantitative data • Construct graphical summaries of two variables
  • 23. Ungrouped vs. Grouped Data • Ungrouped data – have not been summarized in any way – are also called raw data • Grouped data – logical groupings of data exist • e.g. age ranges (20-29, 30-39, etc.) – have been organized into a frequency distribution
  • 24. Example of Ungrouped Data 42 26 32 34 57 30 58 37 50 30 53 40 30 47 49 50 40 32 31 40 52 28 23 35 25 30 36 32 26 50 55 30 58 64 52 49 33 43 46 32 61 31 30 40 60 74 37 29 43 54 Ages of a sample of Managers from Urban Child Care Centres in US
  • 25. Frequency Distribution • Frequency Distribution – summary of data presented in the form of class intervals and frequencies – Vary in shape and design – Constructed according to the individual researcher's preferences
  • 26. Frequency Distribution • Steps in Frequency Distribution – Step 1 – Determine range of frequency distribution • Range is the difference between the highest and the lowest numbers – Step 2 – Determine the number of classes • Do not use too many, or too few classes – Step 3 – Determine the width of the class interval • Approx. class width can be calculated by dividing the range by the number of classes • Values fit into only one class
  • 27. Frequency Distribution of Child Care Manager’s Ages
    Class Interval  Frequency
    20-under 30      6
    30-under 40     18
    40-under 50     11
    50-under 60     11
    60-under 70      3
    70-under 80      1
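The table above can be rebuilt from the raw ages on slide 24. A minimal sketch, assuming class width 10 starting at 20 as on the slide; the object names (ages, breaks, classes, freq) are chosen here for illustration.

```r
# Ages of the 50 managers, as listed on the "Example of Ungrouped Data" slide
ages <- c(42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
          53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
          52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
          55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
          61, 31, 30, 40, 60, 74, 37, 29, 43, 54)

# Step 1: range of the data
data_range <- max(ages) - min(ages)

# Steps 2 and 3: six classes of width 10; right = FALSE gives "20-under 30" style intervals
breaks  <- seq(20, 80, by = 10)
labels  <- paste0(head(breaks, -1), "-under ", tail(breaks, -1))
classes <- cut(ages, breaks = breaks, right = FALSE, labels = labels)

# Frequency of each class
freq <- table(classes)
print(freq)
```

Using right = FALSE matters: with the default right-closed intervals, an age of exactly 30 would be counted in the 20-30 class instead of 30-under 40.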
  • 28. Relative Frequency
    Relative frequency is the proportion of the total frequency that is in any given class interval in a frequency distribution.
    Class Interval  Frequency  Relative Frequency
    20-under 30      6          .12  (= 6/50)
    30-under 40     18          .36  (= 18/50)
    40-under 50     11          .22
    50-under 60     11          .22
    60-under 70      3          .06
    70-under 80      1          .02
    Total           50         1.00
  • 29. Cumulative Frequency
    Cumulative frequency is a running total of frequencies through the classes of a frequency distribution.
    Class Interval  Frequency  Cumulative Frequency
    20-under 30      6          6
    30-under 40     18         24  (= 18 + 6)
    40-under 50     11         35  (= 11 + 24)
    50-under 60     11         46
    60-under 70      3         49
    70-under 80      1         50
    Total           50
  • 30. Cumulative Relative Frequencies
    Cumulative relative frequency is a running total of the relative frequencies through the classes of a frequency distribution.
    Class Interval  Frequency  Relative Frequency  Cumulative Frequency  Cumulative Relative Frequency
    20-under 30      6          .12                 6                     .12
    30-under 40     18          .36                24                     .48
    40-under 50     11          .22                35                     .70
    50-under 60     11          .22                46                     .92
    60-under 70      3          .06                49                     .98
    70-under 80      1          .02                50                    1.00
    Total           50         1.00
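The three derived columns follow directly from the class frequencies. A short sketch, starting from the frequencies on slide 27; the names used here are illustrative only.

```r
# Class frequencies from the child care managers' age distribution
freq <- c("20-under 30" = 6, "30-under 40" = 18, "40-under 50" = 11,
          "50-under 60" = 11, "60-under 70" = 3, "70-under 80" = 1)

rel_freq     <- freq / sum(freq)   # proportion of the total in each class
cum_freq     <- cumsum(freq)       # running total of frequencies
cum_rel_freq <- cumsum(rel_freq)   # running total of relative frequencies

# Collect everything in one table
dist_table <- data.frame(frequency = freq,
                         relative = rel_freq,
                         cumulative = cum_freq,
                         cumulative_relative = cum_rel_freq)
print(dist_table)
```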
  • 31. Common Statistical Graphs – Quantitative Data
    • Histogram -- vertical bar chart of frequencies
    • Frequency Polygon -- line graph of frequencies
    • Ogive -- line graph of cumulative frequencies
    • Dot Plots -- each data value is plotted
    • Stem and Leaf Plot -- like a histogram, but shows individual data values. Useful for small data sets.
  • 32. Histogram • A histogram is a graphical summary of a frequency distribution • Labeling x-axis with class endpoints and y-axis with frequencies, drawing a horizontal line between two class endpoints at each frequency value • The number and location of rectangles (bars) should be determined based on the sample size and the range of the data
  • 34. Number of Classes and Class Width • The number of classes should be between 5 and 15. – Fewer than 5 classes cause excessive summarization. – More than 15 classes leave too much detail. • Class Width – Divide the range by the number of classes for an approximate class width – Round up to a convenient number
  • 35. Class midpoint or Class mark The midpoint of each class interval is called the class midpoint or the class mark.
  • 36. Midpoints for Age Classes
    Class Interval  Frequency  Midpoint  Relative Frequency  Cumulative Frequency
    20-under 30      6         25         .12                 6
    30-under 40     18         35         .36                24
    40-under 50     11         45         .22                35
    50-under 60     11         55         .22                46
    60-under 70      3         65         .06                49
    70-under 80      1         75         .02                50
    Total           50                   1.00
  • 38. Histogram
    Class Interval  Frequency
    20-under 30      6
    30-under 40     18
    40-under 50     11
    50-under 60     11
    60-under 70      3
    70-under 80      1
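A histogram of these classes can be drawn with base R's hist(). The sketch below assumes the raw ages from slide 24 and the same class endpoints; the titles and object names are illustrative. It also overlays a simple frequency polygon through the class midpoints.

```r
# Ages of the 50 managers (raw data from the ungrouped-data slide)
ages <- c(42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
          53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
          52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
          55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
          61, 31, 30, 40, 60, 74, 37, 29, 43, 54)

# Class endpoints on the x-axis, frequencies on the y-axis;
# right = FALSE makes each class include its lower endpoint ("20-under 30")
h <- hist(ages, breaks = seq(20, 80, by = 10), right = FALSE,
          main = "Ages of child care centre managers",
          xlab = "Years", ylab = "Frequency")

# hist() returns the class counts and midpoints it used
print(h$counts)
print(h$mids)

# Frequency polygon: join the class midpoints at their frequencies
lines(h$mids, h$counts, type = "b")
```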
  • 39. Frequency Polygon
    A graphical display of class frequencies
    Class Interval  Frequency
    20-under 30      6
    30-under 40     18
    40-under 50     11
    50-under 60     11
    60-under 70      3
    70-under 80      1
    (Figure: frequency polygon; x-axis: Years, y-axis: Frequency)
  • 40. Relative Frequency Ogive
    Class Interval  Cumulative Relative Frequency
    20-under 30      .12
    30-under 40      .48
    40-under 50      .70
    50-under 60      .92
    60-under 70      .98
    70-under 80     1.00
  • 41. Stem and Leaf plot: Safety Examination Scores for Plant Trainees
    Raw Data
    86 77 91 60 55
    76 92 47 88 67
    23 59 72 75 83
    77 68 82 97 89
    81 75 74 39 67
    79 83 70 78 91
    68 49 56 94 81
    Stem  Leaf
    2     3
    3     9
    4     79
    5     569
    6     07788
    7     0245567789
    8     11233689
    9     11247
  • 42. Stem and Leaf plot: Construction (Figure: the raw data and stem and leaf plot from slide 41, with callouts marking which digit of a score is the stem and which is the leaf)
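The construction can be reproduced in R. The sketch below uses the 35 safety examination scores from slide 41 and builds the plot by hand, taking the tens digit as the stem and the units digit as the leaf; the object names are illustrative. R also ships a built-in stem() function, although its default layout may not match the slide exactly.

```r
# Safety examination scores for plant trainees (raw data from the slide)
scores <- c(86, 77, 91, 60, 55, 76, 92, 47, 88, 67,
            23, 59, 72, 75, 83, 77, 68, 82, 97, 89,
            81, 75, 74, 39, 67, 79, 83, 70, 78, 91,
            68, 49, 56, 94, 81)

# Built-in display
stem(scores)

# By hand: sort, then split the units digits (leaves) by the tens digit (stem)
sorted <- sort(scores)
leaves <- sapply(split(sorted %% 10, sorted %/% 10), paste, collapse = "")

# One line per stem, e.g. "7 | 0245567789"
cat(paste(names(leaves), leaves, sep = " | "), sep = "\n")
```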
  • 43. Histogram vs. Stem and Leaf? • So, which one should you use? • A Stem and Leaf plot is useful for small data sets. It shows the values of the data points. • A histogram forgoes the individual values of the data in favour of the bigger picture of the distribution of the data • The purpose of these graphs is to summarize a set of data. As long as that need is met, either one is okay to use.
  • 44. Common Statistical Graphs – Qualitative Data • Pie Chart -- proportional representation for categories of a whole • Bar Chart -- frequency or relative frequency of one or more categorical variables
  • 45. Complaints by Amtrak Passengers
    COMPLAINT          NUMBER  PROPORTION  DEGREES
    Stations, etc.     28,000   .40        144.0
    Train Performance  14,700   .21         75.6
    Equipment          10,500   .15         54.0
    Personnel           9,800   .14         50.4
    Schedules, etc.     7,000   .10         36.0
    Total              70,000  1.00        360.0
  • 46. Complaints by Amtrak Passengers
  • 47. Second Quarter U.S. Truck Production
    Second Quarter Truck Production in the U.S. (Hypothetical values)
    Company  2d Quarter Truck Production
    A        357,411
    B        354,936
    C        160,997
    D         34,099
    E         12,747
    Totals   920,190
  • 49. Pie Chart Calculations for Company A
    Company  2d Quarter Truck Production  Proportion  Degrees
    A        357,411                       .388       140
    B        354,936                       .386       139
    C        160,997                       .175        63
    D         34,099                       .037        13
    E         12,747                       .014         5
    Totals   920,190                      1.000       360
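The proportion and degree columns are each company's share of the total, and that share times 360. A minimal sketch using the hypothetical truck production figures from the slide; the object names are illustrative.

```r
# Second quarter truck production (hypothetical values from the slide)
production <- c(A = 357411, B = 354936, C = 160997, D = 34099, E = 12747)

proportion <- production / sum(production)   # share of the whole
degrees    <- proportion * 360               # angle of each pie slice

print(round(proportion, 3))
print(round(degrees))

# pie() works from the raw values and computes the angles itself
pie(production, main = "Second Quarter Truck Production")
```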
  • 50. Vertical Bar Graphs or Column Charts (Figure: column chart with series Kolkata, Mumbai and Chennai for the years 2010 to 2013)
  • 52. Pareto Chart A Pareto chart is a bar chart, sorted from the most frequent to the least frequent, overlaid with a cumulative line graph (like an ogive). These data present the most common types of defects. (Figure: Pareto chart of defect frequency with a cumulative percentage line; categories: Poor Wiring, Short in Coil, Defective Plug, Other)
  • 53. Scatter Plot
    Registered Vehicles (1000's)  Gasoline Sales (1000's of Gallons)
     5                             60
    15                            120
     9                             90
    15                            140
     7                             60
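These five pairs can be plotted with base R's plot(). A small sketch; the column names vehicles and gasoline are made up here. The last line computes the coefficient of correlation, which the deck lists as a learning objective but does not work out on this slide.

```r
# Registered vehicles and gasoline sales, both in thousands (from the slide)
sales <- data.frame(vehicles = c(5, 15, 9, 15, 7),
                    gasoline = c(60, 120, 90, 140, 60))

# One point per row: vehicles on the horizontal axis, gasoline on the vertical axis
plot(sales$vehicles, sales$gasoline,
     xlab = "Registered Vehicles (1000's)",
     ylab = "Gasoline Sales (1000's of Gallons)",
     main = "Scatter Plot")

# Pearson correlation between the two variables
r <- cor(sales$vehicles, sales$gasoline)
print(r)
```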
  • 54. Common Statistical Graphs – Comparing Two Variables • Scatter Plot -- type of display using Cartesian coordinates to display values for two variables for a set of data. – The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. – A scatter plot is also called a scatter chart, scatter diagram and scatter graph.
  • 55. Measures of Central Tendency & Dispersion: Learning Objectives • Distinguish between measures of central tendency, measures of variability, measures of shape, and measures of association. • Understand the meanings of mean, median, mode, quartile, percentile, and range. • Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation on ungrouped data. • Differentiate between sample and population variance and standard deviation.
  • 56. Measures of Central Tendency & Dispersion: Learning Objectives - continued • Understand the meaning of standard deviation as it is applied by using the empirical rule and Chebyshev’s theorem. • Compute the mean, median, standard deviation, and variance on grouped data. • Understand box and whisker plots, skewness, and kurtosis. • Compute a coefficient of correlation and interpret it.
  • 57. Measures of Central Tendency: Ungrouped Data
    • Measures of central tendency yield information about “the centre, or middle part, of a group of numbers.”
    • Measures of central tendency do not focus on the span of the data set or how far values are from the middle numbers
    • Common Measures of Location
      – Mode
      – Median
      – Mean
      – Percentiles
      – Quartiles
  • 58. Mode • Mode - the most frequently occurring value in a data set – Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio) – Can be used to determine what categories occur most frequently – Sometimes, no mode exists (no duplicates) • Bimodal – In a tie for the most frequently occurring value, two modes are listed • Multimodal -- Data sets that contain more than two modes
  • 59. Median • Median - middle value in an ordered array of numbers. – Half the data are above it, half the data are below it – Mathematically, it is the (n+1)/2 th ordered observation • For an array with an odd number of terms, the median is the middle number – n=11 => (n+1)/2 th = 12/2 th = 6th ordered observation • For an array with an even number of terms the median is the average of the middle two numbers – n=10 => (n+1)/2 th = 11/2 th = 5.5th = average of 5th and 6th ordered observation
  • 60. Arithmetic Mean
    • Mean is the average of a group of numbers
    • Applicable for interval and ratio data
    • Not applicable for nominal or ordinal data
    • Affected by each value in the data set, including extreme values
    • Computed by summing all values in the data set and dividing the sum by the number of values in the data set
  • 61. Demonstration Problem The number of U.S. cars in service by top car rental companies in a recent year according to Auto Rental News follows. Company / Number of Cars in Service Enterprise 643,000; Hertz 327,000; National/Alamo 233,000; Avis 204,000; Dollar/Thrifty 167,000; Budget 144,000; Advantage 20,000; U-Save 12,000; Payless 10,000; ACE 9,000; Fox 9,000; Rent-A-Wreck 7,000; Triangle 6,000 Compute the mode, the median, and the mean.
  • 62. Demonstration Problem: Solution
    Mode: 9,000 (two companies with 9,000 cars in service)
    Median: With 13 different companies in this group, N = 13. The median is located at the (13 + 1)/2 = 7th position. Because the data are already ordered, the median is the 7th term, which is 20,000.
    Mean: μ = ∑x/N = 1,791,000/13 = 137,769.23
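The same three answers can be checked in R. A sketch using the figures from the demonstration problem; base R has no function for the statistical mode (its mode() reports the storage type), so the helper below tabulates the values instead. The object names are illustrative.

```r
# Cars in service by rental company (from the demonstration problem)
cars <- c(Enterprise = 643000, Hertz = 327000, "National/Alamo" = 233000,
          Avis = 204000, "Dollar/Thrifty" = 167000, Budget = 144000,
          Advantage = 20000, "U-Save" = 12000, Payless = 10000,
          ACE = 9000, Fox = 9000, "Rent-A-Wreck" = 7000, Triangle = 6000)

# Mode: the value(s) with the highest count
counts <- table(cars)
modes  <- as.numeric(names(counts)[counts == max(counts)])

med <- median(cars)   # 7th of the 13 ordered values
avg <- mean(cars)     # sum of the values divided by N

print(modes)
print(med)
print(round(avg, 2))
```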
  • 63. Percentile • Percentile - measures of central tendency that divide a group of data into 100 parts • At least n% of the data lie at or below the nth percentile, and at most (100 - n)% of the data lie above the nth percentile • Example: the 90th percentile indicates that at least 90% of the data are equal to or less than it, and at most 10% of the data lie above it
  • 64. Calculating Percentiles • To calculate the pth percentile, – Order the data – Calculate i = N (p/100) – Determine the percentile • If i is a whole number, then use the average of the ith and (i+1)th ordered observation • Otherwise, round i up to the next highest whole number
  • 65. Quartiles
    • Quartile - measures of central tendency that divide a group of data into four subgroups
    • Q1: 25% of the data set is below the first quartile
    • Q2: 50% of the data set is below the second quartile
    • Q3: 75% of the data set is below the third quartile
    (Figure: the ordered data split by Q1, Q2 and Q3 into four segments of 25% each)
  • 66. Quartiles for Demonstration Problem For the cars in service data, n=13, so Q1: i = 13 (25/100) = 3.25, so use the 4th ordered observation Q1 = 9,000 Q3: i = 13 (75/100) = 9.75, so use the 10th ordered observation Q3 = 204,000
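The percentile steps from slide 64 can be written as a small function and applied to the cars-in-service data. This is a sketch: percentile_slide is a hypothetical helper name, and the guards for i below 1 or at the last observation are an added assumption, since the slides do not cover those edge cases.

```r
# Location rule from the "Calculating Percentiles" slide:
#   i = N * (p / 100); if i is whole, average the ith and (i+1)th ordered values,
#   otherwise round i up and take that ordered value.
percentile_slide <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  i <- n * p / 100
  if (isTRUE(all.equal(i, round(i)))) {
    i <- round(i)
    if (i < 1) return(x[1])    # assumption: clamp at the smallest value
    if (i >= n) return(x[n])   # assumption: clamp at the largest value
    return((x[i] + x[i + 1]) / 2)
  }
  x[ceiling(i)]
}

# Cars in service (from the demonstration problem)
cars <- c(643000, 327000, 233000, 204000, 167000, 144000, 20000,
          12000, 10000, 9000, 9000, 7000, 6000)

q1 <- percentile_slide(cars, 25)   # i = 3.25, so the 4th ordered value
q3 <- percentile_slide(cars, 75)   # i = 9.75, so the 10th ordered value
print(c(Q1 = q1, Q3 = q3))
```

R's built-in quantile() uses a different interpolation rule by default, so on other data its answers can differ from this hand calculation; its type argument selects among several definitions.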
  • 67. Which Measure Do I Use?
    • Which measure of central tendency is most appropriate?
      – In general, the mean is preferred, since it has nice mathematical properties, which we shall discuss later
      – The median and quartiles are resistant to outliers
    • Consider the following three datasets
      – 1, 2, 3 (median=2, mean=2)
      – 1, 2, 6 (median=2, mean=3)
      – 1, 2, 30 (median=2, mean=11)
      – All have median=2, but the mean is sensitive to the outliers
    • In general, if there are outliers, the median is preferred to the mean
    (To be continued)
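The three small datasets can be compared directly in R. A minimal sketch; the list and its element names are made up for illustration.

```r
# Three datasets that differ only in their largest value
datasets <- list(no_outlier   = c(1, 2, 3),
                 mild_outlier = c(1, 2, 6),
                 big_outlier  = c(1, 2, 30))

medians <- sapply(datasets, median)   # unchanged by the outlier
means   <- sapply(datasets, mean)     # pulled upwards by the outlier

print(rbind(median = medians, mean = means))
```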