Those who don’t know statistics are condemned to reinvent it… David Freedman
All you ever wanted to know about the histogram and more ...
[Figure 1: histogram of the number of graphics on web pages. x-axis: graphic count. N = 1873, mean = 17.93, SD = 17.92, median = 16.00.]
Horizontal scale [Figure 2]
[Figure 3: histogram of redundant link % on web pages. N = 1861, mean = 22.1, SD = 37.33, median = 14.]
Plotting a histogram: follow an endpoint convention, plot frequencies, make the intervals equal, etc.
Frequency table convention: include the left endpoint in the class interval [Figure 4]
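The left-endpoint convention can be sketched as follows. This is a minimal illustration, not the procedure used for the slides: the function names, the bin edges, and the values are all hypothetical. Each class interval is treated as [left, right), so a value that sits exactly on an edge is counted in the interval that starts there.

```python
def bin_index(value, edges):
    """Return the index of the interval [edges[i], edges[i+1]) holding value.

    The left endpoint is included and the right endpoint is excluded.
    Returns None when the value falls outside every interval.
    """
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def frequency_table(values, edges):
    """Count how many values land in each left-closed class interval."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bin_index(v, edges)
        if i is not None:
            counts[i] += 1
    return counts


# Hypothetical data: equal-width intervals [0, 10), [10, 20), [20, 30)
edges = [0, 10, 20, 30]
values = [0, 5, 10, 10, 19.9, 20, 29]
counts = frequency_table(values, edges)
print(counts)
```

Here the two values equal to 10 are counted in the second interval, and the value 20 in the third.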
Frequency/Probability
No. of fonts used on a web page [Figure 5: left axis is frequency (0 to 1000), right axis is probability (0 to .5)]

No. of fonts (axis label)   Frequency   Probability
 1                            110         .06
 3                            430         .22
 5                            860         .45
 7                            280         .15
 9                            180         .09
11                             40         .02
13                             20         .01
15                             10         .01
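The probabilities on this slide appear to be each frequency divided by the total count. A short sketch, assuming that reading, reproduces them from the frequencies given above:

```python
# Frequencies as listed on the slide, in axis order
frequencies = [110, 430, 860, 280, 180, 40, 20, 10]

total = sum(frequencies)

# Probability (relative frequency) = count in the class / total count
probabilities = [round(f / total, 2) for f in frequencies]

print(total)
print(probabilities)
```

Because each value is rounded to two decimals, the rounded probabilities need not sum to exactly 1.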
Cleaning up a histogram: getting rid of outliers
[Figure: histogram of word count (N = 1903). Mean = 393.2, SD = 725.24, median = 223, minimum = 0, maximum = 20,357.]
[Figure 7: histogram of word count (N = 1897), top six removed. Mean = 368.0, SD = 474.04, median = 223, minimum = 0, maximum = 4132.]
[Figure: histogram of word count (N = 1873). x-axis: WORDCNT2. Mean = 333.4, SD = 360.30, median = 220, minimum = 0, maximum = 4132.]
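The effect of dropping the largest values can be sketched as below. The word counts are made up for illustration (only the extreme value 20,357 echoes the maximum quoted above), and `drop_top_k` is a hypothetical helper, not the method used to prepare the slides.

```python
import statistics


def summarize(values):
    """Return the summary statistics reported under each histogram."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 divisor)
        "max": max(values),
    }


def drop_top_k(values, k):
    """Return the values sorted ascending with the k largest removed."""
    ordered = sorted(values)
    return ordered[: len(ordered) - k]


# Hypothetical word counts with one extreme page
word_counts = [0, 120, 180, 220, 223, 250, 400, 900, 20357]

before = summarize(word_counts)
after = summarize(drop_top_k(word_counts, 1))

print(before)
print(after)
```

As in the slides, the mean and SD drop sharply once the extreme value is removed, while the median barely moves.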
What can histograms tell you
[Figure 8: histograms of link count on good and bad web pages. Panels: Good Sites, Bad Sites.]
Making inferences from histograms: incidence of riots and temperature
[Figure 9: x-axis is temperature, from 30 to 110.]
Mean and Median
The mean is the arithmetic average; the median is the 50% point.
The mean is the point where the graph balances.
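One way to see the "balance point" idea: the deviations above the mean exactly cancel the deviations below it. A minimal sketch, using the list 1, 2, 2, 7 that appears elsewhere in these slides:

```python
import statistics

values = [1, 2, 2, 7]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle of the sorted list

# Signed distances from the mean; they sum to zero, so the
# histogram "balances" at the mean.
deviations = [v - mean for v in values]
balance = sum(deviations)

print(mean, median, deviations, balance)
```

The single large value pulls the mean to 3 while the median stays at 2.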
The instability of means and standard deviations
Add two numbers: watch the mean, median, & SD
Add one outlier...
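The instability can be sketched with a small experiment. The numbers below are hypothetical, chosen only to make the contrast visible: adding two typical values barely changes anything, while adding one outlier moves the mean and SD a great deal and the median only slightly.

```python
import statistics


def describe(values):
    """Return (mean, median, sample SD) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
    )


base = [4, 5, 5, 6, 7, 8, 9]   # hypothetical starting list
with_two = base + [5, 7]       # add two unremarkable numbers
with_outlier = base + [100]    # add one outlier

for label, data in [("base", base), ("two added", with_two), ("outlier added", with_outlier)]:
    mean, median, sd = describe(data)
    print(f"{label:14s} mean={mean:.2f} median={median:.2f} sd={sd:.2f}")
```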
Standard Deviation: a measure of spread
Same mean, different spread
[Figure 10: two distributions, each marked with its SD.]
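A small sketch of the same idea, with two hypothetical lists that share a mean but differ in spread:

```python
import statistics

narrow = [9, 10, 10, 10, 11]  # values bunched near the mean
wide = [2, 6, 10, 14, 18]     # values spread far from the mean

narrow_mean = statistics.mean(narrow)
wide_mean = statistics.mean(wide)
narrow_sd = statistics.stdev(narrow)
wide_sd = statistics.stdev(wide)

print(narrow_mean, round(narrow_sd, 2))
print(wide_mean, round(wide_sd, 2))
```

Both lists average 10, but the SD of the second is several times larger.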
The Standard Deviation
Understanding the standard deviation
The histogram is symmetric about 2: 2 is the mean, with 50% to the left of 2 and 50% to the right.
[Figure: three histograms, each with a vertical axis marked 0%, 25%, 50%.]
List: 1, 2, 2, 5. Average = 2.5, SD = 1.73
List: 1, 2, 2, 7. Average = 3, SD = 2.71
Computing the standard deviation
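The steps on this slide did not survive in the transcript, so what follows is the standard textbook computation rather than the slide's own wording. The SD values quoted for the lists 1, 2, 2, 5 and 1, 2, 2, 7 match the version that divides by n - 1 (the sample SD); the version that divides by n is shown for comparison.

```python
import math


def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    mean = sum(values) / n                                  # 1. find the average
    squared_deviations = [(v - mean) ** 2 for v in values]  # 2. square each deviation
    variance = sum(squared_deviations) / (n - 1)            # 3. average them (n - 1)
    return math.sqrt(variance)                              # 4. take the square root


def population_sd(values):
    """Population standard deviation: same steps, but divide by n."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance)


print(round(sample_sd([1, 2, 2, 5]), 2))
print(round(sample_sd([1, 2, 2, 7]), 2))
print(population_sd([1, 2, 2, 5]))
```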
Properties of the standard deviation
Properties of the Normal Probability Curve
Within 1 SD of the mean: 68%. Within 2 SD: 95%. Within 3 SD: 99.7%. [Figure 11]
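These percentages follow from the normal curve itself. As a quick check, the share of a normal distribution within k SDs of the mean equals erf(k / sqrt(2)), which the standard library can evaluate; the function name `within_k_sd` is just an illustrative choice.

```python
import math


def within_k_sd(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```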
[Figure 12: histogram of judges' ratings for the Webby Awards. N = 1867, mean = 6.3, SD = 1.98, median = 6.3, skewness = -0.43, kurtosis = -0.201.]
It is a remarkable fact that many histograms in real life tend to follow the normal curve. For such histograms, the mean and SD are good summary statistics: the average pins down the center, while the SD gives the spread. For histograms that do not follow the normal curve, the mean and SD are not good summary statistics.
What about when the histogram is not normal ...
± 3 SD = (384 × 3) = 1152
Mean - 1152 is negative, which would mean about 30% of the sample had a negative number of links.
[Figure 13: histogram of word count on web pages. Mean = 348.3, SD = 384.83.]
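The arithmetic here can be sketched as follows, using the mean and SD quoted on the slide. The slide rounds the SD down to 384; the sketch uses 384.83, so the 3 SD figure comes out slightly larger. Treating the data as normal is exactly the assumption the slide is questioning, and the normal CDF below is written with `math.erf` for illustration.

```python
import math

mean = 348.3
sd = 384.83

# Range covered by the mean plus or minus 3 SD
lower = mean - 3 * sd
upper = mean + 3 * sd


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and SD sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Share of a normal curve with this mean and SD that falls below zero
share_below_zero = normal_cdf(0, mean, sd)

print(round(lower, 2), round(upper, 2))
print(round(share_below_zero, 3))
```

Under these assumptions the normal curve puts roughly 18% of its area below zero, which is impossible for a count. This differs from the "about 30%" quoted on the slide, whose derivation the transcript does not show; either way, a normal summary fits this histogram poorly.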
When the SD is influenced by outliers, use the interquartile range: 75th percentile - 25th percentile.
Note: a percentile is a score below which a certain percentage of the sample falls.
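A minimal sketch of the interquartile range, assuming Python 3.8 or later for `statistics.quantiles`. The data are hypothetical, and other software may use a slightly different percentile rule, so the exact quartiles can differ between tools.

```python
import statistics


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


clean = list(range(1, 12))     # hypothetical data: 1, 2, ..., 11
with_outlier = clean + [1000]  # same data plus one extreme value

print(iqr(clean), statistics.stdev(clean))
print(iqr(with_outlier), statistics.stdev(with_outlier))
```

The outlier inflates the SD enormously but changes the interquartile range only slightly.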
Measures of Normality
[Figure 14: three curves labeled Positively Skewed, Negatively Skewed, Symmetric.]
Kurtosis: Does it cluster in the middle?
[Figure 15: curves labeled Large tail, Small tail, Normal tail.]
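Skewness and kurtosis can be sketched with the simple moment-based definitions below. This is an assumption about the formula: statistical packages often apply small-sample adjustments, so their output (including the values quoted on these slides) may not match these functions exactly. The example lists are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: 0 if symmetric, > 0 if the long tail is on the right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal curve scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 20]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```

A positive skewness indicates a long right tail, as in the word count data; a large positive excess kurtosis indicates heavier tails than the normal curve.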
Positively Skewed and Leptokurtic: Word Count
[Figure: histogram of word count. N = 1903, mean = 393.2, SD = 725.24, median = 223, skewness = 13.62, kurtosis = 321.84.]
[Figure: histogram of word count (N = 1897), top six removed. Mean = 368.0, SD = 474.04, median = 223, skewness = 3.49, kurtosis = 16.40.]
Degrees of Freedom

SIMS Quant Course Lecture 4

  • 1. Those who don’t know statistics are condemned to reinvent it… David Freedman
  • 2. All you ever wanted to know about the histogram and more ...
  • 3. [Figure] Distribution of No of Graphics on web pages (N = 1873). Axis label: Graphic Count. Mean = 17.93, Median = 16.00, Std. Dev = 17.92.
  • 5. [Figure] Distribution of Redundant Link % on web pages (N = 1861). Mean = 22.1, Median = 14, Std. Dev = 37.33.
  • 6. Plotting a histogram: endpoint convention, plot frequencies, make equal intervals etc.
  • 7. Frequency Table convention: include the left endpoint in the class interval
  • 9. No of fonts used on a web-page (vertical axis: Frequency / probability, from 0 / 0 to 1000 / .5)
       No of fonts:   1     3     5     7     9     11    13    15
       Frequency:     110   430   860   280   180   40    20    10
       Probability:   .06   .22   .45   .15   .09   .02   .01   .01
  • 10. Cleaning up a histogram: getting rid of outliers
  • 11. [Figure] Distribution of word count (N = 1903). Mean = 393.2, Median = 223, Std. Dev = 725.24, Minimum = 0, Maximum = 20,357.
  • 12. [Figure] Distribution of word count (N = 1897), top six removed. Mean = 368.0, Median = 223, Std. Dev = 474.04, Minimum = 0, Maximum = 4132.
  • 13. [Figure] Distribution of word count (N = 1873). Axis label: WORDCNT2. Mean = 333.4, Median = 220, Std. Dev = 360.30, Minimum = 0, Maximum = 4132.
  • 15. [Figure] Distribution of link count on good & bad web-pages. Panels: Good Sites, Bad Sites.
  • 16. Making inferences from histograms: Incidence of riots and temperature. [Figure] Axis label: temperature.
  • 18. The instability of means and standard deviations
  • 19. Add two numbers: watch the mean, median, & SD
  • 21. Standard Deviation: a measure of spread
  • 22. Same mean, different spread. [Figure] Labels: SD, SD.
  • 31. [Figure] Distribution of judges' ratings for the Webby Awards (N = 1867). Mean = 6.3, Median = 6.3, Std. Dev = 1.98, Skewness = -.43, Kurtosis = -.201.
  • 32. It is a remarkable fact that many histograms in real life tend to follow the normal curve. For such histograms, the mean and SD are good summary statistics: the average pins down the center, while the SD gives the spread. For histograms that do not follow the normal curve, the mean and SD are not good summary statistics. What happens when the histogram is not normal ...
  • 33. [Figure] Distribution of word count on web pages. Mean = 348.3, Std. Dev = 384.83. ± 3 SD = (384 * 3) = 1152; Mean - 1152 = about 30% sample had negative number of links.
  • 34. When SD is influenced by outliers, use the interquartile range: 75th percentile - 25th percentile. Note: a percentile is a score below which a certain % of the sample falls.
  • 37. [Figure] Positively Skewed and Leptokurtic: Word Count (N = 1903). Mean = 393.2, Median = 223, Std. Dev = 725.24, Skewness = 13.62, Kurtosis = 321.84.
  • 38. [Figure] Distribution of word count (N = 1897), top six removed. Mean = 368.0, Median = 223, Std. Dev = 474.04, Skewness = 3.49, Kurtosis = 16.40.
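The left-endpoint convention on slide 7 and the frequency / probability pairing on slide 9 can be sketched in Python. This is a minimal illustration, not the lecture's own code: the function names `frequency_table` and `to_probabilities`, the bin edges, and the small `sample` list are assumptions made for the example. Only the eight font frequencies are taken from slide 9.

```python
def frequency_table(values, edges):
    """Count values per class interval [edges[i], edges[i + 1]).

    The left endpoint is included and the right endpoint excluded, so a
    value that falls exactly on a boundary is counted in the interval
    that starts there.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def to_probabilities(counts):
    """Convert frequencies to relative frequencies (they sum to 1)."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical sample: the value 2 sits on a boundary and goes to [2, 4).
sample = [0, 1, 2, 2, 3, 4, 5]
edges = [0, 2, 4, 6]
boundary_counts = frequency_table(sample, edges)

# Frequencies read off slide 9 (number of fonts used on a web page).
font_frequencies = [110, 430, 860, 280, 180, 40, 20, 10]
font_probabilities = [round(p, 2) for p in to_probabilities(font_frequencies)]

if __name__ == "__main__":
    print(boundary_counts)
    print(font_probabilities)
```

Dividing each frequency by the total (1930) and rounding to two decimals gives back the probability row shown on slide 9, which is why the two vertical scales on that slide can share one set of bars.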
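Slides 18, 19 and 34 describe how adding a couple of extreme values moves the mean and SD while the median and interquartile range barely change. The sketch below shows that effect with Python's standard `statistics` module. The word counts are hypothetical, not the lecture's data, and `summarize` is a helper name chosen for this example. The quartiles use the module's default method, so other packages may report slightly different values.

```python
import statistics


def summarize(values):
    """Return mean, median, sample SD, and interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,  # 75th percentile minus 25th percentile
    }


# Hypothetical word counts for ten pages (assumed values).
base = [120, 150, 180, 200, 220, 240, 260, 300, 340, 400]

# Add two numbers, both extreme, and watch the summaries.
with_outliers = base + [15000, 20000]

before = summarize(base)
after = summarize(with_outliers)

if __name__ == "__main__":
    for name in ("mean", "median", "sd", "iqr"):
        print(f"{name:>6}: {before[name]:10.1f} -> {after[name]:10.1f}")
```

In this toy sample the mean and SD grow by more than a factor of ten, while the median and IQR shift only modestly, which is the reason slide 34 recommends the interquartile range when outliers dominate the SD.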
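Slides 31 to 38 use skewness, kurtosis and the mean ± 3 SD range to judge whether a histogram follows the normal curve. The following sketch works through that check under stated assumptions: the mean and SD are the values quoted on slide 33, the two small samples are hypothetical, and the skewness and kurtosis functions are the plain moment-based versions. Statistical packages often apply small-sample corrections, so their output may differ from these formulas.

```python
import statistics


def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal curve scores about 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


# Summary statistics quoted on slide 33 (word count on web pages).
mean, sd = 348.3, 384.83
lower_3sd = mean - 3 * sd  # a count cannot be negative, yet this bound is

# Share of a normal curve with this mean and SD that lies below zero.
share_below_zero = statistics.NormalDist(mean, sd).cdf(0)

# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_tailed = [1, 1, 2, 2, 3, 3, 50]

if __name__ == "__main__":
    print(f"mean - 3 SD: {lower_3sd:.2f}")
    print(f"normal share below zero: {share_below_zero:.3f}")
    print(f"skewness: {skewness(symmetric):.2f} vs {skewness(right_tailed):.2f}")
    print(f"excess kurtosis: {excess_kurtosis(right_tailed):.2f}")
```

A normal curve fitted to these statistics puts part of its area below zero, which is impossible for a count. That mismatch, together with positive skewness and kurtosis for the right-tailed sample, is the kind of evidence the slides use to argue that the mean and SD are poor summaries for such data.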