Data, Information, & Statistics

We believe in GOD rest all… to be supported by DATA!!!
This document is a partial preview. Full document download can be found on Flevy:
http://flevy.com/browse/document/data-information-and-statistics-3281

Data, Variables and Observations…
●Data as per dictionary is defined as facts, figures, or values from which logical conclusions may be drawn
as per requirement
● Variable is any quantity that varies; i.e., an aspect or characteristic of a person, object, or situation
that can assume different values
● Observation is any measured data value for a defined variable; e.g., the %-scores for males
and females in a statistics exam:
Males 55 60 58 75 72 66 54 81 49 30 54 23 45 63 76 57
Females 84 28 54 66 71 63 52 43 38 67 71 29 55 66 44 71
Data in the form of measurements always gets assigned to variable(s)!!!
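
As a minimal sketch of how the observations above get assigned to variables, the snippet below stores the two rows of %-scores under their variable names and summarizes each one. The names `scores` and `summarize` are illustrative assumptions, not part of the original material.

```python
from statistics import mean, median

# %-scores copied from the exam example above; each key is a variable,
# each list holds that variable's observations.
scores = {
    "Males": [55, 60, 58, 75, 72, 66, 54, 81, 49, 30, 54, 23, 45, 63, 76, 57],
    "Females": [84, 28, 54, 66, 71, 63, 52, 43, 38, 67, 71, 29, 55, 66, 44, 71],
}


def summarize(values):
    """Return (number of observations, mean, median) for one variable."""
    return len(values), mean(values), median(values)


for group, values in scores.items():
    n, avg, mid = summarize(values)
    print(f"{group}: n={n}, mean={avg:.3f}, median={mid}")
```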

Variation Thinking Vs Goalpost Mentality
Traditionally, anything outside the specification limits of a process represented a defect. Genichi
Taguchi (a Japanese engineer) refined the concept of specification limits, the so-called goalposts, by
stating that any deviation (i.e., variation) from the target, even when contained within the specification limits, also
represented defects in the process.
From the traditional goalpost mentality there has been a radical shift to
attack variation!!!
Possible customer measurements under variation thinking are:
● Target, less the better: e.g., MTTR, cycle time, defects;
● Target, greater the better: e.g., MTBF, customer satisfaction scores;
● Target, specific value: e.g., appointment time, dimensions;

Goalpost Mentality – The Concept…
Under the new rules, the field goal kicker gets 3 points for landing within ±1σ of the target; 2 points
for landing between ±1σ and ±2σ; and “only” 1 point for landing beyond ±2σ but still inside the LSL & USL!!!
How does this change the game?!!
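
The scoring rule above can be sketched as a small function. This is only an illustration: the function name `kick_score`, its parameters, and the choice of 0 points for a kick outside the specification limits are assumptions, since the slide does not state what a miss scores.

```python
def kick_score(x, target, sigma, lsl, usl):
    """Points for a kick landing at x under the variation-thinking rules."""
    if x < lsl or x > usl:
        return 0  # assumption: outside the goalposts counts as a miss (defect)
    distance = abs(x - target)
    if distance <= sigma:
        return 3  # within ±1σ of the target
    if distance <= 2 * sigma:
        return 2  # between ±1σ and ±2σ
    return 1  # beyond ±2σ but still inside LSL and USL


# Hypothetical process: target 10, σ = 1, specification limits 7 and 13
for landing in (10.5, 11.5, 12.5, 13.5):
    print(landing, kick_score(landing, target=10, sigma=1, lsl=7, usl=13))
```

Under the traditional goalpost rules all three in-spec kicks would score the same; here the score drops as the kick drifts from the target.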

Data Classification – Numeric Data
Continuous: The data is measured on a continuous scale, i.e. it can take any value within a range.
e.g. the measurement of the length and breadth of a table in the training room. This type of measurement can
result in an output like 5 ft, or 4 ft 5 in, etc.
Discrete: The data is counted. These counts can be in the form of event occurrences (Poisson, e.g. the
number of errors in a form, number of calls in the queue, etc.) or can be count-based proportions
(Binomial, e.g. percentage pass/fail, etc.; pass/fail being binary data). Discrete data therefore always
involves counting, whether the result is a count-based value or a count-based proportion

Data Classification - Summary
Numeric
● Continuous: They can take any numerical value (1, 3.2, 5, 4.25, 6, 20, …)
Example: Height, length, temperature, etc…
● Discrete: They can take a count-based value or a derived proportion value (3, 5, 78, 34, 10 out of 100, …)
Example: Number of People, % Present, etc…
Attribute
● Binary: They can take only two values
Example: Yes/No or Man/Woman
● Non-Binary: They can take more than two values
Example: Colors (Nominal: red, green, blue, ...); Color Intensity (Ordinal: strong, moderate, light)

Data Representation
Numeric: Time Plot, Histogram
Attribute: Pie Chart, Bar Diagram

Data Representation – Time Plot
A Time Plot is a graphical representation of numerical data. A time-plot is a two-dimensional graph
used to detect trends in time.

Data Representation – Pie Chart
A Pie Chart is a graphical representation of attribute data. The “pieces” represent proportions of count
categories in the overall situation. Pie charts show the relationship among quantities by dividing the
whole pie (100%) into wedges or smaller percentages

End of Data, Information & Statistics…

Measures of Location & Spread
Measure of Central Tendency: Mean, Median, Mode, Quartiles
Measure of Dispersion: Range, Variance, Standard Deviation, Span, Stability Factor

Measures of Location – Median
● If we put all data in rank order (low to high), then the median is the ordered value located in the middle.
● If there is an odd number of observations, then the median is the (n+1)/2th ordered value.
● If there is an even number of observations, then the median is the average of the two middle values, i.e. the n/2th
and the (n/2)+1th ordered values
[Figure: normal curve from −∞ to +∞ showing the 68%, 95%, and 99.7% bands, with the median at the 50% point]
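
The odd/even rule for the median can be sketched directly in code. The function name `median_by_rank` is an illustrative assumption; the sample data are the male %-scores from the earlier exam example.

```python
def median_by_rank(values):
    """Median using the rank-order rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        # odd n: the (n+1)/2th ordered value (converted to a 0-based index)
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the n/2th and (n/2)+1th ordered values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


males = [55, 60, 58, 75, 72, 66, 54, 81, 49, 30, 54, 23, 45, 63, 76, 57]
print(median_by_rank(males))      # 16 values: average of the 8th and 9th
print(median_by_rank([3, 1, 2]))  # 3 values: the 2nd ordered value
```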

Measures of Location – Quartiles
● Quartile 1: (Q1 or P25) is defined as the ordered value below which 25% of the data points fall.
If we put all data in rank order (low to high), then Q1 is the value located at n/4
● Quartile 3: (Q3 or P75) is defined as the ordered value below which 75% of the data points fall.
If we put all data in rank order (low to high), then Q3 is the value located at 3n/4
[Figure: normal curve from −∞ to +∞ showing the 68%, 95%, and 99.7% bands, with the 25%, 50%, and 75% points marked]
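
A minimal sketch of the "value located at n/4 (or 3n/4)" rule follows. Rounding the position up when it is not a whole number is an assumption made here; statistical packages use several different interpolation conventions and may return slightly different quartiles. The name `quartile_by_rank` is illustrative.

```python
import math


def quartile_by_rank(values, k):
    """Return Qk (k = 1, 2, or 3) as the ordered value at position k*n/4."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of an empty dataset are undefined")
    # assumption: round the 1-based position up when k*n/4 is fractional
    position = max(1, math.ceil(k * n / 4))
    return ordered[position - 1]


males = [55, 60, 58, 75, 72, 66, 54, 81, 49, 30, 54, 23, 45, 63, 76, 57]
print("Q1 =", quartile_by_rank(males, 1))  # 4th of 16 ordered values
print("Q3 =", quartile_by_rank(males, 3))  # 12th of 16 ordered values
```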

Measures of Spread – Standard Deviation
Standard deviation can be interpreted as the typical distance of the individual observations from
the group mean (strictly, the square root of the average squared distance); it is a measure of distance and is never negative
Standard deviation for the population is represented as “σ”
Standard deviation for the sample is represented as “s”

Measures of Spread – Variance
Variance is defined as the average squared distance of the individual observations from the group mean,
i.e. the square of the standard deviation; it is a measure of area and is never negative
● Variance for the population is represented as “σ²”
● Variance for the sample is represented as “s²”
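
The relationship between variance and standard deviation, and the population (σ², σ) versus sample (s², s) versions, can be sketched as below. The divisor n for a population and n − 1 for a sample is the usual textbook convention rather than something stated on the slides, and the function names and data are illustrative.

```python
import math


def variance(values, sample=True):
    """Average squared distance from the mean (divisor n-1 for a sample, n for a population)."""
    n = len(values)
    mean = sum(values) / n
    squared_distances = sum((x - mean) ** 2 for x in values)
    return squared_distances / (n - 1 if sample else n)


def std_dev(values, sample=True):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print("population: variance =", variance(data, sample=False), " sd =", std_dev(data, sample=False))
print("sample:     variance =", round(variance(data), 4), " sd =", round(std_dev(data), 4))
```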

Pooled Standard Deviation & Variance
An important property of standard deviation and variance is that variances (of independent sources of variation) can be added while
standard deviations can NOT!
The way out in the latter case is to average the variances and then compute a “pooled” standard
deviation which, for k groups of equal size, is:
s_{pooled} = \sqrt{(s_1^2 + s_2^2 + \dots + s_k^2) / k}
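
A minimal sketch of a pooled standard deviation is given below. It weights each group's sample variance by its degrees of freedom (n − 1), which is a common convention and an assumption here; for groups of equal size this reduces to a plain average of the variances. The function name and the two small groups are hypothetical.

```python
import math
from statistics import variance


def pooled_std_dev(groups):
    """Pool sample variances weighted by degrees of freedom, then take the square root."""
    weighted_sum = sum((len(g) - 1) * variance(g) for g in groups)
    degrees_of_freedom = sum(len(g) - 1 for g in groups)
    return math.sqrt(weighted_sum / degrees_of_freedom)


group_a = [1, 2, 3]  # sample variance 1
group_b = [2, 4, 6]  # sample variance 4
print(round(pooled_std_dev([group_a, group_b]), 4))
```

Note that the pooled value is the square root of the averaged variances (2.5), not the average of the two standard deviations (1 and 2).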

Measures of Spread – Span
Span measures how successful we are in meeting our customer needs and requirements; and is
NOT influenced by extreme values.
Span for a given dataset is calculated as:
Span = P95 – P5
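
Span can be sketched with the standard library's percentile helper. Percentile conventions differ between tools, so the exact P5 and P95 values below reflect the default method of Python's `statistics.quantiles` and are an assumption; the function name `span` and the data are illustrative.

```python
from statistics import quantiles


def span(values):
    """Span = P95 - P5, using 19 cut points at 5%, 10%, ..., 95%."""
    cuts = quantiles(values, n=20)
    p5, p95 = cuts[0], cuts[-1]
    return p95 - p5


cycle_times = list(range(1, 101))  # hypothetical data: 1, 2, ..., 100
print(round(span(cycle_times), 2))

# An extreme value barely moves the span, unlike the range (max - min)
with_outlier = cycle_times[:-1] + [10_000]
print(round(span(with_outlier), 2), max(with_outlier) - min(with_outlier))
```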

Recap
• Types of Data
– Numeric
– Attribute
• Numeric
– Continuous
– Discrete
• Attribute
– Nominal
– Ordinal
• Population Characteristics
– Location
– Spread
– Shape
– Consistency
● Measure of Location
● Central Tendency
● Mean
● Median
● Mode
● Quartiles
● Measure of Spread
● Dispersion
● Standard Deviation
● Variance
● Range
● Span
● Stability Factor

Measures of Shape - Distributions
The two possible distribution models for our statistical study are:
● Probability distribution – A theoretical data distribution which represents futuristic data; its
values are probabilities between 0 and 1
● Frequency distribution – An actual data distribution which represents historical data; its values
are actual measurements
Probability Distribution: ● Futuristic Data ● Values are between 0 and 1 ● Theoretical Distribution
Frequency Distribution: ● Actual Data ● Values are actual measurements ● Actual Distribution

Probability and Frequency…
Probability and Frequency – Game of Dice
Each die has 6 distinct sides and probability
of any one value being rolled is “1 out of 6”
…any Value on Die 1 = 1/6 = .1667
…any Value on Die 2 = 1/6 = .1667
…for 1 Combination = 1/6 x 1/6 = .0278
What was your frequency for the above
probability?!!!
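
The contrast between the theoretical probability and an observed frequency can be sketched with a small simulation. The target combination (6, 6), the number of rolls, and the random seed are arbitrary assumptions for illustration.

```python
import random
from fractions import Fraction


def combination_probability():
    """Theoretical probability of one specific (die 1, die 2) combination."""
    return Fraction(1, 6) * Fraction(1, 6)


def observed_frequency(rolls, target=(6, 6), seed=0):
    """Share of simulated two-dice rolls that hit the target combination."""
    rng = random.Random(seed)
    hits = sum(
        1
        for _ in range(rolls)
        if (rng.randint(1, 6), rng.randint(1, 6)) == target
    )
    return hits / rolls


print("probability:", float(combination_probability()))
print("frequency over 100,000 rolls:", observed_frequency(100_000))
```

The frequency from actual rolls will rarely equal the probability exactly, but it drifts toward it as the number of rolls grows.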

Probability Distributions – Poisson
A Poisson distribution gives us the probability of exactly “x” occurrences during a given
period of time (or area of opportunity) if events take place independently, are equally likely across the whole
area of opportunity, and occur at a constant rate. This discrete distribution represents random defects per unit, e.g. defects/unit,
calls/hour, etc.; i.e. number of occurrences measured as rates
A process that meets these criteria is called a Poisson process
Examples: Used to represent the distribution of the number of defects in an application form; or, customer arrivals; or,
insurance claims; or, incoming telephone calls; etc.
Poisson experiments are often called
rate experiments!!!

Estimation: For a Poisson experiment where λ is the expected number of successes in a given area of
opportunity and e ≈ 2.718 is a constant, the Poisson probability distribution is defined
as
P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, for x = 0, 1, 2, …
For a Poisson distribution, μ = λ and σ = √λ; “λ” is the expected number of successes and
“e” is a constant found in a number of natural processes.
It is the base of the natural logarithm and is useful in modeling the growth of biological systems and events like the
random distribution of defects on statements.

Example: The average number of orders being booked during weekday mornings is four per hour.
The probability that exactly six order booking requests will arrive in any given hour is calculated as:
P(6) = \frac{4^{6} e^{-4}}{6!} ≈ 0.104, i.e. about 10.4%
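
The order-booking example (λ = 4 per hour, x = 6) can be checked with the standard Poisson formula P(x) = λ^x · e^(−λ) / x!. The function name `poisson_pmf` is an illustrative assumption.

```python
import math


def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the expected number is lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


# Four orders per hour on average; probability of exactly six in a given hour
p_six = poisson_pmf(6, 4)
print(f"P(6 orders) = {p_six:.4f}")

# Sanity check: the probabilities over all possible counts sum to (almost) 1
print(round(sum(poisson_pmf(x, 4) for x in range(50)), 6))
```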

Probability Distributions – Binomial
The Binomial distribution is a discrete distribution which takes on proportion values and is
used for determining the probability that a unit is defective or not defective
It is often used as a model for determining the probabilities for any number of successes from a series of
independent trials, where the possible outcome of each trial is either “success” (probability p) or “failure” (probability q = 1−p)
Mutually exclusive and exhaustive events represent a Binomial
Distribution!!!

Shape: The shape of this probability curve is dependent on the sample size (n) and the probability of success (p).
If [average (or, np) > 5] and [n(1−p) > 5], then the distribution tends to move
towards a normal distribution; otherwise it is skewed

Note 1: A binomial distribution with a “high success rate” can be approximated by a
Gaussian distribution if the following condition is met: n >= (5/min(p, q)), i.e. if the
number of trials is greater than or equal to 5 divided by the minimum of the p and q values
Note 2: A binomial distribution with a “low success rate” can be approximated by a
Poisson distribution
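
A short sketch of the binomial probabilities and of the rule of thumb for the normal approximation follows. The probability formula C(n, k) · p^k · q^(n−k) is the standard binomial one and is not printed on the slides; the function names and trial numbers are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    q = 1 - p
    return comb(n, k) * p ** k * q ** (n - k)


def normal_approximation_ok(n, p):
    """Rule of thumb from the text: np > 5 and n(1-p) > 5."""
    return n * p > 5 and n * (1 - p) > 5


print(binomial_pmf(2, 10, 0.5))           # exactly 2 passes in 10 trials at p = 0.5
print(normal_approximation_ok(100, 0.5))  # np = 50 and nq = 50
print(normal_approximation_ok(10, 0.1))   # np = 1, too few expected successes
```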

Probability Distributions – Exponential
An Exponential Distribution gives the distribution of time between
independent events occurring at a constant rate. This continuous distribution is also used to assign
probabilities to task cycle times (Target, lower the better/Target, higher the better)
The exponential distribution measures the time between two successive occurrences of a Poisson
process
Examples: Distribution of time between arrivals of travelers at an airline check-in counter; or, distribution of
the service levels in a contact center process (service level is the number of calls answered within “x”
seconds as a percentage of the total number of calls offered to the queue in consideration); etc.
Exponential experiments are often called
target experiments!!!
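
As a hedged illustration of the link to the Poisson process, the sketch below uses the standard exponential result P(T ≤ t) = 1 − e^(−λt) for the waiting time T between events. The rate of four events per hour and the 15-minute window are assumptions chosen for the example, as is the function name.

```python
import math


def prob_next_event_within(t, rate):
    """Probability that the next event arrives within time t, for a constant event rate."""
    return 1 - math.exp(-rate * t)


rate_per_hour = 4  # assumed: four arrivals per hour on average
print("mean time between arrivals (hours):", 1 / rate_per_hour)
print("P(next arrival within 15 min):", round(prob_next_event_within(0.25, rate_per_hour), 4))
```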

Shape: The shape of this probability curve is skewed, either to the left or to the right of the x-axis
[Figure: left-skewed and right-skewed exponential curves]

Probability Distributions – Gaussian
This statistically most important probability distribution exhibits the following vital characteristics:
● The probability curve indicates random or chance variation
● The mean, median, and mode of the probability curve are the same
● The probability curve peak represents the center of the process
● The tails of the probability curve theoretically never reach zero
● The probability curve can be divided in half with equal pieces falling on either
side of the most frequently occurring value
● The total area under the curve represents virtually 100% of the product or
service the process is capable of producing

Probability Distributions – Gaussian
Transformation: If we standardize each value “x” using the mean (μ) and standard deviation (σ), i.e. z = (x − μ)/σ, then the
resulting “z” values follow the standard normal distribution, represented as N(0,1), which can be used for any normally distributed dataset!!
“z” is equal to “x” only if μ = 0 and σ = 1…
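
The standardization z = (x − μ)/σ and the familiar 68% / 95% / 99.7% areas of the standard normal curve can be sketched as follows. The process mean and standard deviation used below are hypothetical, and the function names are illustrative.

```python
import math


def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


def coverage_within(z):
    """Area under N(0,1) between -z and +z."""
    return math.erf(z / math.sqrt(2))


# Hypothetical process: mean cycle time 160, standard deviation 1.5
print(z_score(163, mu=160, sigma=1.5))
for z in (1, 2, 3):
    print(f"within ±{z}σ: {coverage_within(z):.4f}")
```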

Measures of Shape - Normality
Normality is a measure of how closely the data follow a normal distribution: the most frequently occurring values cluster around the average, with
probabilities tailing off symmetrically in both directions from –∞ to +∞
The Anderson-Darling test is used to compare the actual frequency distribution with a theoretical
normal distribution calculated by using sample estimates for μ and σ of the frequency distribution
The test calculates an Anderson-Darling statistic (AD) and a critical value (CV), and
if AD < CV then we assume the frequency distribution exhibits the characteristics of a normal
distribution
Normal data would result in a symmetrical distribution
of data around the mean!!!

Measures of Shape – Kurtosis
Kurtosis is a measure of how squashed (too flat and broad) or how squeezed (too tall and thin)
the pattern of the frequency distribution is relative to a perfect normal distribution
Perfectly normal data would result in an (excess)
kurtosis value of zero!!!
[Figure: three histograms labeled Normal, Squeezed, and Squashed]
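
One common way to put a number on this is the excess kurtosis, the fourth central moment divided by the squared variance, minus 3 so that a normal distribution scores zero. The sketch below uses the simple population-moment version, which is an assumption; statistical packages often apply small-sample corrections and will report slightly different values.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (zero for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]  # hypothetical, evenly spread data: flatter than normal
print(round(excess_kurtosis(flat), 4))  # negative value indicates a "squashed" shape
```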

Inferential Statistics

Inferential Statistics – Confidence Interval
Confidence Interval: A confidence interval gives an estimated range of values which is likely to
include an unknown population parameter, the estimated range being calculated from a given set of
sample data. These intervals are very useful in assessing the practical significance of a given result
The width of a confidence interval is related to the sample size and the measurement variability in the
observations. The width decreases as the sample size increases, but increases with increasing
variability in our processes
It’s the interval computed from “sample data” that has
a specified probability of containing the unknown parameter of
interest!!!

Inferential Statistics – Confidence Limit
Confidence Limits: The lower and upper limits of a confidence interval that define the interval
within which a population parameter being estimated presumably lies. These limits are computed from
sample data and have a known probability that the unknown parameter of interest is contained between
them
Confidence limits, which define the range of a confidence interval,
are usually annotated as:
● LCL = Lower Confidence Limits
● UCL = Upper Confidence Limits
Confidence limits are the “upper” and “lower” values of
the confidence interval, and the population parameter (e.g. μ or σ) can be
anywhere between these limits!!!

Inferential Statistics – Confidence Level
Confidence Coefficient: The confidence coefficient of a confidence interval for a parameter is the
probability that the interval will contain the value of the parameter of interest
● It is the percentage of intervals (obtained from the repeated samples, each of
size “n”, taken from a given population of size “N”) that can be expected to
include the actual value of the parameter being estimated
The Confidence Level (CL) is the probability value associated with a confidence interval and is
often expressed as a percentage
Confidence level is how sure we want to be that the population mean or standard deviation falls in
the confidence interval we are going to calculate based on the sample from the population

Inferential Statistics – Confidence Level
Therefore, for any given interval estimate and Confidence Level:
● Our confidence level reduces as the confidence interval narrows
● Our confidence interval widens as the confidence level
increases
The Confidence Coefficient for running statistical experiments can take different values depending
on the Confidence Level required for the estimation of results:
● Coefficient has a value of 1.645 for tests conducted at 90%-CL
The most commonly used confidence coefficient values
for many statistical experiments!!!

Confidence Level Vs Confidence Interval
[Figure: confidence intervals for the mean of Part Cycle Time (x-axis from 157 to 163) at 99% (Z = 2.58), 95% (Z = 1.96), 90% (Z = 1.65), and 80% (Z = 1.28)]
As confidence level increases, confidence intervals widen
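
The widening of the interval with the confidence level can be sketched with the usual large-sample formula for the mean, x̄ ± z · σ/√n. That formula, the assumption that σ is known, and the sample figures below (mean 160, σ = 3, n = 36) are illustrative assumptions rather than values taken from the slides.

```python
from statistics import NormalDist


def z_for_level(level):
    """Two-sided z multiplier for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def mean_confidence_interval(sample_mean, sigma, n, level):
    """Return (LCL, UCL) for the mean, assuming a known sigma and a normal sampling distribution."""
    half_width = z_for_level(level) * sigma / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width


for level in (0.80, 0.90, 0.95, 0.99):
    lcl, ucl = mean_confidence_interval(160, sigma=3, n=36, level=level)
    print(f"{level:.0%} CL: z = {z_for_level(level):.3f}, interval = ({lcl:.2f}, {ucl:.2f})")
```

Increasing n in the call narrows every interval, while a larger σ widens them, matching the behavior described for confidence intervals earlier in this section.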

Inferential Statistics – Precision Level
Margin Of Error: The Precision Level refers to the spread of an estimate of a parameter, and/or
the quality associated with a set of measurements by which repeated observations approximate to the true
value of a parameter
A precise measurement may not be accurate because of unrecognized bias or other errors in the
process of sampling
Statistical experiments are typically
carried out at a “Precision Level” of
5%!!!

Performance Measures
“CTQs Are Customer Needs Translated Into The Critical Process
Requirements That Are Specific & Measurable” !!!

Performance Targets & Specifications
A performance target is the requirement(s), and specification the requirement range, imposed by the
customer on a specific CTQ. It addresses the following:
● What does the customer want?
● What is a good process/product?
● What is a customer defect?
The goal of a performance target and specification is to translate the customer requirements into a
measurable characteristic which has a defined:
● Operational Definition
● Target Value
● Specification Limits
● Defect Definition

Performance Targets & Specifications
Specification Limit:
● A specification limit is the range on the target for output characteristic of a CTQ within which the
customer is served satisfactorily
● The specification limits are defined by the customer and are also called “performance standards”
and/or “tolerance limits”
Defect Definition:
● A defect is a customer experience that results in an unacceptable level of customer satisfaction by the
usage of product/process

Operational Definitions – The “Must-be” Features
Must have specific and concrete criteria:
● Everyone in the project team has the same understanding of the definition
● Different people can use the definition and know that their data will be measured in the same
way
Must have a method to measure criteria:
● The definition tells you how to get a value (either a number or
yes/no)
● The definition will clearly state how to get a measurement
There is no single right way to write an
operational definition!!!

Determine the Project Y
