Selection of bin width and bin number for Histograms
• We often face the problem that the density of a variable of interest is unknown. One popular method of estimating the unknown density is the histogram estimator.

• The decision on the bin number or bin width of a histogram is often made arbitrarily or subjectively, but it need not be. Here we review the statistical procedures that have been proposed for choosing the optimal bin width and bin number.
• We review the methods prevalent in the statistical literature for determining the optimal number of bins and the bin width of a histogram.

• We also present a comparative analysis to determine which methods are more efficient.

• The measure we use to compare the various methods of optimal binning is T = sup_x |ĥ(x) − f(x)|, where ĥ(x) is the histogram density estimate at x and f(x) is the true density at the point x.
Proposed methods of interest for optimal binning
• Sturges' rule and the Doane modification
• Scott's rule and the Freedman-Diaconis modification
• Bayesian optimal binning
• Optimal binning by Hellinger risk minimization
• Penalized maximum log-likelihood method with penalty A and the Hogg penalty
• Stochastic complexity (Kolmogorov complexity) method
Sturges' Rule
• Construct a frequency distribution with k bins, each of width 1 and centered on the points i = 0, 1, ..., k-1, and choose the bin count of the i-th bin to be the binomial coefficient C(k-1, i). As k increases, this ideal frequency histogram assumes the shape of a normal density with mean (k-1)/2 and variance (k-1)/4.
• The bin counts then sum to the sample size, so Sturges' rule takes

      n = sum over i of C(k-1, i) = 2^(k-1),

  where k is the number of bins. Solving for k gives

      k = 1 + log2(n).

• We split the sample range into k such bins of equal length, so Sturges' rule gives a regular histogram.
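A minimal Python sketch of Sturges' rule as just described (function and variable names are ours, not from the slides):

```python
import numpy as np

def sturges_bins(x):
    """Sturges' rule: k = 1 + log2(n) equal-width bins, rounded up."""
    n = len(x)
    return int(np.ceil(1 + np.log2(n)))

x = np.random.standard_normal(1000)    # example sample
k = sturges_bins(x)                     # 11 bins for n = 1000
counts, edges = np.histogram(x, bins=k)
```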
Conceptual Fallacy of the Sturges rule

• There is a conceptual fallacy in the derivation of Sturges' rule. Instead of choosing n = 2^(k-1), one could have taken any n whose individual cell frequencies are proportional to the binomial coefficients.
• That is, m(i), the number of observations in the i-th cell, could equally well have been taken to be m(i) = [C(k-1, i) / 2^(k-1)] * n for any n.
• So, intuitively, there is no reason for choosing this particular n given the motivation employed in Sturges' rule.
Doane's modification
• For skewed or kurtotic distributions, additional bins may be required. Doane proposed increasing the number of bins by log2(1 + ŷ), where ŷ is the standardized skewness coefficient.
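A hedged sketch of the Doane adjustment; the slides only state the log2(1 + ŷ) term, so the particular standard-error formula used to standardize the skewness below is a common choice and an assumption on our part:

```python
import numpy as np
from scipy.stats import skew

def doane_bins(x):
    """Sturges' count plus Doane's extra log2(1 + |g1|/sigma_g1) bins."""
    n = len(x)
    g1 = skew(x)                                              # sample skewness
    sigma_g1 = np.sqrt(6.0 * (n - 2) / ((n + 1) * (n + 3)))   # assumed standard error of g1
    return int(np.ceil(1 + np.log2(n) + np.log2(1 + abs(g1) / sigma_g1)))
```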
Scott rule and Freedman-Diaconis modification
• We obtain an optimal bin width by minimizing the asymptotic expected L2 norm. The histogram estimator is given by ĥ(x) = V_k / (n h), where h is the bin width, n is the total number of observations and V_k is the number of observations lying in the k-th bin.
• The optimal bin width is given by h*(x) = [f(x_k) / (2 γ^2 n)]^(1/3), where x_k is some point lying in the k-th bin and γ is the Lipschitz continuity factor.
• For the normal density, this gives h* = 3.5 * sd(x) * n^(-1/3) in the regular case.
• The Freedman-Diaconis modification for non-normal data is h* = 2 * IQR * n^(-1/3), where IQR is the interquartile range.
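A short Python sketch of the two data-based bin widths above (helper names are ours):

```python
import numpy as np

def scott_width(x):
    """Scott's normal-reference bin width: 3.5 * sd(x) * n^(-1/3)."""
    return 3.5 * np.std(x, ddof=1) * len(x) ** (-1.0 / 3.0)

def fd_width(x):
    """Freedman-Diaconis bin width: 2 * IQR * n^(-1/3)."""
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) * len(x) ** (-1.0 / 3.0)

x = np.random.standard_normal(1000)
h_scott, h_fd = scott_width(x), fd_width(x)
k_scott = int(np.ceil((x.max() - x.min()) / h_scott))   # number of bins implied by the width
```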
Hellinger risk minimization
• The Hellinger risk between the histogram density estimator ĥ(x), for a regular histogram with a given number of bins k, and the true density f(x) is defined as

      H^2 = (1/2) ∫ (√f(x) − √ĥ(x))^2 dx.

• We try to minimize this quantity over different choices of the bin width or bin number.
• If the true f is known, there is no problem in dealing with this integral. If the true f is not known, one may estimate it by bootstrapping over repeated resamples of the data.
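A sketch of the risk minimization for the known-f case, evaluating the squared Hellinger distance on a fine grid; the grid, candidate bin range, and reference density are illustrative assumptions (for unknown f one would plug in a bootstrap estimate instead):

```python
import numpy as np
from scipy.stats import norm

def hist_density(x, k):
    """Return a function evaluating the regular k-bin histogram density estimate."""
    counts, edges = np.histogram(x, bins=k, density=True)
    def h(t):
        idx = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, k - 1)
        inside = (t >= edges[0]) & (t <= edges[-1])
        return np.where(inside, counts[idx], 0.0)
    return h

def hellinger_sq(x, k, f, grid):
    """(1/2) * integral of (sqrt(f) - sqrt(h-hat))^2, approximated by a Riemann sum."""
    h = hist_density(x, k)
    integrand = (np.sqrt(f(grid)) - np.sqrt(h(grid))) ** 2
    return 0.5 * np.sum(integrand) * (grid[1] - grid[0])

x = np.random.standard_normal(1000)
grid = np.linspace(-4, 4, 2001)
ks = list(range(2, 51))
k_opt = ks[int(np.argmin([hellinger_sq(x, k, norm.pdf, grid) for k in ks]))]
```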
Bayesian model for optimal binning

• The likelihood of the data given the parameters M (the number of bins) and the probability vector π is
      P(d | π, M, I) = (M/V)^N * π1^n1 * π2^n2 * ... * π(M-1)^n(M-1) * πM^nM,
  where V = M*v and v is the bin width.
• Assume that the prior densities are defined as follows:
      P(M | I) = 1/C, where C is the maximum number of bins taken into account;
      P(π | M) = [π1 π2 ... πM]^(-1/2) * Γ(M/2) / Γ(1/2)^M,
  which is a Dirichlet distribution with all M parameters equal to ½, the conjugate prior of the multinomial distribution.
• The joint posterior P(π, M | d, I) = k * P(π | M) P(M | I) P(d | π, M) is obtained and integrated over π to get the marginal posterior of M, which when maximized yields the optimal value of M.
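A sketch of this procedure in Python. Marginalizing π analytically as described above gives, up to a constant, a closed-form log posterior for M; the expression below follows Knuth's optimal-binning derivation, which matches the setup on this slide, and should be read as our reconstruction rather than the slide's exact formula. The maximum bin count C = 100 is an assumption.

```python
import numpy as np
from scipy.special import gammaln

def log_posterior_M(x, M):
    """Relative log-posterior of M equal-width bins after integrating out pi
    (Knuth-style closed form, up to an additive constant)."""
    N = len(x)
    counts, _ = np.histogram(x, bins=M)
    return (N * np.log(M)
            + gammaln(M / 2.0) - M * gammaln(0.5) - gammaln(N + M / 2.0)
            + np.sum(gammaln(counts + 0.5)))

x = np.random.standard_normal(1000)
Ms = np.arange(2, 101)                 # candidate bin counts, C = 100 assumed
M_opt = int(Ms[np.argmax([log_posterior_M(x, M) for M in Ms])])
```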
Maximum penalized loglikelihood method
• In this case we maximize the loglikelihood of the multinomial distribution corresponding to a histogram, but with a penalty function subtracted. The penalized loglikelihood is thus of the form

      Pl = log L(ĥ; x1, x2, ..., xn) − pen_n(I),

  where I is the partition of the sample range into disjoint intervals. Note that these bins need not be of equal length, i.e. the histogram may be irregular.
• There are various choices of the penalty; our two choices, for a histogram with D bins, have been

      penA =

  (the first penalty is applicable for both regular and irregular cases) and

      penB (Hogg or Akaike penalty) = D − 1.
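A sketch of the criterion for the regular case with the Hogg/Akaike penalty D − 1 (penalty A is omitted here since its expression is not reproduced above); the candidate range of D is an assumption:

```python
import numpy as np

def penalized_loglik(x, D):
    """Histogram loglikelihood sum_i log h-hat(x_i) minus the Hogg penalty D - 1."""
    n = len(x)
    counts, edges = np.histogram(x, bins=D)
    width = edges[1] - edges[0]
    nz = counts[counts > 0]
    loglik = np.sum(nz * np.log(nz / (n * width)))
    return loglik - (D - 1)

x = np.random.standard_normal(1000)
Ds = list(range(2, 101))
D_opt = Ds[int(np.argmax([penalized_loglik(x, D) for D in Ds]))]
```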
Stochastic complexity method

• This is based on the idea of encoding the data with the minimum number of bits. It is a sort of penalized maximum likelihood with the number of bits, or description length, as the penalty.
• If P(X|θ) is the distribution of the data with θ unknown, and σi(θ) is the standard deviation of the best estimator of the i-th coordinate of θ, then the description length is given by
      − log2 P(X|θ) + ∑ log2( · ).
• We define the stochastic complexity as − log2 ∫ P(X|θ) π(θ) dθ. Taking a uniform prior for θ and P(X|θ) to be the multinomial distribution, the stochastic complexity criterion becomes
      l(m) = (m−1)! N1! N2! ... Nm! / (m+n−1)!,
  which is maximized with respect to m to get the number of bins.
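A hedged sketch that scores candidate bin counts in log space using log-gamma functions. The n·log(m) likelihood term included below does not appear in the factorial expression above; it is our assumption, carried over from the (M/V)^N likelihood factor on the Bayesian slide, since maximizing the factorial part alone would always favor m = 1:

```python
import numpy as np
from scipy.special import gammaln

def log_sc_score(x, m):
    """log of m^n * (m-1)! * N1!...Nm! / (m+n-1)!; the n*log(m) term is an assumption."""
    n = len(x)
    counts, _ = np.histogram(x, bins=m)
    return (n * np.log(m)                     # assumed likelihood factor (M/V)^N, V fixed
            + gammaln(m)                      # log (m-1)!
            + np.sum(gammaln(counts + 1.0))   # sum of log Ni!
            - gammaln(m + n))                 # log (m+n-1)!

x = np.random.standard_normal(1000)
ms = list(range(2, 101))
m_opt = ms[int(np.argmax([log_sc_score(x, m) for m in ms]))]
```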
Simulation design
• To compare the various binning methods, we use simulation experiments on three reference distributions, namely Chi-square(2), Normal(0,1) and Uniform(1,10).
• For each method we compute the statistic T = sup_x |ĥ(x) − f(x)| and compare how small the value of T is on average.
• We simulate 1000 observations from each of the reference distributions, compute the T statistic for each simulated run, and carry out this experiment 200 times to get a distribution of T.
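A sketch of one such experiment (here with Sturges' rule on Chi-square(2) data); T is approximated as the maximum absolute deviation over a fine grid, and the grid itself is an implementation assumption:

```python
import numpy as np
from scipy.stats import chi2

def sup_error(x, k, f, grid):
    """T = sup over the grid of |h-hat(t) - f(t)| for a regular k-bin histogram."""
    counts, edges = np.histogram(x, bins=k, density=True)
    idx = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, k - 1)
    h_hat = np.where((grid >= edges[0]) & (grid <= edges[-1]), counts[idx], 0.0)
    return float(np.max(np.abs(h_hat - f(grid))))

rng = np.random.default_rng(0)
grid = np.linspace(0.01, 15.0, 2000)
T_vals = []
for _ in range(200):                                    # 200 repetitions, as in the text
    x = chi2.rvs(df=2, size=1000, random_state=rng)     # 1000 observations per run
    k = int(np.ceil(1 + np.log2(len(x))))               # Sturges' rule as one example method
    T_vals.append(sup_error(x, k, chi2(df=2).pdf, grid))
print(np.mean(T_vals), np.var(T_vals))
```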
Mean and variance of T for Chi-square(2)

Method             Mean no. of bins   Mean(T)   Variance(T)
Sturges            10                 0.1364    0.00031
Doane              15                 0.1028    0.00018
Scott              19                 0.0874    0.00015
Hellinger          13                 0.1151    0.00022
FD                 32                 0.0747    0.00023
Kolmogorov         10                 0.2744    0.01288
Bayesian           12                 0.1177    0.00037
Hogg               18                 0.0948    0.000194
Irregular (penA)   6                  0.1134    0.00028
Analysis of the Chi-square simulation
• For Chi-square(2), the Freedman-Diaconis and Scott's rules performed very well in terms of a smaller mean value of T.
• Kolmogorov's complexity method has the largest spread in the T values. The distribution of T under Sturges' rule dominates that under the Freedman-Diaconis and Scott's rules.
• The irregular histogram method under penA uses far fewer bins than the other methods.
Mean and variance of T for N(0,1)

Method             Mean no. of bins   Mean(T)   Variance(T)
Sturges            10                 0.0909    0.00013
Scott              18                 0.08377   0.00025
Hellinger          20                 0.08309   0.00022
FD                 25                 0.08687   0.00029
Kolmogorov         13                 0.2243    0.0137
Bayesian           13                 0.0912    0.00022
Hogg               12                 0.0855    0.00011
Irregular (penA)   6                  0.1984    0.00113
T distribution for the N(0,1) family (figure)
• For Normal(0,1), we left out Doane's modification as it is meant for non-normal or skewed distributions.
• Sturges' rule and Scott's rule performed very well in the normal case, which is expected given that they are designed under normality assumptions.
• Scott, Freedman-Diaconis and Sturges' rules are very close to one another in terms of the distribution of T.
• The penalized log-likelihood with penalty A has a distribution of T that dominates the T distributions under the other methods.
• The T distributions under stochastic complexity and Hellinger distance have the largest spread; the smallest spread is under Sturges' rule.
Mean and variance of T for U(1,10)

Method     Mean no. of bins   Mean(T)   Variance(T)
Sturges    10                 0.1298    0.00036
Scott      9                  0.1288    0.00035
Doane      11                 0.1308    0.00051
FD         9                  0.1283    0.00032
Bayesian   9                  0.1274    0.000361
Analysis under the U(1,10) distribution
• Most of the methods in the uniform case give only 1 or 2 bins, so they cannot be compared with the other, more stable methods.
• However, the Scott's, Freedman-Diaconis and Sturges' rules performed well, with small values of T and small variation in the values of T under repeated simulations.
• Similar to the univariate case, we try to generalize our methods to bivariate distributions.
• Here we simulate observations from a bivariate normal distribution with mean (0,0), ρ = 0.5 and σ^2 = 1.
• The methods we use are the multivariate extension of Bayesian optimal binning and the multivariate Scott's rule.
• In the same vein as the univariate case, the multivariate Scott's rule is determined by minimizing the asymptotic expected L2 error.
• The multivariate Scott's choice of bin width along the k-th coordinate is
      h*_k = 3.5 * σ_xk * n^(-1/(2+d)),
  where d is the dimension of the dataset and σ_xk is the standard deviation along the k-th coordinate.
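A sketch of these per-coordinate bin widths applied to a simulated bivariate normal sample with ρ = 0.5 (sample size and helper names are illustrative):

```python
import numpy as np

def scott_widths_multivariate(X):
    """Per-coordinate Scott bin widths h_k = 3.5 * sd_k * n^(-1/(2+d))."""
    n, d = X.shape
    return 3.5 * X.std(axis=0, ddof=1) * n ** (-1.0 / (2.0 + d))

rng = np.random.default_rng(0)
cov = [[1.0, 0.5], [0.5, 1.0]]                          # rho = 0.5, sigma^2 = 1
X = rng.multivariate_normal([0.0, 0.0], cov, size=1000)
h = scott_widths_multivariate(X)
bins = [int(np.ceil((X[:, k].max() - X[:, k].min()) / h[k])) for k in range(2)]
H, xedges, yedges = np.histogram2d(X[:, 0], X[:, 1], bins=bins, density=True)
```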
The 3-d histogram obtained for the T statistic distribution under Scott's rule (figure)
Distribution of the T statistic for the bivariate normal under Scott's rule (figure)
Bayesian optimal binning for the multivariate normal case
• In this case, we select Mx bins along the X axis and My bins along the Y axis and define M = Mx * My. The joint likelihood in this case is given by

      h(x, y, Mx, My) =

  which is quite analogous to the univariate case. Again we take a rectangular prior for (Mx, My) and an M-dimensional Dirichlet distribution with each parameter equal to ½ as the prior for π.
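A hedged sketch of the bivariate extension: we treat the M = Mx·My cells of an Mx-by-My regular grid exactly like the bins of the univariate case and reuse the marginal log posterior reconstructed earlier; this reading of "quite analogous to the univariate case" and the candidate grid for (Mx, My) are assumptions on our part:

```python
import numpy as np
from scipy.special import gammaln

def log_posterior_2d(X, Mx, My):
    """Relative log-posterior for an Mx-by-My regular 2-D histogram, using the
    M = Mx*My cells as the bins of the univariate closed form (assumed analogue)."""
    N = len(X)
    counts, _, _ = np.histogram2d(X[:, 0], X[:, 1], bins=[Mx, My])
    M = Mx * My
    return (N * np.log(M)
            + gammaln(M / 2.0) - M * gammaln(0.5) - gammaln(N + M / 2.0)
            + np.sum(gammaln(counts.ravel() + 0.5)))

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=1000)
pairs = [(mx, my) for mx in range(2, 21) for my in range(2, 21)]   # assumed search range
Mx_opt, My_opt = max(pairs, key=lambda p: log_posterior_2d(X, *p))
```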
Bivariate normal histogram under Bayesian optimal binning (figure)
T distribution under the Bayesian rule for the bivariate normal (figure)
• We have dealt only with histogram estimators in this paper. However, one may apply a smoothing parameter to make the estimator more efficient and analyze the values of the T statistic for various smoothing parameters.
• We have only used the Bayesian and Scott's multivariate extensions. However, one may try to generalize the other methods to the multivariate case. One may also use other forms of penalty and observe for which penalty the resulting estimator is most efficient.
• From all three univariate simulation experiments we infer that Scott's and the Freedman-Diaconis methods have been the most efficient at reducing the values of T. No method, however, is uniformly best under all scenarios.
• For the bivariate normal case, using Scott's rule and Bayesian optimal binning, we find that the T value is smaller on average under Scott's rule than under Bayesian optimal binning.
Kushal Kumar Dey
    Saswati Saha
    Raka Mondol
 Avijit Kumar Dutta
Nilanjan Chatterjee
