Modeling Big Count Data: An IRLS Framework for COM-Poisson Regression and GAM
1. Modeling Big Count Data
An IRLS framework for COM-Poisson regression and GAM
Suneel Chatla
Galit Shmueli
November 12, 2016
Institute of Service Science
National Tsing Hua University, Taiwan (R.O.C)
2. Table of contents
1. Speed Dating Experiment: Count data models
2. Motivation
3. An IRLS framework
4. Simulation Study: Comparison of IRLS with MLE
5. A CMP Generalized Additive Model
6. Results & Conclusions
4. Speed dating experiment
Fisman et al. (2006) conducted a speed dating experiment to
evaluate gender differences in mate selection [1].
Total sessions 14
Decision 1 or 0
Attractiveness 1-10
Intelligence 1-10
Ambition 1-10
...
...
Control variables
[1] https://www.kaggle.com/annavictoria/speed-dating-experiment
5. Outcome/Count variables
Matches: when both persons decide Yes
Tot.Yes: total number of Yes decisions for each subject in a particular session
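The two outcome counts can be sketched on a toy table of decisions. The records, identifiers and variable names below are hypothetical, not taken from the dataset. The slide does not say whether Tot.Yes counts the Yes decisions a subject gives or receives, so this sketch computes both and leaves the choice open.

```python
from collections import defaultdict

# Hypothetical toy records: (session, subject, partner, decision),
# where decision is 1 (Yes) or 0 (No).
decisions = [
    (1, "a", "x", 1), (1, "x", "a", 1),
    (1, "a", "y", 1), (1, "y", "a", 0),
    (1, "b", "x", 0), (1, "x", "b", 1),
    (1, "b", "y", 1), (1, "y", "b", 1),
]

# Look up each decision by (session, subject, partner).
said_yes = {(s, i, j): d for s, i, j, d in decisions}

matches = defaultdict(int)       # matches per (session, subject)
yes_given = defaultdict(int)     # Yes decisions the subject made
yes_received = defaultdict(int)  # Yes decisions made about the subject

for (s, i, j), d in said_yes.items():
    yes_given[(s, i)] += d
    yes_received[(s, j)] += d
    # A match needs a Yes in both directions.
    matches[(s, i)] += d * said_yes.get((s, j, i), 0)
```

Both candidate readings of Tot.Yes are counts per subject and session, which is what makes a count-data model the natural choice here.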
10. CMP Regression
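CMP regression models can be formulated as follows:
\[ \log(\lambda) = X\beta \tag{1} \]
\[ \log(\nu) = Z\gamma \tag{2} \]
Maximizing the log-likelihood with respect to the parameters β and γ
yields the following normal equations (Sellers and Shmueli, 2010):
\[ U = \frac{\partial \log L}{\partial \beta} = X^T \left(y - E(y)\right) \tag{3} \]
\[ V = \frac{\partial \log L}{\partial \gamma} = \nu Z^T \left(-\log(y!) + E(\log(y!))\right) \tag{4} \]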
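The two score vectors can be evaluated numerically once E(y) and E(log(y!)) are available. The following is a minimal sketch, not the authors' implementation: it assumes the CMP probability of y is proportional to λ^y / (y!)^ν and approximates the expectations by truncating the infinite sum at a hypothetical cutoff `max_y`. All function names are illustrative.

```python
import math

def cmp_log_weights(lam, nu, max_y=200):
    """Unnormalized log-probabilities y*log(lam) - nu*log(y!) for y = 0..max_y."""
    return [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]

def cmp_moments(lam, nu, max_y=200):
    """Return (E[y], E[log(y!)]) under CMP(lam, nu), truncating the sum at max_y."""
    logw = cmp_log_weights(lam, nu, max_y)
    m = max(logw)                         # subtract the max for numerical stability
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    mean_y = sum(y * wy for y, wy in enumerate(w)) / total
    mean_logfact = sum(math.lgamma(y + 1) * wy for y, wy in enumerate(w)) / total
    return mean_y, mean_logfact

def scores(X, Z, y, beta, gamma):
    """Score vectors U (for beta) and V (for gamma), following equations (3) and (4)."""
    U = [0.0] * len(beta)
    V = [0.0] * len(gamma)
    for xi, zi, yi in zip(X, Z, y):
        lam = math.exp(sum(b * x for b, x in zip(beta, xi)))   # log(lambda) = x'beta
        nu = math.exp(sum(g * z for g, z in zip(gamma, zi)))   # log(nu) = z'gamma
        mean_y, mean_logfact = cmp_moments(lam, nu)
        for j, x in enumerate(xi):
            U[j] += x * (yi - mean_y)
        for j, z in enumerate(zi):
            V[j] += nu * z * (-math.lgamma(yi + 1) + mean_logfact)
    return U, V
```

With ν = 1 the CMP distribution reduces to the Poisson, so `cmp_moments(lam, 1.0)[0]` should return approximately `lam`. For small ν and large λ the cutoff may need to be raised.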
13. More flexibility?
Generalized Additive Models
• Smoothing Splines
• Penalized Splines
Both implementations depend on the Iteratively Reweighted
Least Squares (IRLS) estimation framework.
At present, there is no IRLS framework available for CMP!
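To make the role of IRLS concrete, here is a minimal sketch of the textbook IRLS loop for an ordinary Poisson GLM with log link. It is not the CMP algorithm from the talk. It only illustrates the "working response plus weighted least squares" step that GAM implementations build on. The helper `solve` and the function name `poisson_irls` are illustrative.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def poisson_irls(X, y, iters=25, tol=1e-10):
    """Fit log(mu) = X beta for Poisson counts y by IRLS."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        eta = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted normal equations: (X^T W X) beta = X^T W z.
        A = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
             for j in range(p)]
        b = [sum(wi * xi[j] * zi for wi, xi, zi in zip(w, X, z)) for j in range(p)]
        new_beta = solve(A, b)
        converged = max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

Each iteration is a weighted least squares fit, which is why a penalty term (for splines) can be added to the same linear system without changing the overall loop.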
20. Study design
We compare our IRLS algorithm with the existing implementation
which is based on maximizing the likelihood function (through optim
in R).
(a) Set sample size n = 100
(b) Generate x1 ∼ U(0, 1) and x2 ∼ N(0, 1)
(c) Calculate x3 = 0.2x1 + U(0, 0.3) and x4 = 0.3x2 + N(0, 0.1) (to
create correlated variables)
(d) Generate
y ∼ CMP(log(λ) = 0.05 + 0.5x1 − 0.5x2 + 0.25x3 − 0.25x4, ν)
where ν ∈ {0.5, 2, 5}
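Steps (a) to (d) can be sketched as a small data generator. This is an illustration under stated assumptions, not the authors' simulation code: the second argument of N(0, 0.1) is treated as a standard deviation, CMP draws use inverse-CDF sampling on a pmf truncated at a hypothetical cutoff `max_y`, and the seed and function names are arbitrary.

```python
import math
import random

def sample_cmp(lam, nu, rng, max_y=500):
    """Draw one CMP(lam, nu) value by inverting the CDF of the truncated pmf."""
    logw = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    u = rng.random() * sum(w)
    acc = 0.0
    for y, wy in enumerate(w):
        acc += wy
        if u <= acc:
            return y
    return max_y

def simulate(n=100, nu=2.0, seed=0):
    """Generate n rows (x1, x2, x3, x4, y) following steps (a) to (d)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0, 1)
        x2 = rng.gauss(0, 1)
        x3 = 0.2 * x1 + rng.uniform(0, 0.3)   # correlated with x1
        x4 = 0.3 * x2 + rng.gauss(0, 0.1)     # assumption: 0.1 is a standard deviation
        lam = math.exp(0.05 + 0.5 * x1 - 0.5 * x2 + 0.25 * x3 - 0.25 * x4)
        rows.append((x1, x2, x3, x4, sample_cmp(lam, nu, rng)))
    return rows

# One data set per dispersion level used in the study design.
datasets = {nu: simulate(nu=nu) for nu in (0.5, 2, 5)}
```

Each data set could then be passed to both estimation routines so that the coefficient estimates can be compared across the three values of ν.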
21. Results
[Figure: five panels of estimates for x1, x2, x3, x4 and log(ν); each panel compares IR and MLE at ν = 0.5, ν = 2 and ν = 5.]
27. Comparison of Additive Models on Tot.Yes
Dependent variable: Tot.Yes

             CMP (Chi.Sq)   Poisson (Chi.Sq)
s(sinc)      7.16           11.53**
s(func)      7.51           11.40**
s(sinc_o)    13.96**        29.30***
s(intel_o)   14.06**        13.26***
ν            0.56
AIC          2737.03        2804.77

Note: * p<0.1; ** p<0.05; *** p<0.01
It is largely the behavior of the other person that guides us to
select her or him.
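As a reading aid for the last two rows of the comparison, the AIC gap can be worked out from the reported values alone. Nothing here is new data; the symbols ΔAIC and ν̂ are introduced only for this illustration.

```latex
\[
\Delta\mathrm{AIC}
  = \mathrm{AIC}_{\text{Poisson}} - \mathrm{AIC}_{\text{CMP}}
  = 2804.77 - 2737.03
  = 67.74
\]
\[
\hat{\nu} = 0.56 < 1
\]
```

A lower AIC is preferred, so the comparison favors the CMP model. Because the CMP distribution reduces to the Poisson at ν = 1 and is overdispersed for ν < 1, the estimate of 0.56 points to overdispersion in Tot.Yes relative to a Poisson model.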
28. Summary
• The IRLS framework is far more efficient than the existing
likelihood-based method and provides more flexibility.
• Since CMP is computationally heavier than other GLMs, we
could parallelize some matrix computations in order to increase
the speed.
• The IRLS framework allows CMP to have other modeling
extensions such as LASSO.
Full paper available from https://arxiv.org/abs/1610.08244
and the source code is available from
https://github.com/SuneelChatla/cmp
30. References
Fisman, R., Iyengar, S. S., Kamenica, E., and Simonson, I. (2006).
Gender differences in mate selection: Evidence from a speed
dating experiment. The Quarterly Journal of Economics, pages
673–697.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models,
volume 43. CRC Press.
Sellers, K. F. and Shmueli, G. (2010). A flexible regression model for
count data. Annals of Applied Statistics, 4(2):943–961.
Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., and Boatwright, P.
(2005). A useful distribution for fitting discrete data: revival of the
Conway–Maxwell–Poisson distribution. Journal of the Royal
Statistical Society: Series C (Applied Statistics), 54(1):127–142.
Wood, S. (2006). Generalized additive models: an introduction with R.
CRC Press.