Israel Statistical Association Conference 2013: Program

Israel Statistical Association Conference
23.5.13

8:00-9:00 Gathering, registration and badge pickup
8:30-9:00 Light refreshments
9:00-9:10 Opening remarks: Ron Kenett, President of the Association
9:10-10:00 Plenary lecture: Robert Adler, Technion
"Topological Inference: an Old/New Way of Thinking About Data"
10:00-10:20 Coffee break and move to the classrooms
10:20-12:00 Invited parallel sessions
Session I: Computational statistics and machine learning
Organizer and chair: Malka Gorfine
Elad Hazan, Technion, "Time Series Analysis - An Online Learning Approach"
Boaz Nadler, Weizmann Institute, "On Estimation of Sparse Principal Components in High Dimensions"
Saharon Rosset, Tel Aviv University, "Isotonic Modeling: Methodology and Applications"
Gal Elidan, The Hebrew University, "Lightning-speed Structure Learning of Non-Gaussian Networks"
Session II: Official/government statistics; the session is dedicated to the memory of Prof. Gad Nathan
Organizer and chair: Luisa Burck
Danny Pfeffermann, The Hebrew University, will speak in memory of Prof. Gad Nathan
Dvir Kleper, National Institute for Testing and Evaluation, "Fairness of the higher-education selection system toward applicants for accommodated conditions on the psychometric entrance test (in Hebrew)"
Yury Gubman, Central Bureau of Statistics, "Estimation of Measurement Error in Categorical Income Survey Data"
Mano Geva, Midgam Research Institute, "Internet surveys: advantages, disadvantages and limitations"
Ariel Mantzura, Bank of Israel, "Outlier detection in multivariate time series"
12:00-13:30 Lunch, during which the Association's general assembly and the Peretz and Potter award ceremony will take place.
13:30-15:10 Invited parallel sessions
Session III: Topological Data Analysis
Organizer and chair: Robert Adler
Omer Bobrowski, Duke University, "The Topology of Noise"
Primoz Skraba, Josef Stefan Institute, Ljubljana, "A Topological Model of Recurrence"
Armin Schwartzman, Harvard University, "Geometric Means of Positive Definite Matrices and the Matrix-Variate Log-Normal Distribution"
Zinoviy Landsman, University of Haifa, "Asymptotic Data Analysis on Manifolds"
Venues: College lobby; Hall 309; Hall 3; Hall 2; Hall 3; Hall 2

Session IV: Statistics and sport
Organizers: Yossi Levy, Ronnie Lidor; Chair: Uri Goldbourt
Yossi Levy, Teva, "Sports statistics - a historical review and a look to the future"
Daniel Nevo, The Hebrew University, "Around the goal: Examining the effect of the first goal on the second goal in soccer using survival analysis methods"
Ronnie Lidor, Wingate Academic College, "Predicting team ranking in basketball: The questionable use of on-court performance statistics"
David Ben-Sira, Wingate Academic College, "What is the structure of achievement among elite teenage athletes, and what are its implications?"
15:10-15:20 Coffee break and move to the classrooms
15:20-16:30 Contributed parallel sessions
Session V: Methods and Applications I
Chair: Yair Goldberg
Moshe Pollak, The Hebrew University, "Shewhart Revisited"
Amichai Painsky, Tel Aviv University, "Memoryless Representation of Markov Processes"
Ziv Klausner, Israel Institute for Biological Research, "A new beta distribution framework for respiratory protection based stochastic formulation"
Session VI: Methods and Applications II
Chair: Israel Parmet
Benny Yakir, The Hebrew University, "Scanning an image"
Yonatan Yaffe-Nof, The Hebrew University, "Confidence Interval for Test Error of Support Vector Machine Classifiers"
Ofir Harari, Tel Aviv University, "Spectral Decomposition of Gaussian Processes and its Application to Minimum IMSPE Designs"
16:30-17:30 Panel: "Both the trees and the forest? Challenges in publicly releasing data on public systems". Organizer and moderator: Hagit Glickman, RAMA, Ministry of Education. Participants: Michal Beller, RAMA, Ministry of Education, and Amir Shmueli, The Hebrew University.
Venues: Hall 3; Hall 3; Hall 2; Hall 3

Abstracts
1. Robert Adler, Faculty of Electrical Engineering, Technion. Plenary lecture.
"Topological Inference: an Old/New Way of Thinking About Data"
Abstract
Over the last few years a small but rapidly growing group of dedicated mathematicians has been
developing an innovative new approach to data called "TDA - Topological Data Analysis". Parts of this
project are not totally new. For example, the brain imaging community has been using random
fields and topology for quite some time, leading to the notion of 'topological inference'. However, what is
completely new is the mathematical sophistication of the techniques now being applied to areas as
widespread as data mining, dimension reduction, and manifold learning, all topics familiar to
statisticians, as well as to areas classically outside of Statistics.
In the lecture I shall describe some of the new ideas that have arisen in TDA, and discuss the
challenges they raise for Statistics. The lecture will be non-technical, and based on case studies and
examples rather than rigorous results.
The main aim of the lecture will be to convince statisticians that there exists an entirely new way of
handling data that, on the one hand, has a lot to offer current statistical thinking, and, on the other hand,
has an inherent need for internal strengthening from the addition of statistical tools. Hopefully this will
motivate some listeners to join in solving the exciting challenges that topological inference is already
providing to Statistics.
2. Elad Hazan, Technion. Session I.
"Time Series Analysis - An Online Learning Approach"
Abstract
We present a new approach for time series analysis, with focus on the ARMA (autoregressive moving
average) model. Using regret minimization techniques, we develop effective online learning algorithms
for the prediction problem, without assuming that the noise terms are Gaussian, identically distributed or
even independent. Effectiveness of the methods is demonstrated on simulated and real-world data.
Joint work with Oren Anava, Shie Mannor and Ohad Shamir.
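A minimal sketch of the online-learning view, assuming the ARMA process is approximated by a longer autoregression whose coefficients are updated by projected online gradient descent on the squared prediction error. The function name `online_ar_predict` and the values of `order`, `lr` and `radius` are illustrative choices, not the algorithm or tuning used in the talk.

```python
import numpy as np

def online_ar_predict(series, order=5, lr=0.01, radius=10.0):
    """Predict x_t online from its `order` most recent values.

    Illustrative sketch (assumption): a long AR model stands in for ARMA, and its
    coefficients are updated by projected online gradient descent on squared loss.
    No Gaussian, identically distributed or independent noise model is used.
    """
    x = np.asarray(series, dtype=float)
    w = np.zeros(order)                      # AR coefficients, learned online
    preds = np.full(len(x), np.nan)          # no prediction for the first `order` steps
    for t in range(order, len(x)):
        past = x[t - order:t][::-1]          # x_{t-1}, x_{t-2}, ..., x_{t-order}
        preds[t] = w @ past                  # predict before x_t is revealed
        grad = 2.0 * (preds[t] - x[t]) * past
        w -= lr * grad                       # gradient step on (prediction - x_t)^2
        norm = np.linalg.norm(w)
        if norm > radius:                    # project back onto the ball ||w|| <= radius
            w *= radius / norm
    return preds, w
```

Because each update uses only the realized prediction error, the loop never relies on a noise distribution; the regret guarantees discussed in the talk are not reproduced by this sketch.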
3. Boaz Nadler, Weizmann Institute. Session I.
"On Estimation of Sparse Principal Components in High Dimensions"
Abstract
In many contemporary applications there is a need to analyze high dimensional data with relatively few
samples (the large p, small n setting). One common method for this task is principal component analysis,
routinely used for linear dimensionality reduction, denoising, visualization and more. In the large p, small
n setting, however, standard PCA may provide poor approximations to the population PCs. In this talk
I'll review some methods and corresponding theory for estimating the leading population principal
components, assuming those are approximately sparse. We'll present both minimax rates of sparse
eigenvector estimation and methods that achieve them, in the case of approximate L_q sparsity
with q>0. As we will see, the case of exact L_0 sparsity presents some interesting challenges.
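For illustration, here is one simple estimator of a sparse leading component, diagonal thresholding: keep the highest-variance coordinates, then run PCA on them alone. It is a standard baseline and not necessarily one of the methods covered in the talk; the sparsity level `k` is an assumption supplied by the user, and the function name is hypothetical.

```python
import numpy as np

def diagonal_thresholding_pc(X, k):
    """Estimate a k-sparse leading principal component by diagonal thresholding.

    Keeps the k coordinates with the largest sample variance and returns the leading
    eigenvector of the covariance restricted to them, padded with zeros elsewhere.
    """
    Xc = X - X.mean(axis=0)
    support = np.sort(np.argsort(Xc.var(axis=0))[-k:])   # k highest-variance coordinates
    sub = Xc[:, support]
    sub_cov = sub.T @ sub / (len(X) - 1)                  # k x k sample covariance
    _, eigvecs = np.linalg.eigh(sub_cov)                  # eigenvalues in ascending order
    v = np.zeros(X.shape[1])
    v[support] = eigvecs[:, -1]                           # leading eigenvector, embedded in R^p
    return v / np.linalg.norm(v), support
```

With p in the hundreds and n around one hundred, ordinary PCA on all p coordinates spreads its estimate over many noise directions, while restricting to a small support keeps the estimation problem k-dimensional.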
4. Saharon Rosset, Tel Aviv University. Session I.
"Isotonic Modeling: Methodology and Applications"
Abstract
In isotonic modeling, nonparametric predictive models ŷ = f̂(x) are fitted to data, requiring only that
f̂(x) is isotonic, i.e., monotone in all explanatory variables. The monotonicity assumption on the
underlying data generation process is appropriate in many applications, for example in modeling gene-gene
interactions in genetics. However, isotonic modeling has enjoyed limited interest as a tool for
modern data modeling due to a combination of statistical (overfitting) difficulties and computational
difficulties. I will first describe our Isotonic Recursive Partitioning (IRP) algorithm, which overcomes both
difficulties in fitting isotonic regression (i.e., isotonic modeling with squared loss) to large data. IRP
recursively partitions the covariate space into an increasing number of regions and at every iteration fits
the best isotonic model to the current partition. At each iteration a linear program is solved, and the
whole algorithm can be practically applied to datasets with tens of thousands of observations.
Surprisingly, this greedy algorithm provably converges to the global isotonic regression solution, and we
view the recursive partitioning process as a regularization path that allows control of overfitting.
As time permits, I will discuss further methodological topics. First, the generalization of IRP to non-squared
loss situations, such as Poisson regression or the robust Huber loss. Second, the development of other
practically useful and theoretically sound regularization approaches for isotonic modeling. In this
context, we propose to use the range of model predictions as a regularization functional. This problem
can be formulated as a lasso problem in the very high-dimensional basis of upper sets in the covariate
space, and can be solved very efficiently using some properties I will describe.
Finally, I will review some "modern" applications of isotonic modeling, including isotonic stacking and
modeling of gene-gene interactions in human disease. This is joint work with Ronny Luss of IBM
Research.
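IRP itself is a multivariate algorithm built on linear programs. As a much simpler point of reference, the one-dimensional special case of isotonic regression with squared loss can be solved exactly by the classical pool-adjacent-violators algorithm (PAVA). The sketch below is that baseline, not IRP; the function name `pava` is our own.

```python
import numpy as np

def pava(y, weights=None):
    """Weighted least-squares fit of y by a non-decreasing sequence (pool adjacent violators).

    One-dimensional baseline only; multivariate isotonic regression (as in IRP) needs more.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    means, wts, sizes = [], [], []                # one entry per block of pooled points
    for yi, wi in zip(y, w):
        means.append(yi)
        wts.append(wi)
        sizes.append(1)
        # Pool the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, s2 = means.pop(), wts.pop(), sizes.pop()
            m1, w1, s1 = means.pop(), wts.pop(), sizes.pop()
            means.append((w1 * m1 + w2 * m2) / (w1 + w2))
            wts.append(w1 + w2)
            sizes.append(s1 + s2)
    return np.repeat(means, sizes)
```

For example, `pava([1, 3, 2, 4])` pools the violating pair into their average, giving `[1, 2.5, 2.5, 4]`.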
5. Gal Elidan, The Hebrew University. Session I.
"Lightning-speed Structure Learning of Non-Gaussian Networks"
Abstract
Probabilistic graphical models build on a graph structure that encodes regularities in the domain to
reason about complex problems, and are widely used in varied fields ranging from computational
biology to machine vision to astronomy. Yet, learning the structure of the model from data remains a
formidable challenge, particularly in complex real-valued domains. We present a highly accelerated
structure learning algorithm that is based on a fusion between the frameworks of copulas and graphical
models and a novel theoretical insight. Specifically, for many copula families, we prove that the
expected likelihood of a building block edge in the model is monotonic in Spearman's rank correlation
measure. This allows us to perform structure learning while "magically" bypassing costly parameter
estimation as well as explicit computation of the log-likelihood function. We demonstrate the merit of our
approach for structure learning in varied real-life domains. Importantly, the computational benefits are
such that they open the door for practical scaling-up of structure learning in complex scenarios.
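A rough sketch of the idea that Spearman's rank correlation can stand in for an edge's likelihood score: compute pairwise Spearman correlations and keep a maximum-weight spanning tree (a Chow-Liu-style tree). Scoring edges by the absolute correlation and restricting the structure to a tree are assumptions made for this illustration; the talk's algorithm and the copula families it covers are not reproduced, and both function names are hypothetical.

```python
import numpy as np

def spearman_matrix(X):
    """Spearman rank correlation between all columns of X (assumes no ties)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def max_spanning_tree(score):
    """Kruskal's algorithm on a symmetric score matrix; returns tree edges (i, j), i < j."""
    d = score.shape[0]
    parent = list(range(d))

    def find(i):                      # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(((score[i, j], i, j) for i in range(d) for j in range(i + 1, d)),
                   reverse=True)      # highest-scoring candidate edges first
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Ranks are invariant to monotone transformations of each variable, so no marginal distributions or copula parameters are estimated before the structure is chosen, which is where the speed-up in this kind of approach comes from.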
6. Dvir Kleper, Elliot Turvall, Tamar Kennet-Cohen, Carmel Oren, National Institute for Testing and Evaluation. Session II.
"Fairness of the higher-education selection system toward applicants for accommodated conditions on the psychometric entrance test (in Hebrew)"
Abstract
This work examines the fairness of the admissions process to higher education in Israel with respect to candidates who request accommodations on the Psychometric Entrance Test (PET). The study considers both examinees who received accommodations and applicants whose request for accommodated conditions was not approved, whether for a technical reason (documents were not attached) or on the basis of professional judgment. The population of accommodation applicants (in the years 0222-0222) was divided into five subgroups: three groups of accommodation recipients, namely examinees with learning disabilities (N=958), with attention and concentration problems (N=187) and with physical problems (N=1,096); and two groups of applicants whose requests were not approved, namely missing data (failure to submit all required documents on time and/or a request submitted too late, N=299) and professional grounds (requests reviewed and rejected by a team of experts, N=1,458). Fairness was examined from two aspects: selection bias and differential validity. Selection bias was examined with Cleary's regression model (Cleary, 1968), under which bias is defined as differential prediction. Differential validity was examined using correlation coefficients. One of the main difficulties in studies of people with disabilities is finding groups of examinees large enough to serve as analysis groups. In addition, it is difficult to define the most appropriate comparison group, since examinees with learning disabilities are not a representative sample of all examinees. To reduce the risk of a biased result arising from this, we used propensity scores, in which the background variables gender and age are taken into account when sampling the comparison group ("regular" examinees who did not request accommodations, N=120,503), so that it is as similar as possible to the groups of accommodation applicants.
The results showed no selection bias for the learning-disability group, the attention and concentration group, or those whose request was rejected on professional grounds, whereas a slight bias was found in favor of the physical-problems group and against the missing-data group. A comparison of validity coefficients across groups shows that for the PET these coefficients were very similar in the different groups, whereas for the matriculation (Bagrut) score and the composite admission score, validity was slightly higher for examinees who did not request accommodations than for those who did.
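Cleary's (1968) definition of bias as differential prediction is commonly tested with a moderated regression of the criterion on the test score, a group indicator and their interaction. A minimal sketch with NumPy follows; the function name, the 0/1 group coding and the simulated data in the check are illustrative assumptions, and the study's actual criterion and its propensity-score sampling of the comparison group are not shown.

```python
import numpy as np

def cleary_model(score, criterion, group):
    """Moderated regression for Cleary-style differential prediction.

    Fits criterion = b0 + b1*score + b2*group + b3*score*group by ordinary least
    squares. A nonzero b2 (intercept difference) or b3 (slope difference) means a
    single common regression line would over- or under-predict for one group.
    """
    score = np.asarray(score, dtype=float)
    criterion = np.asarray(criterion, dtype=float)
    group = np.asarray(group, dtype=float)           # 1 = focal group, 0 = reference group
    X = np.column_stack([np.ones_like(score), score, group, score * group])
    beta, *_ = np.linalg.lstsq(X, criterion, rcond=None)
    resid = criterion - X @ beta
    dof = len(criterion) - X.shape[1]
    sigma2 = resid @ resid / dof                     # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se                           # coefficients and their t-statistics
```

Testing the intercept and slope terms separately distinguishes a constant over- or under-prediction for one group from a difference in how strongly the test score predicts the criterion.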
7. Yury Gubman, Dmitri Romanov, Central Bureau of Statistics. Session II.
"Estimation of Measurement Error in Categorical Income Survey Data"
Abstract
Estimation of income distribution is of great interest in economic and econometric research, where
survey income data are frequently used. However, survey data fail to provide full information about
individual and household incomes, due to measurement error and non-response bias. An additional
challenge in estimating the income distribution arises when a survey collects categorical data about
personal and/or household incomes. In most cases, open-ended income questions are replaced by
categorical ones in order to reduce respondent burden, increase response rates, simplify the survey
questionnaire and save time. In the current study, measurement error in categorical personal
gross income from the 2008 Israeli Social Survey (ISS) was estimated and analyzed. "Cognitive"
explanatory variables were introduced: how long it took the respondent to answer the gross income
question, whether he/she corrected his/her answer afterwards, and the difference between the
first and the final responses regarding gross income. These variables were obtained from the Audit Trail
log file, which contains, among other things, all the changes made to the questionnaire during the
interview and the duration of each action. In this work, an integrated database was created by linking
survey data with the Audit Trail records, the Population Register and the Tax Authority database. The
analyses were carried out by fitting a parametric distribution to administrative income data in order to
assess measurement error, and by applying econometric models. A significant negative effect of the duration
of the respondent's answer and of a decision to correct his/her previous answer on the measurement error was
found. These effects differ between salaried employees and the self-employed, so a "pooled" analysis
of survey data may be misleading. Significant differences in the measurement error distribution were
found between respondents who tend to overestimate their gross income and those who tend to
underestimate it. The existence and intensity of the measurement error depend strongly and positively on
the level of income. Among salaried employees, a negative effect of higher educational level and
job satisfaction on the intensity of the measurement error was detected. Analyzing the response process by
means of Audit Trail variables, we found that inserting a question about net income immediately after
the gross income question, as a logical way to allow respondents to check their responses, caused
22% of the respondents to go back and check the accuracy of their responses to the gross income
question. Returning to the gross income question during the interview and correcting the answer
reduced measurement errors among both salaried employees and the self-employed, but more so among the
former than among the latter. Appropriate changes to the ISS questionnaire are proposed, which can
reduce measurement error in the obtained data.
8. Mano Geva, Midgam Research Institute. Session II.
"Internet surveys: advantages, disadvantages and limitations"
Abstract
The use of the Internet as a platform for conducting surveys in general, and through Internet panels in particular, has grown steadily over the last 5 years. Israel is in line with the Western world in the rate and variety of surveys conducted online. Internet surveys have clear advantages in many respects, some of them technical. Among the prominent advantages of Internet surveys conducted through online panels are high response rates and a better ability to represent various population groups, such as young people, as well as population groups whose selectivity differs from that of telephone surveys. The lecture will discuss the central questions that arise in conducting Internet surveys.
9. Ariel Mantzura, Bank of Israel. Session II.
"Outlier detection in multivariate time series"
Abstract
The lecture will deal with outlier detection through the application and comparison of two methods on foreign-exchange market data in Israel:
(1) The method of Hadi, which does not take into account dependence over time: Hadi (1992), "Identifying multiple outliers in multivariate data".
(2) The method of Galeano, P., D. Peña, and R. S. Tsay (2006), "Outlier detection in multivariate time series by projection pursuit".
The second method is one recommended by the late Gad Nathan while he served as an adviser to us.
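A simplified sketch in the spirit of Hadi's (1992) forward search: start from a central subset, grow it one point at a time in order of Mahalanobis distance, and stop when the next point lies beyond a cutoff. It omits Hadi's correction factors and his robust initial ordering (a distance to the coordinatewise median is used instead), ignores dependence over time as the abstract notes for this method, and takes the cutoff as an argument; the function name is hypothetical.

```python
import numpy as np

def forward_search_outliers(X, cutoff):
    """Flag multivariate outliers with a simplified Hadi-style forward search (illustrative)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    start = np.linalg.norm(X - np.median(X, axis=0), axis=1)  # crude robust starting distance
    subset = np.argsort(start)[:(n + p + 1) // 2]             # initial "clean" subset
    while True:
        mu = X[subset].mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(X[subset], rowvar=False))
        diff = X - mu
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))  # Mahalanobis distances
        order = np.argsort(d)
        r = len(subset)
        if r == n or d[order[r]] > cutoff:   # next-closest point is too far: stop growing
            return np.flatnonzero(d > cutoff)
        subset = order[:r + 1]               # add the next-closest point to the clean subset
```

For two variables the chi-square(2) quantile at tail probability alpha is -2 ln(alpha), so `np.sqrt(-2 * np.log(1e-4))` gives a convenient distance cutoff for a quick check.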
10. Omer Bobrowski, Department of Mathematics, Duke University. Session III.
"The Topology of Noise"
Abstract
A simplicial complex is a collection of vertices, edges, triangles, and simplices of higher dimension, and
one can think of it as a generalization of a graph. In a geometric complex, the presence of simplices is
determined by geometric properties of the vertices. Thus, choosing vertices at random yields a random
topological space with many interesting features. We focus on the limiting behavior of the Betti numbers
of such complexes, as the number of vertices goes to infinity. We study different ways to construct a
geometric complex, each resulting in a completely different structure.
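To make "Betti numbers of a random geometric complex" concrete, the sketch below computes only the zeroth Betti number, the number of connected components, which depends only on the edges of the complex. It assumes the Vietoris-Rips construction with an edge between points at distance at most `radius`; for a Čech complex built from balls of radius r the same count is obtained with `radius = 2 * r`. Higher Betti numbers require a full homology computation and are not attempted; the function name is hypothetical.

```python
import numpy as np

def betti0_geometric(points, radius):
    """Betti_0 (number of connected components) of a Vietoris-Rips complex.

    Vertices are the points; an edge joins two points at distance <= radius.
    Higher-dimensional simplices never change Betti_0, so the edges suffice.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))

    def find(i):                                    # union-find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    diffs = points[:, None, :] - points[None, :, :]
    close = np.linalg.norm(diffs, axis=-1) <= radius
    components = n
    for i, j in zip(*np.nonzero(np.triu(close, k=1))):
        ri, rj = find(i), find(j)
        if ri != rj:                                # edge merges two components
            parent[ri] = rj
            components -= 1
    return components
```

Calling it on uniformly sampled points for growing sample sizes, with the radius shrinking accordingly, is an easy way to explore how Betti_0 behaves as the number of vertices goes to infinity.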
11. Primoz Skraba, Artificial Intelligence Laboratory, Josef Stefan Institute, Ljubljana. Session III.
"A Topological Model of Recurrence"
Abstract
Analyzing systems is often made much easier by a good choice of parametrization. This talk will focus
on a natural model for periodic or recurrent systems: the circle. I will present a topological pipeline for
finding such parametrizations from data. To illustrate the techniques, I will show how this pipeline can
recover an astonishing amount of information about a system from time series measurements.
Beginning with simple periodic systems, several applications will be covered, including the synthesis and
analysis of gaits and other types of motion, and the analysis of chaotic systems.
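12. Armin Schwartzman, Department of Biostatistics, Harvard University. Session III.
"Geometric Means of Positive Definite Matrices and the Matrix-Variate Log-Normal Distribution"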
Abstract
A new lognormal family of distributions on the set of symmetric positive definite (PD) matrices is
introduced as a matrix-variate extension of the univariate lognormal family of distributions. This family
arises, via the central limit theorem, as the large-sample limiting distribution of two types of geometric
averages of i.i.d. PD matrices: the log-Euclidean average and the canonical geometric average. These
averages correspond to two different geometries imposed on the set of PD matrices. The limiting
distributions of these averages are used to provide large-sample confidence regions for the
corresponding population means. The methods are illustrated on a voxelwise analysis of diffusion tensor
imaging data, helping resolve the choice of voxelwise average type for this form of PD matrix data.
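The two averages named in the abstract can be sketched for symmetric PD matrices through eigendecompositions: the log-Euclidean average (matrix exponential of the mean matrix logarithm) and, for just two matrices, the canonical affine-invariant geometric mean A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}. The canonical average of more than two matrices needs an iterative algorithm and is not shown; the helper names are hypothetical.

```python
import numpy as np

def sym_apply(S, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * fn(vals)) @ vecs.T

def log_euclidean_mean(mats):
    """Log-Euclidean average: matrix exp of the arithmetic mean of the matrix logs."""
    logs = np.array([sym_apply(M, np.log) for M in mats])
    return sym_apply(logs.mean(axis=0), np.exp)

def geometric_mean_two(A, B):
    """Canonical (affine-invariant) geometric mean of two PD matrices, A # B."""
    A_half = sym_apply(A, np.sqrt)
    A_inv_half = sym_apply(A, lambda v: 1.0 / np.sqrt(v))
    middle = A_inv_half @ B @ A_inv_half
    middle = (middle + middle.T) / 2              # remove rounding asymmetry
    return A_half @ sym_apply(middle, np.sqrt) @ A_half
```

When the matrices commute (for example, both diagonal) the two averages coincide; for non-commuting matrices they generally differ, which is why the choice of geometry matters for diffusion tensor data.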
13. Zinoviy Landsman, University of Haifa. Session III.
"Asymptotic Data Analysis on Manifolds"
Abstract
Given an m-dimensional compact submanifold M of Euclidean space R^s, the concept of mean location
of a distribution, as related to mean or expected vector, is generalized to more general R^s-valued […]
underestimate it. Existence and intensity of the measurement error strongly and positively depend on
the level of income. Among salaried employees, negative effect of increasing in educational level and
III ‫מושב‬
job satisfaction on the measurement error intensity was detected. Analyzing the response process by
means of Audit Trail variables, we found that inserting a question about net income immediately after
inquiring about gross income, as a logical way to allow respondents to control their responses, caused
22% of the respondents to go back and check the accuracy of their responses to the gross income
question. Reversion to the gross income question during the interview and correction of the answer
reduced measurement errors among both employees and the self-employed but did so more among the
former than among the latter. Appropriate changes in the ISS questionnaire are proposed, which can
reduce measurement error in the obtained data.
8.‫מושב‬ ,"‫"מדגם‬ ,‫גבע‬ ‫מנו‬II.
"‫ומגבלות‬ ‫חסרונות‬ ,‫יתרונות‬ ,‫באינטרנט‬ ‫"סקרים‬
‫תקציר‬
‫כפלטפורמה‬ ‫באינטרנט‬ ‫השימוש‬‫במהלך‬ ‫וגבר‬ ‫הלך‬ ‫אינטרנטים‬ ‫פאנלים‬ ‫באמצעות‬ ‫ובפרט‬ ,‫בכלל‬ ‫סקרים‬ ‫לביצוע‬5‫השנים‬
‫ובמ‬ ‫בשיעור‬ ‫המערבי‬ ‫העולם‬ ‫עם‬ ‫קו‬ ‫מיישרת‬ ‫ישראל‬ .‫האחרונות‬.‫באינטרנט‬ ‫המבוצעים‬ ‫הסקרים‬ ‫גוון‬‫באינטרנט‬ ‫לסקרים‬
‫באמ‬ ‫באינטרנט‬ ‫סקרים‬ ‫של‬ ‫הבולטים‬ ‫היתרונות‬ ‫לצד‬ .‫טכניות‬ ‫חלקן‬ ,‫רבות‬ ‫מבחינות‬ ‫ברורים‬ ‫יתרונות‬,‫אינטרנטים‬ ‫פאנלים‬ ‫צעות‬
‫אוכלוסייה‬ ‫וקבוצות‬ ‫צעירים‬ ,‫כגון‬ ‫שונות‬ ‫אוכלוסייה‬ ‫קבוצות‬ ‫של‬ ‫יותר‬ ‫גבוהה‬ ‫ייצוג‬ ‫יכולת‬ ,‫גבוהים‬ ‫מענה‬ ‫שיעורי‬ ‫קבלת‬
.‫באינטרנט‬ ‫סקרים‬ ‫בביצוע‬ ‫העולות‬ ‫המרכזיות‬ ‫בשאלות‬ ‫תדון‬ ‫ההרצאה‬ .‫הטלפונים‬ ‫הסקרים‬ ‫לעומת‬ ‫שונות‬ ‫סלקטיביות‬
9..‫ישראל‬ ‫בנק‬ ,‫מנצורה‬ ‫אריאל‬‫מושב‬II.
"‫משתנים‬ ‫רב‬ ‫עיתיות‬ ‫בסדרות‬ ‫חריגים‬ ‫"זיהוי‬
‫תקציר‬
( .‫בישראל‬ ‫מט"ח‬ ‫שוק‬ ‫נתוני‬ ‫על‬ ‫שיטות‬ ‫שתי‬ ‫והשוואת‬ ‫יישום‬ ‫באמצעות‬ ‫חריגים‬ ‫בזיהוי‬ ‫תעסוק‬ ‫ההרצאה‬1‫של‬ ‫שיטה‬ )hady
.‫זמן‬ ‫פני‬ ‫על‬ ‫התלות‬ ‫את‬ ‫בחשבון‬ ‫לוקחת‬ ‫שלא‬Identifying multiple outliers in multivariate data, Hadi )1992(.(0)
‫של‬ ‫שיטה‬Outlier detection in multivariate time series by projection pursuit, Galeano, P., D. Pena, and
R.S. Tzay )2006(.
.‫אצלנו‬ ‫יועץ‬ ‫בהיותו‬ ‫ז"ל‬ ‫נתן‬ ‫גד‬ ‫המליץ‬ ‫עליה‬ ‫שיטה‬ ‫היא‬ ‫השנייה‬ ‫השיטה‬
11.,‫למתמטיקה‬ ‫המחלקה‬ ,‫בוברובסקי‬ ‫עומר‬Duke University.‫מושב‬III.
"The Topology of Noise"
Abstract
A simplicial complex is a collection of vertices, edges, triangles, and simplexes of higher dimension, and
one can think of it as a generalization of a graph. In a geometric complex, the presence of simplexes is
determined by geometric properties of the vertices. Thus, choosing vertices at random yields a random
topological space with many interesting features. We focus on the limiting behavior of the Betti numbers
of such complexes, as the number of vertices goes to infinity. We study different ways to construct a
geometric complex, each resulting in a completely different structure.
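Of the Betti numbers the abstract mentions, β0 (the number of connected components) is the one that can be computed from the edges alone. This makes it a dependency-free way to experiment with random geometric complexes. The sketch below is minimal and assumes a Vietoris-Rips construction in which two points are joined when their distance is at most r. A Čech complex built on balls of radius r/2 has the same edges, and therefore the same β0. The function name `betti0_rips` is hypothetical, and higher Betti numbers would require a homology library.

```python
import math

def betti0_rips(points, r):
    """Betti_0 (connected components) of the Vietoris-Rips complex of `points` at scale r."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)  # merge the two components
    return sum(1 for i in range(len(points)) if find(i) == i)
```

Calling this on uniformly random points in the unit square, while increasing the number of points and shrinking r, is one way to explore empirically the kind of limiting behavior the talk studies.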
11. Primoz Skraba, Artificial Intelligence Laboratory, Josef Stefan Institute, Ljubljana. Session III.
"A Topological Model of Recurrence"
Abstract
Analyzing a system is often made much easier by a good choice of parametrization. This talk will focus
on a natural model for periodic or recurrent systems: the circle. I will present a topological pipeline for
finding such parametrizations from data. To illustrate the techniques, I will show how this pipeline can
recover an astonishing amount of information about a system from time-series measurements.
Beginning with simple periodic systems, I will cover several applications, including the synthesis and
analysis of gaits and other types of motion, and the analysis of chaotic systems.
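The sketch below makes the idea of a circle-valued parametrization concrete with a deliberately crude stand-in. It is not the topological pipeline of the talk, which would typically rely on persistent cohomology. A periodic time series is delay-embedded in the plane, where it traces a closed loop, and each point is assigned its angle about the loop's centroid. The embedding dimension (2), the lag and the function names are illustrative assumptions. The approach only works when the loop is roughly star-shaped about its centroid.

```python
import math

def delay_embed(series, lag):
    """Map a scalar series to 2-D points (x_t, x_{t+lag})."""
    return [(series[t], series[t + lag]) for t in range(len(series) - lag)]

def angle_coordinate(points):
    """Angle of each point about the centroid of all points, in [0, 2*pi)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [math.atan2(y - cy, x - cx) % (2 * math.pi) for x, y in points]

def winding_turns(angles):
    """Net number of turns made by a sequence of angles (wrapped steps summed)."""
    total = 0.0
    for a, b in zip(angles, angles[1:]):
        total += (b - a + math.pi) % (2 * math.pi) - math.pi
    return total / (2 * math.pi)
```

For a sine wave with period 50 and a lag of 12 (roughly a quarter period), the recovered angle makes exactly one full turn, in one direction or the other, per period of the signal.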
12. Armin Schwartzman, Department of Biostatistics, Harvard University. Session III.
Abstract
A new lognormal family of distributions on the set of symmetric positive definite (PD) matrices is
introduced as a matrix-variate extension of the univariate lognormal family of distributions. This family
arises, via the central limit theorem, as the large-sample limiting distribution of two types of geometric
averages of i.i.d. PD matrices: the log-Euclidean average and the canonical geometric average. These
averages correspond to two different geometries imposed on the set of PD matrices. The limiting

Abstract
Given an m-dimensional compact submanifold M of Euclidean space R^s, the concept of the mean location
of a distribution, as related to the mean or expected vector, is generalized to more general R^s-valued
functionals, including the median location, which is derived from the spatial median. Asymptotic
statistical inference for general functionals of distributions on such submanifolds is elaborated.
Convergence properties are studied in relation to the behavior of the underlying distributions with respect
to the cut locus. An application is given in the context of independent but not identically distributed
samples, in particular a multisample setup. Joint work with Harrie Hendriks, Radboud University Nijmegen.
14. Yossi Levy, Teva. Session IV.
"Sports Statistics: A Historical Review and a Look to the Future"
Abstract
This talk will review the main milestones in the development of sports statistics, which took place in
parallel with the rise in popularity of baseball in the US. The collection of detailed statistical data
began as early as the mid-19th century, and in-depth research based on these data peaked in the late 1970s
with the work of Bill James. The statistical methods that were developed were rejected by most veteran
baseball people, though not by all. The publication of the book "Moneyball" in 2003 exposed the general
public to the effectiveness of statistics as a management tool leading to success on the playing field.
The lessons learned can also be applied in other, non-sports fields. In parallel, attempts are being made
to build effective statistical tools for other sports.
15. Daniel Nevo, Ya'acov Ritov, The Hebrew University. Session IV.
"Around the goal: Examining the effect of the first goal on the second goal in soccer using survival
analysis methods"
Abstract
In this work we apply survival techniques to soccer data, treating goal scoring as the event of interest.
We focus on the relationship between the time of the first goal in a game and the time of the second
goal. To this end, the relevant survival analysis concepts are adapted to the problem, and a Cox model is
developed for the hazard function. Time-dependent covariates and a frailty term are also considered, and a
reliable propensity score is used to summarize the pre-game covariates. The results indicate that the
occurrence of a first goal can either expedite or impede the next goal, depending on when it was scored.
Moreover, once a goal has been scored, another goal becomes increasingly likely as the game progresses.
Furthermore, the first-goal effect is the same whether the goal was scored or conceded.
16. Ronnie Lidor, Gal Ziv and Michal Arnon, The Academic College at Wingate. Session IV.
"Predicting team ranking in basketball: The questionable use of on-court performance statistics"
Abstract
Statistics on the on-court performance of basketball players during actual games (e.g., free throws,
2-point shots, defensive and offensive rebounds, and assists) are typically used by basketball coaches and
sports journalists not only to assess the game performance of individual players and of the entire team,
but also to predict future success (i.e., the final ranking of the team). The purpose of this correlational
study was to examine the relationships between 12 on-court basketball performance variables and the final
rankings of professional basketball teams, using information gathered over seven consecutive seasons and
controlling for multicollinearity. Data analyses revealed that (a) different on-court performance
statistics can predict team rankings at the end of a season; (b) on-court performance statistics can be
highly correlated with one another (e.g., 2-point and 3-point shots); and (c) condensing the correlated
variables (e.g., treating all types of shots as one category) can lead to more stable regression models.
It is proposed that basketball coaches limit the use of individual on-court statistics for predicting the
final rankings of their teams. The prediction process may be more reliable if on-court performance
variables are grouped into broader categories.
17. David Ben-Sira and Michal Arnon, The Academic College at Wingate. Session IV.
"What Is the Structure of Decathlon Achievement among Elite Athletes, and What Are Its Implications?"
Abstract
Athletic ability rests on a multidimensional profile of general traits and abilities (body build, general
physical fitness, general motor abilities, etc.) and of specific abilities related to the characteristics
of the particular sport. The distinction between general and specific characteristics matters for
understanding the structure of the abilities a sport requires and for planning and dosing its training.
The men's decathlon is an example of a sport that allows a systematic analysis of the general and specific
components of competitive achievement. It consists of ten track-and-field events (4 running, 3 jumping and
3 throwing events), which are measured precisely and together are designed to assess the athlete's
all-round ability. A central question is whether there are one or more general abilities that explain
success in this sport, and what their weight is. Alternatively, one should examine the degree of
uniqueness of each event and its contribution to explaining the variance in overall achievement. Previous
studies of decathlon abilities suffered from several weaknesses (Cox, 2002; Ertel, 2011; Karvonen & Niemi,
1953; Park & Zatsiorsky, 2011; Schomaker & Heumann, 2011; Wimmer et al., 2011; Woolf et al., 2007;
Zarnowski, 1989). The present work is based on the database in the publications of the International
Association of Athletics Federations (IAAF), which lists the best results in world athletics (0522
points and above). Three samples were drawn from the Olympic years 2004, 2008 and 2012. In these three
seasons, 225 results were reported, including 01 results by athletes who reached the mark in more than one
Olympic year. A factor analysis of the individual event results in each year (with Varimax rotation and an
Eigenvalue>1.2 criterion) points to two main decathlon factors ("running speed" and "throwing ability"),
which together explain 41%-44% of the total variance in the results. A third factor was relatively weak and
unstable, and consistently included the long-distance run. Since no differences in means were found between
the years, the cases were merged into a single file of 223 cases (athletes with more than one result are
represented by their best result). This factor analysis, too, clearly points to the same two main factors
found in the single-year analyses (42% of the variance). A regression with the scores of the two factors as
independent variables explains about 00% of the variance in overall decathlon achievement. The analysis
indicates relatively high uniqueness (1-h²) for pole vault (2.10), high jump (2.12), the 1500 m run (2.25),
javelin throw (2.02) and long jump (2.34) compared with the other events: discus throw (2.21), shot put
(2.24), hurdles (2.25), the 100 m run (2.20) and the 400 m run (2.22). These findings have professional
implications for screening athletes and directing them to this event, and for planning the training of
high-level athletes.
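The uniqueness index reported in the abstract is 1 - h², where the communality h² is the sum of a variable's squared factor loadings. Below is a minimal numpy sketch of that computation. It assumes principal-component extraction from the correlation matrix, with an eigenvalue retention threshold passed as a parameter (1.0, the usual Kaiser rule, by default). The abstract does not state its exact extraction method, and the function name is hypothetical. Varimax rotation is omitted because an orthogonal rotation leaves each variable's communality, and hence its uniqueness, unchanged.

```python
import numpy as np

def pc_uniqueness(X, eig_threshold=1.0):
    """Principal-component factor extraction on the correlation matrix of X (rows = athletes).

    Returns the number of retained factors, the uniqueness 1 - h^2 of each column (event),
    and the share of total variance explained by the retained factors.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                 # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.sum(eigvals > eig_threshold))          # eigenvalue retention rule
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])  # events x factors
    communality = np.sum(loadings ** 2, axis=1)       # h^2 for each event
    return k, 1.0 - communality, float(eigvals[:k].sum() / len(eigvals))
```

On a 10-column matrix of decathlon event scores, the second returned value would be the per-event uniqueness. The third value corresponds to the "share of variance explained by the factors" figures quoted above.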
18. Moshe Pollak, The Hebrew University, and Abba M. Krieger, University of Pennsylvania, USA. Session V.
"Shewhart Revisited"
Abstract
The Shewhart control chart was the first scheme to monitor an ongoing process and raise an alarm when it
appears that the level has changed. Folklore has it that the Shewhart control chart is preferable to other
surveillance schemes for detecting large changes. We show that the Shewhart chart is optimal for the
criterion of maximizing the probability of detecting a change upon its occurrence, subject to a bound on
the ARL to false alarm. Remarkably, this optimality persists even when the change is of moderate size.
In the multivariate setting, applying the Shewhart procedure to each process separately is suboptimal.
We construct a generalized Shewhart procedure that is optimal for the aforementioned criterion.
The results are illustrated in the surveillance of the level of procedures.
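The trade-off in this criterion can be computed directly for the simplest case. The sketch below is minimal and rests on these assumptions:

- Observations are independent N(μ0, σ²) before the change.
- The change is a one-sided upward shift of δ standard deviations.
- An alarm is raised at the first standardized observation exceeding k.

Under these assumptions, the in-control ARL is 1/P(Z > k), so an ARL bound A fixes k = z_(1-1/A). The probability of an alarm at the very first post-change observation is then P(Z > k - δ). The function names are hypothetical. This is the textbook univariate chart, not the generalized multivariate procedure of the talk.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def threshold_for_arl(arl_bound):
    """Threshold k whose in-control ARL, 1 / P(Z > k), equals arl_bound."""
    return Z.inv_cdf(1.0 - 1.0 / arl_bound)

def prob_immediate_detection(k, shift):
    """P(alarm at the first post-change observation) when the mean shifts by `shift` sigmas."""
    return 1.0 - Z.cdf(k - shift)

def shewhart_alarm(xs, mu0, sigma, k):
    """Index of the first observation with (x - mu0) / sigma > k, or None if no alarm."""
    for t, x in enumerate(xs):
        if (x - mu0) / sigma > k:
            return t
    return None
```

For example, an ARL bound of 1000 gives a threshold of about 3.09. A 3σ shift is then caught at its first observation with probability of roughly 0.46.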
19. Amichai Painsky, Saharon Rosset, Tel Aviv University. Session V.
"Memoryless Representation of Markov Processes"
Abstract
Memoryless processes hold many theoretical and practical advantages. They are easier to describe,
analyze, store and encrypt. They can also be seen as the essence of a family of regression processes,
or as an innovation process triggering a dynamic system. The Gram-Schmidt procedure suggests a
linear sequential method of whitening (decorrelating) any stochastic process. Applied to a Gaussian
process, it guarantees statistical independence (memorylessness). It is not clear, however, how to
sequentially construct a memoryless process from a non-Gaussian process. In this paper we present a
non-linear sequential method to generate a memoryless process from any given Markov process under
varying objectives and constraints. We differentiate between lossless and lossy methods, and between
closed-form and algorithmic solutions, and discuss the properties and uniqueness of our suggested methods.
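One classical way to obtain a memoryless (i.i.d.) sequence from a Markov process with a known, continuous transition law is the conditional probability integral (Rosenblatt) transform, u_t = F(x_t | x_{t-1}). It yields i.i.d. Uniform(0, 1) variables and is invertible. It is shown here only as a point of reference, not as the authors' method. The sketch assumes a hypothetical non-Gaussian AR(1) process x_t = φ x_{t-1} + ε_t with Exp(1) innovations, for which F(x_t | x_{t-1}) = 1 - exp(-(x_t - φ x_{t-1})). Both function names are illustrative.

```python
import math
import random

def simulate_exp_ar1(n, phi, seed=0):
    """Simulate x_t = phi * x_{t-1} + eps_t with Exp(1) innovations; also return the driving uniforms."""
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    x, prev = [], 0.0
    for vt in v:
        eps = -math.log1p(-vt)  # Exp(1) innovation via the inverse CDF
        prev = phi * prev + eps
        x.append(prev)
    return x, v

def rosenblatt_transform(x, phi, x0=0.0):
    """u_t = F(x_t | x_{t-1}) = 1 - exp(-(x_t - phi * x_{t-1})): i.i.d. Uniform(0, 1) under the model."""
    u, prev = [], x0
    for xt in x:
        u.append(-math.expm1(-(xt - phi * prev)))
        prev = xt
    return u
```

The simulated innovations are generated from uniforms by the inverse CDF, so the transform recovers those uniforms up to rounding. This makes the lossless nature of the representation easy to check.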
20. Ziv Klausner and Eyal Fattal, Department of Applied Mathematics, Israel Institute for Biological Research. Session V.
"A new beta distribution framework for respiratory protection based stochastic formulation"
Abstract
The problem of modeling respiratory protection is well known and has been dealt with extensively in the
literature. Often the efficiency of respiratory protection is quantified in terms of penetration, defined as
the proportion of an ambient contaminant concentration that penetrates the respiratory protection
equipment. Typically, the penetration modeling framework in the literature is based on the assumption
that penetration measurements follow the lognormal distribution. However, the analysis in this study
leads to the conclusion that the lognormal assumption is not always valid, which makes it less adequate
for analyzing respiratory protection measurements.
This study presents a formulation of the problem from first principles, leading to a stochastic differential
equation whose solution is the probability density function of the beta distribution. The data of
respiratory protection experiments were reexamined, and the beta distribution was indeed found to
provide a better fit to the data than the lognormal. Our results suggest a new theoretical framework for
modeling respiratory protection.
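For readers who want to try the beta alternative on penetration data (fractions in (0, 1)), a method-of-moments fit is the simplest starting point. Matching the mean m and variance v gives α = m·c and β = (1 - m)·c, with c = m(1 - m)/v - 1. This is a sketch under that assumption, not the authors' stochastic-differential-equation derivation or their fitting procedure, and the function names are hypothetical. The beta log-likelihood, computed here with `math.lgamma`, could then be compared with that of a fitted lognormal.

```python
import math
import statistics

def beta_from_moments(mean, var):
    """Method-of-moments beta parameters; requires 0 < var < mean * (1 - mean)."""
    if not (0.0 < mean < 1.0 and 0.0 < var < mean * (1.0 - mean)):
        raise ValueError("moments are not attainable by a beta distribution")
    c = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * c, (1.0 - mean) * c

def fit_beta(penetrations):
    """Fit Beta(alpha, beta) to observed penetration fractions in (0, 1)."""
    return beta_from_moments(statistics.fmean(penetrations), statistics.pvariance(penetrations))

def beta_loglik(xs, a, b):
    """Log-likelihood of the observations xs under Beta(a, b)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return sum(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) for x in xs)
```

For instance, a mean of 0.25 and a variance of 12/576 recover Beta(2, 6) exactly.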
21. Benny Yakir, The Hebrew University. Session VI.
"Scanning an image"
Abstract
In this exercise we consider the problem of detecting possible realizations of a given image within a
larger image. An alignment between pixels in a fragment of the larger image and the pixels of a rotation
of the given one is conducted using the nearest-neighbor algorithm and the similarity between the two is
measured in terms of the sum of squared differences of pixel illumination levels. Our goal is to
investigate the probabilistic characteristics of this detection problem. We will employ a simple model of
statistical independence in illumination levels between pixels. With each location in the larger image and
each angle of rotation of the given image one may compute a correlation statistic that summarizes
similarity. These correlations form a random field. Distributions of extremes of this random field are a
basis for assessing significance of detection and controlling the false detection rate.
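A brute-force sketch of the random field described above, under these assumptions:

- The template is square.
- It is rotated about its centre by nearest-neighbour lookup.
- Similarity is scored as the mean (rather than the sum) of squared differences over the pixels that stay inside the rotated frame, so that different angles are comparable.

The function names are hypothetical. The talk's correlation statistic, and the extreme-value theory used to set the detection threshold, are not reproduced here.

```python
import numpy as np

def rotate_nearest(template, angle):
    """Rotate a square array about its centre by nearest-neighbour lookup.

    Returns the rotated array and a mask of output pixels whose source lies inside the frame.
    """
    n = template.shape[0]
    c = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n]
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    # Inverse mapping: find the source pixel that lands on each output pixel.
    src_x = np.rint(cos_a * (xs - c) + sin_a * (ys - c) + c).astype(int)
    src_y = np.rint(-sin_a * (xs - c) + cos_a * (ys - c) + c).astype(int)
    inside = (src_x >= 0) & (src_x < n) & (src_y >= 0) & (src_y < n)
    rotated = np.zeros_like(template)
    rotated[inside] = template[src_y[inside], src_x[inside]]
    return rotated, inside

def ssd_field(image, template, angles):
    """Mean squared difference for every angle and every top-left position of the template."""
    n = template.shape[0]
    rows, cols = image.shape[0] - n + 1, image.shape[1] - n + 1
    field = np.empty((len(angles), rows, cols))
    for a, angle in enumerate(angles):
        rotated, inside = rotate_nearest(template, angle)
        for i in range(rows):
            for j in range(cols):
                diff = image[i:i + n, j:j + n] - rotated
                field[a, i, j] = np.mean(diff[inside] ** 2)
    return field
```

Repeating the computation on pure-noise images gives an empirical distribution of the field's minimum. This is a simulation-based substitute for the analytic approximations to the distribution of extremes.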
22. Yonatan Yafe-Nof, Ya'acov Ritov, The Hebrew University; Yair Goldberg, University of Haifa. Session VI.
"Confidence Interval for Test Error of Support Vector Machine Classifiers"
Abstract
Support vector machine (SVM) classifiers are well-known nonparametric statistical classification
algorithms. The test error of these classifiers, which is the probability of misclassification, is the most
common measure of the classifiers' success. Estimating the test error from the data is known to be a
difficult problem. In this work, we prove consistency and asymptotic normality of the empirical test error
under some assumptions on the loss function and the approximation space. We propose confidence
intervals for the test error that are based on both the normal approximation and the nonparametric
bootstrap, and prove their consistency. In addition, we propose adaptive confidence intervals. Finally, we
present a simulation study that demonstrates the performance of the different proposed confidence
intervals.
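The two kinds of intervals can be illustrated generically. The sketch below treats the test error as a misclassification rate measured on an independent test set of n labelled points. It computes a normal-approximation (Wald) interval and a percentile-bootstrap interval for that rate. This is a simplification: the talk's intervals are built for the SVM's empirical test error under assumptions on the loss function and approximation space, which this sketch does not model. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def normal_ci(errors, level=0.95):
    """Wald interval for the error rate; `errors` holds 1 for a misclassified test point, else 0."""
    n = len(errors)
    p = sum(errors) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * (p * (1 - p) / n) ** 0.5
    return p - half, p + half

def bootstrap_ci(errors, level=0.95, n_boot=2000, seed=0):
    """Percentile-bootstrap interval: resample test points with replacement, recompute the rate."""
    rng = random.Random(seed)
    n = len(errors)
    stats = sorted(sum(rng.choices(errors, k=n)) / n for _ in range(n_boot))
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

With 20 errors among 100 test points, the Wald interval is approximately (0.122, 0.278).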
23. Ofir Harari, David Steinberg, Tel Aviv University. Session VI.
"Spectral Decomposition of Gaussian Processes and its Application to Minimum IMSPE Designs"
Abstract
Gaussian processes provide a popular statistical modelling approach in various fields, including
spatial statistics and computer experiments. Strategic experimental design could prove crucial
when data are hard to collect. One popular design criterion is to minimize the integrated mean
squared prediction error. We use the Karhunen-Loève decomposition to derive elegant expressions
for this criterion. The expansion naturally suggests an approximate criterion, which may be
advantageous in terms of computation. We implement an optimization procedure for the approximate
criterion, assess the approximation error, extend it to more complex models and sampling schemes, and
tie it to the Bayesian linear regression model.
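The identity behind such expressions can be checked numerically. Suppose the kernel has the expansion k(x, y) = Σ_i λ_i φ_i(x) φ_i(y), with φ_i orthonormal under the uniform measure on the domain. Then, for a design D and a nugget τ², the integrated prediction variance is

Σ_i λ_i - tr[(Φ_D Λ Φ_Dᵀ + τ²I)⁻¹ Φ_D Λ² Φ_Dᵀ],

where Φ_D holds the φ_i evaluated at the design points. Truncating the sum gives an approximate criterion. The sketch below rests on these assumptions:

- A uniform grid on [0, 1], used as a discrete Nyström stand-in for the true eigenfunctions.
- An exponential kernel with length-scale 0.2.
- A small nugget.

These choices and the function names are illustrative, not the authors' setup.

```python
import numpy as np

def exp_kernel(a, b, ell=0.2):
    """Exponential covariance k(x, y) = exp(-|x - y| / ell) between two 1-D point sets."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / ell)

def imspe_direct(grid, design_idx, nugget=1e-8, ell=0.2):
    """Grid average of the GP prediction variance given observations at grid[design_idx]."""
    K = exp_kernel(grid, grid, ell)
    K_dd = K[np.ix_(design_idx, design_idx)] + nugget * np.eye(len(design_idx))
    k_xd = K[:, design_idx]                                   # G x n
    explained = np.einsum("ij,ji->i", k_xd, np.linalg.solve(K_dd, k_xd.T))
    return float(np.mean(np.diag(K) - explained))

def imspe_kl(grid, design_idx, n_terms, nugget=1e-8, ell=0.2):
    """Same criterion from a (possibly truncated) discrete Karhunen-Loeve expansion."""
    G = len(grid)
    eigvals, eigvecs = np.linalg.eigh(exp_kernel(grid, grid, ell))
    keep = np.argsort(eigvals)[::-1][:n_terms]                # largest eigenvalues first
    lam = eigvals[keep] / G                                   # eigenvalues w.r.t. the uniform measure
    phi_d = np.sqrt(G) * eigvecs[design_idx][:, keep]         # eigenfunctions at the design, n x M
    K_dd = (phi_d * lam) @ phi_d.T + nugget * np.eye(len(design_idx))
    B = (phi_d * lam ** 2) @ phi_d.T                          # integral of k(D, x) k(x, D) dx
    return float(np.sum(lam) - np.trace(np.linalg.solve(K_dd, B)))
```

With all grid eigenpairs retained, the two functions agree to rounding error. Passing a smaller `n_terms` gives the cheaper, approximate criterion.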
24. Hagit Glickman, Michal Beller, RAMA (National Authority for Measurement and Evaluation in Education), Ministry of Education; Amir Shmueli, The Hebrew University.
Panel: "Both the Trees and the Forest? Challenges in the Public Release of Data on Public Systems"
Abstract
The panel will address the dilemmas and difficulties involved in publishing, disseminating and making
accessible complex data on a large public system, in particular the education system and the health
system. The discussion will focus on the following points: (1) How should data on the "performance" of
public systems be presented to the general public, both at the level of the system as a whole and at the
level of different units within it (for example, districts and schools in education; health funds,
hospitals and clinics in health)? (2) Is it possible at all to present a complex picture to the general
public, one derived from many indicators that have not been reduced to a single number? Can the general
public be expected to read a full report on the school that interests them? How can superficial "league
tables" be avoided? (3) The implications of publishing indicators at the various levels: what is not
measured is not important, resources and efforts become focused on what is measured, improper attempts are
made to improve indicator values, and more. In the education system, additional phenomena also arise, such
as the labeling of students and widening gaps between schools as strong students move between them.