Modelling the expected loss of bodily injury claims using
gradient boosting
The following is an extract from a report on modelling the expected loss of bodily
injury claims using gradient boosting.
Summary
➢ Modelling the expected loss:
o Frequency model (classification)*
o Severity model (regression)
➢ The expected loss is given by multiplying the frequency and severity under the
assumption that they are independent**.
➢ Not so fast:
o The frequency model needs to take time into account
o Calculate a hazard rate
➢ Emphasis is on predictive accuracy
* Classification was used due to the nature of the policy and data – there could only be a claim (1) or no claim (0)
over the period examined. It was therefore a case of modelling the probability of a claim at an individual
policyholder level over the time period.
**There are important technical considerations here which are beyond the scope of this initial project. Various
approaches can and need to be considered when modelling the aggregate expected loss, and these approaches
can affect how the frequency and severity models are built. Scenario analysis, for example, requires
specification of the functional form of the frequency and severity models. One suggested approach is to use a
Poisson frequency and Gamma-distributed severity for individual claims, so that the expected loss follows a
Tweedie compound Poisson distribution. See (Yang, Qian, Zou, 2014).
by Gregg Barrett
Overview of the modelling effort for the bodily injury claims data
A critical challenge in insurance is setting the premium for the policyholder. In a competitive market
an insurer needs to accurately price the expected loss of the policyholder. Failing to do so places the
insurer at the risk of adverse selection. Adverse selection occurs when the insurer loses profitable policies
and retains loss-incurring policies, resulting in economic loss. In personal car insurance, for example,
this could occur if the insurer charged the same premium for old and young drivers. If the expected
loss for old drivers was significantly lower than that of the young drivers, the old drivers could be
expected to switch to a competitor, leaving the insurer with a portfolio of young drivers (who are
under-priced) and an economic loss.
In this project we attempt to accurately predict the expected loss for the policyholder concerning a
bodily injury claim. Doing so requires breaking the process down into two distinct components: claim
frequency and claim severity. For convenience and simplicity, we have chosen to model the frequency
and severity separately.
Other inputs into the premium setting (rating process) such as administrative costs, loadings, cost of
capital etc. have been omitted as we are only concerned with modelling the expected loss.
In modelling the claim frequency, a classification model will be used to model the probability of a
bodily injury claim given a set of features that cover mostly vehicle characteristics. The actual claim
frequency for the dataset used in this project is around 1%.
In modelling the claim severity, a regression model will be used to model the expected claim amount
again using a set of features that cover mostly vehicle characteristics.
To ensure that the estimated performance of the model, as measured on the test sample, is an
accurate approximation of the expected performance on future ‘‘unseen’’ cases, the inception date
of policies in the test set is later than that of the policies used to train the model.
The dataset, which covers the period from 2005 through 2007, was therefore split into three groups:
2005 through 2006 – Training set
2005 through 2006 – Validation set
2007 – Test set
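This chronological split can be sketched as follows (illustrative Python; the `year` column name and the 70/30 train–validation proportions are assumptions for the sketch, not details from the project):

```python
import pandas as pd

def temporal_split(policies: pd.DataFrame, train_frac: float = 0.7, seed: int = 0):
    """Split policies so the test set is strictly later than train/validation.

    Assumes an integer `year` column: 2005-2006 rows are split at random into
    training and validation sets, while 2007 rows form the out-of-time test set.
    """
    early = policies[policies["year"] <= 2006]
    test = policies[policies["year"] == 2007]
    train = early.sample(frac=train_frac, random_state=seed)
    valid = early.drop(train.index)
    return train, valid, test

# Toy portfolio: the test rows all have a later inception year than training rows
df = pd.DataFrame({"year": [2005, 2005, 2006, 2006, 2007, 2007]})
train, valid, test = temporal_split(df)
assert test["year"].eq(2007).all()
```

Keeping 2007 entirely out of training is what makes the test-set error an honest estimate of performance on future policies.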
An adjustment to the output of the claim frequency model is necessary in order to derive a probability
on an annual basis. This is due to the claim frequency being calculated over a period of two years
(2005 through 2006). For this project we assumed an exponential hazard function and adjusted the
claims frequency as follows:
P(T) = 1 - exp(-λT)
where:
P(T) = the annual probability of a claim
T = 1/2 (one year as a fraction of the two-year window)
λ = the probability of a claim predicted by the claim frequency model
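Under the stated exponential-hazard assumption, the adjustment is a one-liner (illustrative Python; the 2% input is a made-up value, not a figure from the project):

```python
import math

def annual_claim_probability(p_two_year: float, T: float = 0.5) -> float:
    """Convert the model's two-year claim probability to an annual one,
    assuming an exponential hazard: P(T) = 1 - exp(-lambda * T)."""
    lam = p_two_year  # treated as the hazard rate, per the report's adjustment
    return 1.0 - math.exp(-lam * T)

# e.g. a predicted two-year claim probability of 2%
p_annual = annual_claim_probability(0.02)
```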
In this project model validation is measured by the degree of predictive accuracy, and this objective is
emphasised over producing interpretable models. The lack of interpretability in most algorithmic
models appears to be a reason that their application to insurance pricing problems has been very
limited so far. (Guelman, 2011).
In modelling the claim frequency, a ROC (Receiver Operating Characteristic) curve will be used to
assess model performance, measuring the AUC (Area Under the Curve). In modelling the claim severity,
the RMSE (Root Mean Squared Error) will be used to assess model performance.
The test data was not used for model selection purposes, but purely to assess the generalization error
of the final chosen model. Assessing this error is broken down into three components:
1) Assessing the performance of the classification model on the test data using the AUC score.
2) Assessing the performance of the regression model on the test data using the RMSE.
3) Assessing the performance in predicting the expected loss by comparing the predicted
expected loss for the 2007 portfolio of policyholders against the realised loss for the 2007
portfolio of policyholders.
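The two performance measures can be computed from first principles as follows (a plain-Python sketch; the rank formulation of the AUC is a standard equivalence, not project code):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, used to assess the severity (regression) model."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a randomly chosen claim (label 1) receives a higher
    score than a randomly chosen non-claim (label 0), counting ties as 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With a claim frequency around 1%, the AUC is a sensible choice because, unlike raw accuracy, it is insensitive to the heavy class imbalance.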
Gradient Boosting, often referred to as simply “boosting”, was selected as the modelling approach for
this project. Boosting is a general approach that can be applied to many statistical learning methods
for regression or classification. Boosting is a supervised, non-parametric machine learning approach.
(Geurts, Irrthum, Wehenkel, 2009). Supervised learning refers to the subset of machine learning
methods which derive models in the form of input-output relationships. More precisely, the goal of
supervised learning is to identify a mapping from some input variables to some output variables on
the sole basis of a given sample of joint observations of the values of these variables. Non-parametric
means that we do not make explicit assumptions about the functional form of f; the intent is instead
to find a function 𝑓̂ such that Y ≈ 𝑓̂(X) for any observation (X, Y).
With boosting methods, optimisation is carried out in function space. That is, we parameterise the
function estimate 𝑓̂ in the additive functional form:
𝑓̂(x) = 𝑓̂0 + Σ(i=1..M) 𝑓̂i(x)
In this representation:
𝑓̂0 is the initial guess
M is the number of iterations
𝑓̂i, i = 1..M, are the function increments, also referred to as the “boosts”
It is useful to distinguish between the parameterisation of the “base-learner” function and the loss
function used to fit the overall ensemble estimate 𝑓̂(𝑥).
Boosted models can be implemented with different base-learner functions. Common base-learner
functions include linear models, smooth models, decision trees, and custom base-learner functions.
Several classes of base-learner can be combined in one boosted model. This means that the same
functional formula can include both smooth additive components and decision tree components at
the same time. (Natekin, Knoll, 2013).
Loss functions can be classified according to the type of outcome variable, Y. For regression problems
Gaussian (minimising squared error), Laplace (minimising absolute error), and Huber are considerations,
while Bernoulli or Adaboost are considerations for classification. There are also loss functions for
survival models and count data.
This flexibility makes boosting highly customisable for any particular data-driven task. It introduces
a lot of freedom into the model design, making the choice of the most appropriate loss function
a matter of trial and error. (Natekin, Knoll, 2013).
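At each boosting step the base-learner is fitted to the negative gradient of the chosen loss; a sketch of the working residuals for the losses named above (standard textbook formulas, not project code):

```python
import math

def neg_gradient(loss: str, y: float, f: float) -> float:
    """Negative gradient ("working residual") of common boosting losses at a
    single observation, where f is the current ensemble prediction."""
    if loss == "gaussian":   # squared error: the ordinary residual
        return y - f
    if loss == "laplace":    # absolute error: only the sign of the residual
        return math.copysign(1.0, y - f) if y != f else 0.0
    if loss == "bernoulli":  # deviance, y in {0, 1}, f on the logit scale
        return y - 1.0 / (1.0 + math.exp(-f))
    raise ValueError(f"unknown loss: {loss}")
```

This is why swapping the loss function changes what the ensemble estimates (mean, median, probability) while leaving the boosting machinery itself untouched.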
To provide an intuitive explanation we will use an example of boosting in the context of decision trees
as was used in this project. Unlike fitting a single large decision tree to the training data, which
amounts to fitting the data hard and potentially overfitting, the boosting approach instead learns
slowly. Given an initial model (decision tree), we fit a decision tree (the base-learner) to the residuals
from the initial model. That is, we fit a tree using the current residuals, rather than the outcome Y. We
then add this new decision tree into the fitted function in order to update the residuals. The process
is conducted sequentially so that at each particular iteration, a new weak, base-learner model is
trained with respect to the error of the whole ensemble learnt so far.
With such an approach the model structure is thus learned from data and not predetermined, thereby
avoiding an explicit model specification, and incorporating complex and higher order interactions to
reduce the potential modelling bias. (Yang, Qian, Zou, 2014).
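The slow-learning loop described above can be sketched with single-split trees (stumps) and squared-error loss (a self-contained Python illustration; the shrinkage value and toy data are assumptions, not project settings):

```python
def fit_stump(x, r):
    """Fit a depth-1 tree (stump) to residuals r: the single split of the
    one-dimensional feature x that minimises squared error."""
    best = None
    for s in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi, s=s, lm=lm, rm=rm: lm if xi <= s else rm

def boost(x, y, n_trees=200, shrinkage=0.1):
    """Gradient boosting with squared-error loss: each stump is fitted to the
    current residuals, then added to the ensemble scaled by the learning rate."""
    f0 = sum(y) / len(y)  # initial guess: the mean of the response
    trees, resid = [], [yi - f0 for yi in y]
    for _ in range(n_trees):
        t = fit_stump(x, resid)
        trees.append(t)
        resid = [r - shrinkage * t(xi) for xi, r in zip(x, resid)]
    return lambda xi: f0 + shrinkage * sum(t(xi) for t in trees)
```

Each stump corrects only a fraction (the shrinkage) of the remaining error, which is exactly the "learning slowly" behaviour described above.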
The first choice in building the model involves selecting an appropriate loss function. Squared-error
loss was selected to define prediction error for the severity model and Bernoulli deviance was selected
for the frequency model.
There are three tuning parameters that need to be set:
- Shrinkage (the learning rate)
- Number of trees (the number of iterations)
- Depth (interaction depth)
The shrinkage parameter sets the learning rate of the base-learner models. In general, statistical
learning approaches that learn slowly tend to perform well. In boosting the construction of each tree
depends on the trees that have already been grown. Typical values are 0.01 or 0.001, and the right
choice can depend on the problem. (Ridgeway, 2012). It is important to know that smaller values of
shrinkage (almost) always give improved predictive performance. However, there are computational
costs, both storage and CPU time, associated with setting shrinkage to be low. The model with
shrinkage=0.001 will likely require ten times as many trees as the model with shrinkage=0.01,
increasing storage and computation time by a factor of 10.
It is generally the case that for small shrinkage parameters, 0.001 for example, there is a fairly long
plateau in which predictive performance is at its best. A recommended rule of thumb is to set
shrinkage as small as possible while still being able to fit the model in a reasonable amount of time
and storage. (Ridgeway, 2012).
Boosting can overfit if the number of trees is too large, although this overfitting tends to occur slowly
if at all. (James, Witten, Hastie, Tibshirani, 2013). Cross-validation and information criteria can be
used to select the number of trees. Again, it is worth stressing that the optimal number of trees and
the shrinkage (learning rate) depend on each other, although slower learning rates do not simply
scale the optimal number of trees. That is, shrinkage = 0.1 with an optimal number of trees of 100
does not necessarily imply that with shrinkage = 0.01 the optimal number of trees is 1000.
(Ridgeway, 2012).
Depth sets the number of splits in each tree, which controls the complexity of the boosted ensemble.
When depth = 1 each tree is a stump, consisting of a single split. In this case, the boosted ensemble is
fitting an additive model, since each term involves only a single variable. More generally depth is the
interaction depth, and controls the interaction order of the boosted model, since d splits can involve
at most d variables. (James, Witten, Hastie, Tibshirani, 2013).
A strength of tree-based methods is that single-depth trees are readily understandable and
interpretable. In addition, decision trees have the ability to select or rank the attributes according to
their relevance for predicting the output, a feature shared with almost no other non-parametric
methods. (Geurts, Irrthum, Wehenkel, 2009).
From the point of view of their statistical properties, tree-based methods are non-parametric universal
approximators, meaning that, with sufficient complexity, a tree can represent any continuous function
with arbitrarily high precision. When used with numerical attributes, they are invariant with respect
to monotone transformations of the input attributes. (Geurts, Irrthum, Wehenkel, 2009).
Importantly, boosted decision trees require very little data pre-processing, which can easily be one of
the most time-consuming activities in a project of this nature. Because boosted decision trees handle
predictor and response variables of any type without the need for transformation, and are insensitive
to outliers and missing values, they are a natural choice not only for this project but for insurance in
general, where there are frequently a large number of categorical and numerical predictors,
non-linearities and complex interactions, as well as missing values that all need to be modelled.
Lastly, the techniques used in this project can be applied independently of the limitations imposed by
any specific legislation.
Potential Improvements
Below are several suggestions for improving the initial model.
Specification
A careful specification of the loss function leads to the estimation of any desired characteristic of the
conditional distribution of the response. This, coupled with the large number of base-learners,
guarantees a rich set of models that can be addressed by boosting. (Hofner, Mayr, Robinzonov,
Schmid, 2014)
AUC loss function for the classification model
For the classification model AUC can be tested as a loss function to optimize the area under the ROC
curve.
Huber loss function for the regression model
The Huber loss function can be used as a robust alternative to the L2 (least squares error) loss:
ρ(y, f) = ½(y − f)² if |y − f| ≤ δ, and ρ(y, f) = δ(|y − f| − δ/2) otherwise
where:
ρ is the loss function
δ is the parameter that limits the outliers which are subject to absolute-error loss
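The standard Huber form can be sketched directly (illustrative Python; δ = 1 is an arbitrary default, not a project setting):

```python
def huber_loss(y: float, f: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for residuals within delta, linear beyond it,
    so large outliers contribute only absolute-error penalties."""
    r = abs(y - f)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```

The two branches join smoothly at |y − f| = δ, which is what makes the loss both differentiable and robust.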
Quantile loss function for the regression model:
Another alternative for settings with continuous response is modeling conditional quantiles through
quantile regression (Koenker 2005). The main advantage of quantile regression is (beyond its
robustness towards outliers) that it does not rely on any distributional assumptions on the response
or the error terms. (Hofner, Mayr, Robinzonov, Schmid, 2014)
Laplace loss function for the regression model:
The Laplace loss function is the function of choice if we are interested in the median of the conditional
distribution. It implements a distribution-free, median regression approach especially useful for
long-tailed error distributions.
The loss function allows flexible specification of the link between the response and the covariates.
(Figure omitted: the left-hand panel illustrated the L2 loss; the right-hand panel the L1 (least absolute
deviation) loss.)
All of the above listed loss functions can be implemented within the “mboost” package. (Table
omitted: an overview of the currently implemented families in mboost; see Hofner et al., 2014.)
Optimal number of iterations using AIC:
To maximise predictive power and to prevent overfitting it is important that the optimal stopping
iteration is carefully chosen, and various possibilities for determining it exist. AIC was considered;
however, this is usually not recommended, as AIC-based stopping tends to overshoot the optimal
stopping iteration dramatically. (Hofner, Mayr, Robinzonov, Schmid, 2014)
Package
Package xgboost
The package “xgboost” was also tested during this project. Its benefit over the gbm and mboost
packages is that it is purportedly faster. It should be noted that xgboost requires the data to be in the
form of numeric vectors and thus necessitates some additional data preparation. It was also found to
be a little more challenging to implement than the gbm and mboost packages.
Reference
Geurts, P., Irrthum, A., Wehenkel, L. (2009). Supervised learning with decision tree-based methods in
computational and systems biology. [pdf]. Retrieved from
http://www.montefiore.ulg.ac.be/~geurts/Papers/geurts09-molecularbiosystems.pdf
Guelman, L. (2011). Gradient boosting trees for auto insurance loss cost modeling and prediction.
[pdf]. Retrieved from http://www.sciencedirect.com/science/article/pii/S0957417411013674
Hofner, B., Mayr, A., Robinzonov, N., Schmid, M. (2014). Model-based boosting in R: a hands-on
tutorial using the R package mboost. [pdf]. Retrieved from
https://cran.r-project.org/web/packages/mboost/vignettes/mboost_tutorial.pdf
Hofner, B., Mayr, A., Robinzonov, N., Schmid, M. (2014). An overview on the currently implemented
families in mboost. [table]. In Model-based boosting in R: a hands-on tutorial using the R package
mboost. Retrieved from https://cran.r-project.org/web/packages/mboost/vignettes/mboost_tutorial.pdf
James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An introduction to statistical learning with
applications in R. [ebook]. Retrieved from http://www-bcf.usc.edu/~gareth/ISL/getbook.html
Natekin, A., Knoll, A. (2013). Gradient boosting machines tutorial. [pdf]. Retrieved from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3885826/
Ridgeway, G. (2012). Generalized boosted models: a guide to the gbm package. [pdf]. Retrieved
from https://cran.r-project.org/web/packages/gbm/gbm.pdf
Yang, Y., Qian, W., Zou, H. (2014). A boosted nonparametric Tweedie model for insurance premium.
[pdf]. Retrieved from https://people.rit.edu/wxqsma/papers/paper4
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptx
 
churn_detection.pptx
churn_detection.pptxchurn_detection.pptx
churn_detection.pptx
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysis
 
Jacobs Kiefer Bayes Guide 3 10 V1
Jacobs Kiefer Bayes Guide 3 10 V1Jacobs Kiefer Bayes Guide 3 10 V1
Jacobs Kiefer Bayes Guide 3 10 V1
 

More from Gregg Barrett

Cirrus: Africa's AI initiative, Proposal 2018
Cirrus: Africa's AI initiative, Proposal 2018Cirrus: Africa's AI initiative, Proposal 2018
Cirrus: Africa's AI initiative, Proposal 2018Gregg Barrett
 
Cirrus: Africa's AI initiative
Cirrus: Africa's AI initiativeCirrus: Africa's AI initiative
Cirrus: Africa's AI initiativeGregg Barrett
 
Applied machine learning: Insurance
Applied machine learning: InsuranceApplied machine learning: Insurance
Applied machine learning: InsuranceGregg Barrett
 
Road and Track Vehicle - Project Document
Road and Track Vehicle - Project DocumentRoad and Track Vehicle - Project Document
Road and Track Vehicle - Project DocumentGregg Barrett
 
Data Science Introduction - Data Science: What Art Thou?
Data Science Introduction - Data Science: What Art Thou?Data Science Introduction - Data Science: What Art Thou?
Data Science Introduction - Data Science: What Art Thou?Gregg Barrett
 
Revenue Generation Ideas for Tesla Motors
Revenue Generation Ideas for Tesla MotorsRevenue Generation Ideas for Tesla Motors
Revenue Generation Ideas for Tesla MotorsGregg Barrett
 
Data science unit introduction
Data science unit introductionData science unit introduction
Data science unit introductionGregg Barrett
 
Social networking brings power
Social networking brings powerSocial networking brings power
Social networking brings powerGregg Barrett
 
Procurement can be exciting
Procurement can be excitingProcurement can be exciting
Procurement can be excitingGregg Barrett
 
Machine Learning Approaches to Brewing Beer
Machine Learning Approaches to Brewing BeerMachine Learning Approaches to Brewing Beer
Machine Learning Approaches to Brewing BeerGregg Barrett
 
A note to Data Science and Machine Learning managers
A note to Data Science and Machine Learning managersA note to Data Science and Machine Learning managers
A note to Data Science and Machine Learning managersGregg Barrett
 
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...Gregg Barrett
 
Efficient equity portfolios using mean variance optimisation in R
Efficient equity portfolios using mean variance optimisation in REfficient equity portfolios using mean variance optimisation in R
Efficient equity portfolios using mean variance optimisation in RGregg Barrett
 
Variable selection for classification and regression using R
Variable selection for classification and regression using RVariable selection for classification and regression using R
Variable selection for classification and regression using RGregg Barrett
 
Diabetes data - model assessment using R
Diabetes data - model assessment using RDiabetes data - model assessment using R
Diabetes data - model assessment using RGregg Barrett
 
Introduction to Microsoft R Services
Introduction to Microsoft R ServicesIntroduction to Microsoft R Services
Introduction to Microsoft R ServicesGregg Barrett
 
Insurance metrics overview
Insurance metrics overviewInsurance metrics overview
Insurance metrics overviewGregg Barrett
 
Review of mit sloan management review case study on analytics at Intermountain
Review of mit sloan management review case study on analytics at IntermountainReview of mit sloan management review case study on analytics at Intermountain
Review of mit sloan management review case study on analytics at IntermountainGregg Barrett
 
Example: movielens data with mahout
Example: movielens data with mahoutExample: movielens data with mahout
Example: movielens data with mahoutGregg Barrett
 

More from Gregg Barrett (20)

Cirrus: Africa's AI initiative, Proposal 2018
Cirrus: Africa's AI initiative, Proposal 2018Cirrus: Africa's AI initiative, Proposal 2018
Cirrus: Africa's AI initiative, Proposal 2018
 
Cirrus: Africa's AI initiative
Cirrus: Africa's AI initiativeCirrus: Africa's AI initiative
Cirrus: Africa's AI initiative
 
Applied machine learning: Insurance
Applied machine learning: InsuranceApplied machine learning: Insurance
Applied machine learning: Insurance
 
Road and Track Vehicle - Project Document
Road and Track Vehicle - Project DocumentRoad and Track Vehicle - Project Document
Road and Track Vehicle - Project Document
 
Data Science Introduction - Data Science: What Art Thou?
Data Science Introduction - Data Science: What Art Thou?Data Science Introduction - Data Science: What Art Thou?
Data Science Introduction - Data Science: What Art Thou?
 
Revenue Generation Ideas for Tesla Motors
Revenue Generation Ideas for Tesla MotorsRevenue Generation Ideas for Tesla Motors
Revenue Generation Ideas for Tesla Motors
 
Data science unit introduction
Data science unit introductionData science unit introduction
Data science unit introduction
 
Social networking brings power
Social networking brings powerSocial networking brings power
Social networking brings power
 
Procurement can be exciting
Procurement can be excitingProcurement can be exciting
Procurement can be exciting
 
Machine Learning Approaches to Brewing Beer
Machine Learning Approaches to Brewing BeerMachine Learning Approaches to Brewing Beer
Machine Learning Approaches to Brewing Beer
 
A note to Data Science and Machine Learning managers
A note to Data Science and Machine Learning managersA note to Data Science and Machine Learning managers
A note to Data Science and Machine Learning managers
 
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
 
Efficient equity portfolios using mean variance optimisation in R
Efficient equity portfolios using mean variance optimisation in REfficient equity portfolios using mean variance optimisation in R
Efficient equity portfolios using mean variance optimisation in R
 
Hadoop Overview
Hadoop OverviewHadoop Overview
Hadoop Overview
 
Variable selection for classification and regression using R
Variable selection for classification and regression using RVariable selection for classification and regression using R
Variable selection for classification and regression using R
 
Diabetes data - model assessment using R
Diabetes data - model assessment using RDiabetes data - model assessment using R
Diabetes data - model assessment using R
 
Introduction to Microsoft R Services
Introduction to Microsoft R ServicesIntroduction to Microsoft R Services
Introduction to Microsoft R Services
 
Insurance metrics overview
Insurance metrics overviewInsurance metrics overview
Insurance metrics overview
 
Review of mit sloan management review case study on analytics at Intermountain
Review of mit sloan management review case study on analytics at IntermountainReview of mit sloan management review case study on analytics at Intermountain
Review of mit sloan management review case study on analytics at Intermountain
 
Example: movielens data with mahout
Example: movielens data with mahoutExample: movielens data with mahout
Example: movielens data with mahout
 

Recently uploaded

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 

Recently uploaded (20)

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 

Modelling the expected loss of bodily injury claims using gradient boosting

  • 3. Overview of the modelling effort for the bodily injury claims data

A critical challenge in insurance is setting the premium for the policyholder. In a competitive market an insurer needs to price the expected loss of the policyholder accurately. Failing to do so exposes the insurer to adverse selection: the insurer loses profitable policies and retains loss-incurring policies, resulting in economic loss. In personal car insurance, for example, this could occur if the insurer charged the same premium for old and young drivers. If the expected loss for old drivers was significantly lower than that for young drivers, the old drivers could be expected to switch to a competitor, leaving the insurer with a portfolio of young drivers (who are under-priced) and incurring an economic loss.

In this project we attempt to predict the expected loss for the policyholder concerning a bodily injury claim. Doing so requires breaking the process into two distinct components: claim frequency and claim severity. For convenience and simplicity, we have chosen to model the frequency and severity separately. Other inputs into the premium-setting (rating) process, such as administrative costs, loadings, and cost of capital, have been omitted, as we are only concerned with modelling the expected loss.

In modelling the claim frequency, a classification model will be used to model the probability of a bodily injury claim given a set of features that cover mostly vehicle characteristics. The actual claim frequency for the dataset used in this project is around 1%. In modelling the claim severity, a regression model will be used to model the expected claim amount, again using a set of features that cover mostly vehicle characteristics. 
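As a minimal illustration of how the two components combine (under the independence assumption noted in the summary), the expected loss is simply the product of the claim probability and the expected claim amount. The figures and names below are hypothetical, not taken from the project data:

```python
def expected_loss(claim_prob, expected_severity):
    """Expected loss under the independence assumption:
    E[loss] = P(claim) * E[severity | claim]."""
    return claim_prob * expected_severity

# Hypothetical policyholder: ~1% annual claim probability,
# 20,000 expected bodily injury claim amount.
print(expected_loss(0.01, 20000))  # 200.0
```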
To ensure that the estimated performance of the model, as measured on the test sample, is an accurate approximation of the expected performance on future "unseen" cases, the inception date of the policies in the test set is posterior to that of the policies used to train the model. The dataset, which covers the period from 2005 through 2007, was therefore split into three groups:

2005 through 2006 – Training set
2005 through 2006 – Validation set
2007 – Test set

An adjustment to the output of the claim frequency model is necessary in order to derive a probability on an annual basis. This is because the claim frequency is calculated over a period of two years (2005 through 2006). For this project we assumed an exponential hazard function and adjusted the claim frequency as follows:

P(t) = 1 - exp(-λT)

where:
P(t) = the annual probability of a claim
T = 1/2
λ = the probability of a claim predicted by the claim frequency model

In this project model validation is measured by the degree of predictive accuracy, and this objective is emphasized over producing interpretable models. The lack of interpretability of most algorithmic
  • 4. models appears to be a reason that their application to insurance pricing problems has been very limited so far (Guelman, 2011).

In modelling the claim frequency, a ROC (Receiver Operating Characteristic) curve will be used to assess model performance, measuring the AUC (Area Under the Curve). In modelling the claim severity, the RMSE (Root Mean Squared Error) will be used to assess model performance. The test data was not used for model selection purposes, but purely to assess the generalization error of the final chosen model. Assessing this error is broken down into three components:

1) Assessing the performance of the classification model on the test data using the AUC score.
2) Assessing the performance of the regression model on the test data using the RMSE.
3) Assessing the performance in predicting the expected loss by comparing the predicted expected loss for the 2007 portfolio of policyholders against the realised loss for that portfolio.

Gradient boosting, often referred to as simply "boosting", was selected as the modelling approach for this project. Boosting is a general approach that can be applied to many statistical learning methods for regression or classification. Boosting is a supervised, non-parametric machine learning approach (Geurts, Irrthum, Wehenkel, 2009). Supervised learning refers to the subset of machine learning methods which derive models in the form of input-output relationships. More precisely, the goal of supervised learning is to identify a mapping from some input variables to some output variables on the sole basis of a given sample of joint observations of the values of these variables. Non-parametric means that we do not make explicit assumptions about the functional form of f; the intent is to find a function f̂ such that Y ≈ f̂(X) for any observation (X, Y). With boosting methods, optimisation is carried out in function space. 
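The annualisation adjustment and the two evaluation metrics can be written out compactly. The sketch below is plain Python with illustrative names (the project itself was carried out in R); the AUC is computed via its rank interpretation rather than by integrating the ROC curve:

```python
import math

def annualise_claim_probability(lam, T=0.5):
    """Annual claim probability from the two-year model output,
    assuming an exponential hazard: P(t) = 1 - exp(-lam * T)."""
    return 1.0 - math.exp(-lam * T)

def rmse(y_true, y_pred):
    """Root mean squared error, used for the severity model."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def auc(y_true, scores):
    """Area under the ROC curve, used for the frequency model:
    the probability that a random claim (1) outscores a random
    non-claim (0), counting ties as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```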
That is, with boosting we parameterise the function estimate f̂ in the additive functional form:

f̂(x) = f̂0(x) + Σ f̂i(x),  i = 1, …, M

In this representation:
f̂0 is the initial guess
M is the number of iterations
(f̂i), i = 1, …, M, are the function increments, also referred to as the "boosts"

It is useful to distinguish between the parameterisation of the "base-learner" function and the loss function used to fit the overall ensemble estimate f̂(x). Boosted models can be implemented with different base-learner functions. Common base-learner functions include linear models, smooth models, decision trees, and custom base-learner functions. Several classes of base-learner model can be implemented in one boosted model. This means that the same functional formula can include both smooth additive components and decision tree components at the same time (Natekin, Knoll, 2013).

Loss functions can be classified according to the type of outcome variable, Y. For regression problems Gaussian (minimizing squared error), Laplace (minimizing absolute error), and Huber are considerations,
  • 5. while Bernoulli or AdaBoost losses are considerations for classification. There are also loss functions for survival models and count data. This flexibility makes boosting highly customizable to any particular data-driven task. It introduces a lot of freedom into the model design, thus making the choice of the most appropriate loss function a matter of trial and error (Natekin, Knoll, 2013).

To provide an intuitive explanation we will use an example of boosting in the context of decision trees, as was used in this project. Unlike fitting a single large decision tree to the training data, which amounts to fitting the data hard and potentially overfitting, the boosting approach instead learns slowly. Given an initial model (decision tree), we fit a decision tree (the base-learner) to the residuals from the initial model. That is, we fit a tree using the current residuals rather than the outcome Y. We then add this new decision tree into the fitted function in order to update the residuals. The process is conducted sequentially, so that at each iteration a new weak base-learner model is trained with respect to the error of the whole ensemble learnt so far. With such an approach the model structure is learned from the data rather than predetermined, thereby avoiding an explicit model specification and incorporating complex, higher-order interactions to reduce the potential modelling bias (Yang, Qian, Zou, 2014).

The first choice in building the model involves selecting an appropriate loss function. Squared-error loss was selected to define prediction error for the severity model and Bernoulli deviance was selected for the frequency model. There are three tuning parameters that need to be set:

- Shrinkage (the learning rate)
- Number of trees (the number of iterations)
- Depth (interaction depth)

The shrinkage parameter sets the learning rate of the base-learner models. In general, statistical learning approaches that learn slowly tend to perform well. 
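The residual-fitting loop described above can be sketched with regression stumps on a toy one-dimensional dataset. This is a simplified stand-in for the gbm/mboost implementations actually used in the project; all names are illustrative:

```python
def fit_stump(x, residuals):
    """Find the single split on x that minimises squared error,
    returning (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(x, y, n_trees=50, shrinkage=0.1):
    """Gradient boosting with squared-error loss: each stump is fit
    to the current residuals and added with a small learning rate."""
    f0 = sum(y) / len(y)          # initial guess: the mean of y
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, resid)
        stumps.append((t, lm, rm))
        pred = [p + shrinkage * (lm if xi <= t else rm)
                for p, xi in zip(pred, x)]
    return f0, stumps
```

Each iteration moves the ensemble a small step (the shrinkage) towards the remaining residuals, which is exactly the "learning slowly" behaviour described above.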
In boosting, the construction of each tree depends on the trees that have already been grown. Typical shrinkage values are 0.01 or 0.001, and the right choice can depend on the problem (Ridgeway, 2012). It is important to know that smaller values of shrinkage (almost) always give improved predictive performance. However, there are computational costs, both storage and CPU time, associated with setting shrinkage to be low. The model with shrinkage = 0.001 will likely require ten times as many trees as the model with shrinkage = 0.01, increasing storage and computation time by a factor of 10. It is generally the case that for small shrinkage parameters, 0.001 for example, there is a fairly long plateau in which predictive performance is at its best. A recommended rule of thumb is to set shrinkage as small as possible while still being able to fit the model in a reasonable amount of time and storage (Ridgeway, 2012).

Boosting can overfit if the number of trees is too large, although this overfitting tends to occur slowly if at all (James, Witten, Hastie, Tibshirani, 2013). Cross-validation and information criteria can be used to select the number of trees. Again, it is worth stressing that the optimal number of trees and the shrinkage (learning rate) depend on each other, although slower learning rates do not necessarily scale the number of optimal trees. That is, shrinkage = 0.1 with an optimal number of trees of 100 does not necessarily imply that with shrinkage = 0.01 the optimal number of trees is 1000 (Ridgeway, 2012).
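The factor-of-ten tradeoff can be made concrete with a stylized back-of-the-envelope model (this is not the gbm algorithm, just an idealisation in which each tree removes a fixed fraction of the remaining residual):

```python
import math

def trees_needed(shrinkage, target_residual_frac):
    """If each tree removes a fraction `shrinkage` of the remaining
    residual, the residual after n trees is (1 - shrinkage)**n;
    solve (1 - shrinkage)**n = target_residual_frac for n."""
    return math.log(target_residual_frac) / math.log(1.0 - shrinkage)

# Trees needed to drive the residual down to 5% of its initial size:
print(trees_needed(0.01, 0.05))   # at shrinkage = 0.01
print(trees_needed(0.001, 0.05))  # roughly 10x as many at 0.001
```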
  • 6. Depth sets the number of splits in each tree, which controls the complexity of the boosted ensemble. When depth = 1 each tree is a stump, consisting of a single split. In this case the boosted ensemble fits an additive model, since each term involves only a single variable. More generally, depth is the interaction depth, and it controls the interaction order of the boosted model, since d splits can involve at most d variables (James, Witten, Hastie, Tibshirani, 2013). A strength of tree-based methods is that single-depth trees are readily understandable and interpretable. In addition, decision trees have the ability to select or rank the attributes according to their relevance for predicting the output, a feature that is shared with almost no other non-parametric methods (Geurts, Irrthum, Wehenkel, 2009).

From the point of view of their statistical properties, tree-based methods are non-parametric universal approximators, meaning that, with sufficient complexity, a tree can represent any continuous function with arbitrarily high precision. When used with numerical attributes, they are invariant with respect to monotone transformations of the input attributes (Geurts, Irrthum, Wehenkel, 2009).

Importantly, boosted decision trees require very little data pre-processing, which can easily be one of the most time-consuming activities in a project of this nature. As boosted decision trees handle predictor and response variables of any type without the need for transformation, and are insensitive to outliers and missing values, they are a natural choice not only for this project but for insurance in general, where there are frequently a large number of categorical and numerical predictors, non-linearities, complex interactions, and missing values that all need to be modelled. Lastly, the techniques used in this project can be applied independently of the limitations imposed by any specific legislation. 
Potential Improvements

Below are several suggestions for improving the initial model.

Specification
A careful specification of the loss function leads to the estimation of any desired characteristic of the conditional distribution of the response. This, coupled with the large number of base-learners, guarantees a rich set of models that can be addressed by boosting (Hofner, Mayr, Robinzonov, Schmid, 2014).

AUC loss function for the classification model
For the classification model, AUC can be tested as a loss function to optimize the area under the ROC curve.

Huber loss function for the regression model
The Huber loss function can be used as a robust alternative to the L2 (least squares error) loss.
  • 7. ρ(y, f) = (y − f)²/2 if |y − f| ≤ δ, and δ(|y − f| − δ/2) otherwise

Where: ρ is the loss function and δ is the parameter that limits the outliers, which are subject to absolute error loss.

Quantile loss function for the regression model

Another alternative for settings with a continuous response is modelling conditional quantiles through quantile regression (Koenker, 2005). The main advantage of quantile regression, beyond its robustness towards outliers, is that it does not rely on any distributional assumptions on the response or the error terms (Hofner, Mayr, Robinzonov, Schmid, 2014).

Laplace loss function for the regression model

The Laplace loss function is the function of choice if we are interested in the median of the conditional distribution. It implements a distribution-free, median regression approach especially useful for long-tailed error distributions. The loss function allows flexible specification of the link between the response and the covariates.

[Figure: the L2 (least squares) loss on the left-hand side; the L1 (least absolute deviation) loss on the right-hand side.]

All of the above listed loss functions can be implemented within the "mboost" package. The table below provides an overview of some of the currently available loss functions within the mboost package.
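For concreteness, the alternative loss functions discussed above can be written out directly. The sketch below uses Python/NumPy purely for illustration (an assumption outside the original report; in mboost these losses are selected through the family argument, e.g. the Huber(), Laplace() and QuantReg() families).

```python
import numpy as np

def l2_loss(r):
    """Least squares (L2) loss on a residual r: sensitive to outliers."""
    return 0.5 * r**2

def laplace_loss(r):
    """L1 / Laplace loss: distribution-free, targets the conditional median."""
    return np.abs(r)

def huber_loss(r, delta=1.0):
    """Huber loss: quadratic up to delta, linear (absolute error) beyond it."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def quantile_loss(r, tau=0.5):
    """Check (pinball) loss for the tau-th conditional quantile;
    tau = 0.5 recovers median regression up to a constant factor."""
    return np.where(r >= 0, tau * r, (tau - 1) * r)
```

Each function penalises a residual r = y − f(x); the boosting algorithm then descends the gradient of whichever loss is chosen, which is what makes the choice of characteristic (mean, median, quantile) explicit.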
  • 8. [Table: an overview of the currently implemented families in mboost.]

Optimal number of iterations using AIC

To maximise predictive power and to prevent overfitting, it is important that the optimal stopping iteration is carefully chosen. Various possibilities for determining the stopping iteration exist. AIC was considered; however, this is usually not recommended, as AIC-based stopping tends to overshoot the optimal stopping iteration dramatically (Hofner, Mayr, Robinzonov, Schmid, 2014).

Package xgboost

The package "xgboost" was also tested during this project. Its benefit over the gbm and mboost packages is that it is purportedly faster. It should be noted that xgboost requires the data to be in the form of numeric vectors and thus necessitates some additional data preparation. It was also found to be a little more challenging to implement than the gbm and mboost packages.
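The usual alternative to AIC-based stopping is to track the loss on held-out data and stop at its minimum (in mboost this is done with cross-validation via cvrisk()). The sketch below illustrates the idea in Python/scikit-learn (an assumption outside the original report), using staged predictions on a validation set to select the stopping iteration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1500, 5))
y = X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(scale=0.5, size=1500)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.1,
                                  max_depth=2, random_state=0).fit(X_tr, y_tr)

# Validation error after each boosting iteration; the argmin is the
# empirically optimal stopping iteration m_stop.
val_err = [np.mean((y_val - pred) ** 2)
           for pred in model.staged_predict(X_val)]
m_stop = int(np.argmin(val_err)) + 1
```

Because the stopping iteration is chosen on data the booster never saw, it does not suffer from the systematic overshooting reported for AIC-based stopping.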
  • 9. References

Geurts, P., Irrthum, A., Wehenkel, L. (2009). Supervised learning with decision tree-based methods in computational and systems biology. [pdf]. Retrieved from http://www.montefiore.ulg.ac.be/~geurts/Papers/geurts09-molecularbiosystems.pdf

Guelman, L. (2011). Gradient boosting trees for auto insurance loss cost modeling and prediction. [pdf]. Retrieved from http://www.sciencedirect.com/science/article/pii/S0957417411013674

Hofner, B., Mayr, A., Robinzonov, N., Schmid, M. (2014). Model-based boosting in R: a hands-on tutorial using the R package mboost. [pdf]. Retrieved from https://cran.r-project.org/web/packages/mboost/vignettes/mboost_tutorial.pdf

Hofner, B., Mayr, A., Robinzonov, N., Schmid, M. (2014). An overview on the currently implemented families in mboost. [table]. In: Model-based boosting in R: a hands-on tutorial using the R package mboost. Retrieved from https://cran.r-project.org/web/packages/mboost/vignettes/mboost_tutorial.pdf

James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An introduction to statistical learning with applications in R. [ebook]. Retrieved from http://www-bcf.usc.edu/~gareth/ISL/getbook.html

Koenker, R. (2005). Quantile regression. Cambridge: Cambridge University Press.

Natekin, A., Knoll, A. (2013). Gradient boosting machines, a tutorial. [pdf]. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3885826/

Ridgeway, G. (2012). Generalized boosted models: a guide to the gbm package. [pdf]. Retrieved from https://cran.r-project.org/web/packages/gbm/gbm.pdf

Yang, Y., Qian, W., Zou, H. (2014). A boosted nonparametric Tweedie model for insurance premium. [pdf]. Retrieved from https://people.rit.edu/wxqsma/papers/paper4