Learn more about solving supervised learning problems using BigML. This tutorial uses a loan dataset to explain the sunburst view and how to deal with unbalanced datasets.
BigML Education Program: Models 2
In This Video
• A new supervised learning problem: predicting peer-to-peer loan defaults
• Exploring models using the sunburst view
• Objective field balancing and instance weighting
• Leveraging missing data
Unbalanced Datasets
• The problem: Unbalanced data
  • One of the classes in the dataset is far rarer than the other(s)
  • This rare class is particularly important to classify accurately
• Examples:
  • Medical diagnosis: disease is typically rare compared with good health
  • Fraud detection: fraud is typically rare compared with legitimate activity
  • Predictive maintenance: systems typically work far more often than they fail
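To make the imbalance concrete, here is a minimal Python sketch on a hypothetical loan objective with 95 repaid and 5 defaulted loans. It shows why plain accuracy is misleading on such data and computes one common balancing scheme, inverse class frequency weights, so that each class carries the same total weight. The function name `balancing_weights` and the weighting formula are illustrative assumptions, not BigML's internal implementation.

```python
from collections import Counter


def balancing_weights(labels):
    """Return one weight per class so that every class carries the same total weight.

    Uses the inverse-frequency scheme weight(c) = n_total / (n_classes * count(c)).
    This is an illustrative choice, not necessarily the formula BigML uses.
    """
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {c: n_total / (n_classes * k) for c, k in counts.items()}


# Hypothetical toy objective field: 95 repaid loans, 5 defaults.
labels = ["repaid"] * 95 + ["default"] * 5

# A "model" that always predicts the majority class looks accurate...
majority = Counter(labels).most_common(1)[0][0]
accuracy = sum(y == majority for y in labels) / len(labels)
print(majority, accuracy)  # repaid 0.95

# ...yet it never flags a single default. Balancing gives each class equal total weight.
weights = balancing_weights(labels)
instance_weights = [weights[y] for y in labels]  # one weight per row (instance weighting)
total_by_class = Counter()
for y, w in zip(labels, instance_weights):
    total_by_class[y] += w
print(total_by_class)  # both classes now sum to approximately 50.0
```

In a tree learner, weights like these would stand in for raw counts when scoring splits and choosing leaf predictions, which is what lets the rare class win some leaves instead of being outvoted everywhere.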
Missing Data
• Data could be missing from your dataset for many different reasons
  • Human error, as in web forms
  • Deliberate choice, as in medical tests
  • Random corruption, as with database errors
• Data missing for reasons unrelated to the objective should be ignored…
• …but often data is missing for reasons that are closely related to the objective.
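One quick way to check whether missing values carry meaning is to compare the objective across rows where a field is missing and rows where it is present. The minimal Python sketch below does this for a hypothetical `employment_length` field and `status` objective; the field names, the toy rows, and the use of `None` as the missing marker are all assumptions for illustration.

```python
def default_rate_by_missingness(rows, field, objective="status", positive="default"):
    """Group rows by whether `field` is missing (None or absent) and return
    the share of rows whose objective equals `positive` in each group.
    A group with no rows gets None instead of a rate."""
    groups = {"missing": [], "present": []}
    for row in rows:
        key = "missing" if row.get(field) is None else "present"
        groups[key].append(row[objective] == positive)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Hypothetical loan rows: applicants who left employment_length blank default more often.
rows = [
    {"employment_length": None, "status": "default"},
    {"employment_length": None, "status": "default"},
    {"employment_length": None, "status": "repaid"},
    {"employment_length": 5, "status": "repaid"},
    {"employment_length": 2, "status": "repaid"},
    {"employment_length": 10, "status": "repaid"},
    {"employment_length": 1, "status": "default"},
    {"employment_length": 7, "status": "repaid"},
]

rates = default_rate_by_missingness(rows, "employment_length")
print(rates)  # missing is about 0.67, present is 0.2
```

A large gap between the two rates, as in this toy data, suggests the missingness is related to the objective and should not simply be ignored.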
Review
• Supervised learning provides an effective way to predict defaults on consumer loans
• Use objective balancing to create “balanced” models from unbalanced datasets
• If the missing data in your dataset has meaning, use “missing splits” to capture that meaning
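A “missing split” is a tree split whose predicate tests whether a field's value is missing. The Python sketch below scores such a split by its Gini impurity reduction on hypothetical loan rows in which a missing `income` (represented as `None`) tends to go with defaults. Gini impurity is a standard split criterion chosen here for illustration; it is not a claim about the exact criterion BigML uses, and the field names and rows are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def missing_split_gain(rows, field, objective):
    """Impurity reduction from splitting rows on the predicate `field is missing`."""
    parent = [r[objective] for r in rows]
    missing = [r[objective] for r in rows if r.get(field) is None]
    present = [r[objective] for r in rows if r.get(field) is not None]
    n = len(parent)
    # Weighted average impurity of the two children.
    children = (len(missing) / n) * gini(missing) + (len(present) / n) * gini(present)
    return gini(parent) - children


# Hypothetical rows: every applicant with a missing income defaulted.
rows = [
    {"income": None, "status": "default"},
    {"income": None, "status": "default"},
    {"income": None, "status": "default"},
    {"income": 40000, "status": "repaid"},
    {"income": 52000, "status": "repaid"},
    {"income": 61000, "status": "repaid"},
    {"income": 38000, "status": "default"},
    {"income": 75000, "status": "repaid"},
]

gain = missing_split_gain(rows, "income", "status")
print(gain)  # roughly 0.3: the split isolates a pure "missing" branch
```

A learner that discarded or imputed the missing incomes would never see this pure branch, which is the signal a missing split is meant to capture.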