Odin2018_Minh_ML_Risk_Prediction


Minh Nguyen

Slide deck presented at the Odin conference in Oslo, September 19, 2018.


Applying Machine Learning
to Product Risk Prediction
@ Sparebank1
Odin 2018
Minh Nguyen

Agenda
►Motivation
►Problem definition
►Implementation
►Lessons learned & Take-aways

Terminology
Source: www.newgenapps.com/blog/machine-learning-vs-predictive-analytics

Motivation
► Machine Learning (ML) has been widely adopted in most industries.
► But it is not fully exploited in the software industry, especially in Development & Test.
► A huge amount of data is collected and available across the SDLC.
► There is a significant correlation between the attributes of a Change and its Risk.
► The frequently used risk-based testing technique can be time-consuming, depends on subjective assessments, and is limited by human memory.
To build ML models that predict Product Risk, using a real training
dataset from a development team in SB1

Problem: Risk Prediction
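[Diagram: a User Change is passed to the ML Prediction Model, which outputs a Risk Score. The model is built from a Training Dataset drawn from Requirement, Development, Test, Defect and Operation data. The Risk Score, optionally combined with Manual Risk Assessment, leads to Improved Test Planning and Prioritization.]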

Data sources
[Diagram: data sources across the lifecycle (Requirement / Change, Development, Test, Defect, Operation Statistics) and the tools that hold them (Jira, Confluence, Zephyr, Remedy/Splunk).]

Dataset for Supervised Training
Data extracted from Jira on Change:
► Summary
► Description
► Severity
► Priority
► Component
Derived feature: Complexity (from an ML-model)
Label: Risk Score (derived from Defects and Incidents by calculation)

ML-Classification
Input: Summary & Description
Natural Language Processing:
1. Cleaning
2. Stop-words
3. Stemming
Bag of Words (incl. 1420 words)
Naive Bayes ML algorithm
Output: Complexity Score (3-High, 2-Normal and 1-Low)
249 observations in training dataset
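
The pipeline on this slide (cleaning, stop-words, stemming, bag of words, Naive Bayes) can be sketched in plain Python. This is a minimal illustration, not the implementation used at SB1: the stop-word list, the crude suffix-stripping stemmer, and the six example Change summaries are all made up for the sketch, and a real project would more likely use an established NLP library.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption; real lists are much larger).
STOP_WORDS = {"the", "a", "an", "in", "on", "to", "of", "and", "for", "is"}

def stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """1. cleaning (lowercase, letters only), 2. stop-word removal, 3. stemming."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # bag of words per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical Change summaries with complexity labels (3-High, 2-Normal, 1-Low).
train_texts = [
    "Rewrite payment engine and migrate database schema",
    "Migrate authentication to new payment engine",
    "Add field to customer form",
    "Update validation on customer form",
    "Fix typo in label",
    "Change button label color",
]
train_labels = [3, 3, 2, 2, 1, 1]

model = NaiveBayes().fit(train_texts, train_labels)
high = model.predict("Migrate payment database")
low = model.predict("Fix label typo")
```

With only a handful of examples the classifier simply echoes word overlap; the 249 observations mentioned on the slide are still a small dataset for this kind of model.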

Risk Score Calculation
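DEFECT (for a given Change, found in test phase):
$$\mathrm{DEFECT} = \sum_{i=1}^{N_{\text{defects}}} \mathrm{Defect}(i) \cdot \mathrm{Severity}(i) \cdot \mathrm{Weight}(y), \quad y \in \{\text{critical}, \text{major}, \text{normal}, \text{trivial}\}$$
where $N_{\text{defects}}$ is the max # of defects.
INCIDENT (for a given Component, reported from Prod):
$$\mathrm{INCIDENT} = \sum_{i=1}^{N_{\text{incidents}}} \mathrm{Incident}(i) \cdot \mathrm{Severity}(i) \cdot \mathrm{Weight}(y), \quad y \in \{\text{critical}, \text{major}, \text{normal}, \text{trivial}\}$$
where $N_{\text{incidents}}$ is the max # of incidents.
RISK SCORE (for a given Change): derived from the DEFECT and INCIDENT terms.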
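
The calculation on this slide can be sketched in Python. The slide does not give the numeric severity levels, the weights, or how the DEFECT and INCIDENT terms are combined, so all three are assumptions here: the values in `SEVERITY_LEVEL` and `WEIGHT` are hypothetical, and the two terms are simply added.

```python
# Hypothetical numeric level per severity class (assumption, not from the slide).
SEVERITY_LEVEL = {"critical": 4, "major": 3, "normal": 2, "trivial": 1}
# Hypothetical weight per severity class (assumption, not from the slide).
WEIGHT = {"critical": 1.0, "major": 0.6, "normal": 0.3, "trivial": 0.1}

def weighted_sum(severities):
    """Sum of item(i) * Severity(i) * Weight(y) over all items.

    Each item (a defect or an incident) counts as 1; `severities`
    holds the severity class y of every item.
    """
    return sum(1 * SEVERITY_LEVEL[y] * WEIGHT[y] for y in severities)

def risk_score(defect_severities, incident_severities):
    """Risk score label for one Change.

    defect_severities: classes of defects found in test for the Change.
    incident_severities: classes of production incidents for its Component.
    Assumption: the DEFECT and INCIDENT terms are added together.
    """
    return weighted_sum(defect_severities) + weighted_sum(incident_severities)

# One Change with two defects in test and one incident on its Component.
score = risk_score(["critical", "normal"], ["major"])
```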

Risk Prediction with Regression
[Diagram: Change attributes, including the ML-classified Complexity, are fed to Multiple Linear Regression, which outputs the ML-predicted Risk Score.]
123 observations in labeled training dataset
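
Multiple linear regression on numeric Change attributes can be sketched in plain Python using the normal equations. This is an illustration only: the three features (severity, priority, complexity) are encoded as hypothetical integers, and the six training rows are synthetic, generated from a known linear rule so that the fit can be checked.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(coefs, row):
    """Predicted risk score: intercept plus weighted sum of the features."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Synthetic rows: (severity, priority, complexity), all hypothetical encodings.
rows = [(1, 1, 1), (2, 1, 3), (3, 2, 2), (1, 3, 2), (2, 2, 1), (3, 3, 3)]
# Synthetic labels from risk = 1 + 2*severity + 0.5*priority + 3*complexity.
targets = [6.5, 14.5, 14.0, 10.5, 9.0, 17.5]

coefs = fit_linear_regression(rows, targets)
new_change_risk = predict(coefs, (2, 3, 2))
```

Real risk labels are noisy, so the fit would not be exact as it is here, and categorical attributes such as Component would need an encoding step first.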

Measure success of prediction
How to demonstrate improvement in the following metrics:
1. Test Effectiveness = No. of defects found / No. of test executions
2. MTBI – Mean Time Between Incidents for a given Component
3. Test Productivity = Number of testing hours per Change
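
The three metrics can be computed as follows. This is a minimal sketch: the function names, the choice of hours as the MTBI unit, and the example figures are assumptions for illustration, not data from the project.

```python
from datetime import datetime

def effectiveness(defects_found, test_executions):
    """Test Effectiveness: defects found per test execution."""
    return defects_found / test_executions

def mean_time_between_incidents(timestamps):
    """MTBI in hours for one Component, from its incident timestamps."""
    if len(timestamps) < 2:
        raise ValueError("need at least two incidents to measure a gap")
    ordered = sorted(timestamps)
    gaps = [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

def productivity(testing_hours, changes):
    """Test Productivity: testing hours spent per Change."""
    return testing_hours / changes

# Made-up example figures.
eff = effectiveness(defects_found=12, test_executions=200)
mtbi = mean_time_between_incidents([
    datetime(2018, 9, 1), datetime(2018, 9, 3), datetime(2018, 9, 9),
])
prod = productivity(testing_hours=120, changes=40)
```

Comparing these values before and after introducing risk prediction is one way to show whether test planning has improved.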

Choice of ML Technology
• Easy-to-use and quick enabler.
• Simulation with ML-algorithms.
• License-based – expensive.
• Cloud-based – «GDPR».
• Development (convenient packages).
• Flexibility and better understanding.
• Freeware.
• Local data governance.

Lessons learned
Problem Definition → Data Collection → Data Pre-processing → Training & Test → Deployment & Learning
► Time-consuming.
► Data access.
► Data quality.
► Missing values.
► Inconsistent linkages.
► NLP – Norwegian.
► Out-of-the-box
• ML-algorithms
• Performance metrics
► Limited dataset.
► Imprecise labeling.
► Seamlessly integrated into the current process.
► Self-learning cycle.

Inspiration…
ML-algorithms:
► Supervised: labeled past data. Problem: Prediction/Classification. Answer to a specific question.
► Unsupervised: unlabeled past data. Problem: Clustering. Understand & Insight.
► Reinforcement: possibilities, reward. Problem: Strategy. Explore / Exploit.

Thank you for listening
Minh Nguyen
+47 982 28 460
minh.nguyen@sogeti.no
www.linkedin.com/in/minhng67/
