"Custom ML Models for Each User", Siamion Karasik

#exadelML1
Custom ML Models for Each User
A Case Study Based on ML1, an ML-powered Jira Plug-in
by Exadel

AGENDA
1. About Me & Exadel
2. About ML1
3. Multi-User ML Solutions
4. ML Pipeline
5. Monitoring & Improvement
6. Implementation

About Me
Siamion Karasik
● An ML engineer at Exadel
● The leader of the Exadel Python community and an active member of the Exadel AI & DS communities
● Interested in NLP, problem-solving, and writing

Exadel is a software engineering company that delivers the digital platforms,
products, and applications our clients need to run and grow their businesses.

Exadel at a Glance
● Established in 1998
● ISO 27001 certified
● 23 offices in the USA, Europe, and Asia
● 25+ solutions
● 20+ open-source projects
● 1200+ engineers

Artificial Intelligence
The Exadel AI Practice examines existing products
and processes to discover how modern AI/ML
solutions can be applied to add value and then
brings them to life.

Technical Support at Exadel
Before: Jira ticket ("GIT help: I have an issue with GIT") → Support Engineer assigns the category (JC_Git_Management) → Support Engineer assigns the resolver
Now: Jira ticket ("GIT help") → Auto-Assignment Plugin assigns the category (JC_Git_Management) and the resolver → Support Engineer

About ML1
ML1 is an AI-powered Jira plug-in that predicts field values in issues/tickets.
● Predicting Values: users can select any field to predict with their ML model
● Training Schedule: training can be set to a schedule for automatically improved accuracy
● Training Report: users can get up-to-date information on the success of their model training

ML1 at Exadel
Our own Technical Support department uses ML1 to simplify the process of creating and processing Jira tickets. Here are just a few of the benefits that we’ve seen so far:
● Greatly Reduced Assignment Time: ML1 decreased the amount of time necessary to assign a task from 10 minutes to 10 seconds
● Saved Time for Our Employees: with around 10,000 tasks per year, ML1 saved our Technical Support team approximately 500 man-hours
● Saved Money on Labor Costs: even when the number of technical support tasks increased by 15%, we didn’t have to hire new technical support staff

ML1 is available for free on the Atlassian Marketplace!

One-model-for-all vs. A-model-for-each

One-model-for-all ML Solution
[Diagram: one ML algorithm is trained on the training data and produces a single ML model. The model predicts for User 1, User 2, and User 3. Metrics and feedback data feed a feedback loop that retrains the model.]

Sometimes One-for-All Doesn’t Work
● IoT
● Legal restrictions
● Each client has a custom ML problem, as in the case of ML1

A-model-for-each ML Solution
[Diagram: the same ML algorithm is trained separately on the training data of User 1, User 2, and User 3, producing one ML model per user. Each model predicts only for its own user and has its own metrics and feedback data.]

Choosing the ML Pipeline
Multiclass text classification: 42 unbalanced classes and ~2500 samples
Experimented with:
● TfidfVectorizer, Word2Vec, TruncatedSVD
● Linear models (Logistic Regression, SVM)
● Tree-based models (Random Forest, Boosting)

In the end, this simple pipeline works best on our Jira data:
Jira ticket ("GIT help") → Concatenate Title + Description → TfidfVectorizer → Logistic Regression → Predicted Category ("GIT Support")
● TfidfVectorizer learns user-specific words
● Logistic Regression does not require many samples
● 70% accuracy
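The pipeline above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not ML1's actual code: the helper names `ticket_text` and `build_pipeline`, the default hyperparameters, the toy tickets, and the "Network Support" category are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def ticket_text(title, description):
    """Concatenate Title + Description into a single string."""
    return f"{title} {description}".strip()


def build_pipeline():
    # Hyperparameters are assumptions; the slides do not list them.
    return make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))


# Toy tickets invented for illustration only (not Exadel data).
tickets = [
    ("GIT help", "I have an issue with GIT", "GIT Support"),
    ("GIT access", "cannot push to the GIT repository", "GIT Support"),
    ("VPN down", "VPN connection drops every hour", "Network Support"),
    ("VPN setup", "need VPN access from home", "Network Support"),
]
texts = [ticket_text(title, desc) for title, desc, _ in tickets]
labels = [category for _, _, category in tickets]

# TfidfVectorizer builds its vocabulary from this user's own tickets.
model = build_pipeline().fit(texts, labels)
predicted = model.predict([ticket_text("GIT help", "merge conflict in GIT")])[0]
print(predicted)
```

Because the vectorizer is fitted per user, each model picks up that user's own vocabulary, which is what makes the same pipeline reusable across clients.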

Training with Unknown Data
● With ML1, training data is provided by users at runtime
● We do not have control over the training data set size and quality
● So the question is: will our pipeline work for others?

Walking in Someone Else’s Shoes
We tried another data set and experimented (GitHub)
● How much extra accuracy will we get with every 1k samples?
● How many features should we select?

Quantifying the “Shortage of Data”
[Diagram: 4-fold validation (k=4). The data (0% to 100%) is split into four equal folds. In each of the four rounds (Fold 1 to Fold 4), a different 25% of the data is the testing set and the remaining 75% is the training set.]
high std (cross-validation scores) ⇒ shortage of data
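One way to measure this spread, sketched with scikit-learn under stated assumptions: the synthetic data from `make_classification` stands in for real ticket features, the helper name `cv_scores` is hypothetical, and the slides give no threshold for what counts as a "high" standard deviation, so none is hard-coded here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_scores(X, y, k=4):
    """Accuracy on each of k stratified folds (k=4 as on the slide)."""
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    estimator = LogisticRegression(max_iter=1000)
    return cross_val_score(estimator, X, y, cv=cv, scoring="accuracy")


# Synthetic stand-in data (assumption): 200 samples, 3 classes.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_classes=3, random_state=0
)
scores = cv_scores(X, y)
print("fold accuracies:", scores)
print("mean:", scores.mean(), "std:", scores.std())
```

A large `std` relative to the mean suggests that the score depends heavily on which samples land in the training set, which is the signal of a data shortage described above.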

Data Representation Score
What if there are many small classes?
● Rule of thumb in ML: there should be at least K samples per class
● representation_score = sum(k for k in Counter(y).values() if k >= K) / len(y)
● We can’t ensure a high-quality model if representation_score is low
Example with K = 20: C1 = 40, C2 = 30, C3 = 10, C4 = 10, C5 = 10 samples. C1 and C2 (70 samples) meet the threshold; C3, C4, and C5 (30 samples) do not, so representation_score = 70%.
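The formula and the example above can be checked with a few lines of Python. The function wrapper and the empty-input check are additions for the sketch; the expression itself is the one from the slide.

```python
from collections import Counter


def representation_score(y, K):
    """Share of samples that belong to classes with at least K samples."""
    if not y:
        raise ValueError("y must not be empty")
    return sum(k for k in Counter(y).values() if k >= K) / len(y)


# Class sizes from the slide: C1=40, C2=30, C3=10, C4=10, C5=10.
y = ["C1"] * 40 + ["C2"] * 30 + ["C3"] * 10 + ["C4"] * 10 + ["C5"] * 10
score = representation_score(y, K=20)
print(f"{score:.0%}")  # 70%
```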

Monitoring & Improvement

Monitoring & Improvement Questions
How do we monitor a multi-model ML solution?
● Accuracy
● Data drift
● Explainability
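For the accuracy item, one possible approach is to keep a separate hit counter per model. The sketch below is hypothetical and not taken from ML1: the class `AccuracyMonitor`, its method names, and the idea of comparing each prediction with the value the user finally saved are assumptions for illustration.

```python
from collections import defaultdict


class AccuracyMonitor:
    """Tracks prediction accuracy separately for every model id."""

    def __init__(self):
        self._hits = defaultdict(int)
        self._total = defaultdict(int)

    def record(self, model_id, predicted, actual):
        """Store one feedback event: the prediction vs. the final value."""
        self._total[model_id] += 1
        self._hits[model_id] += int(predicted == actual)

    def accuracy(self, model_id):
        """Accuracy for one model, or None if there is no feedback yet."""
        total = self._total.get(model_id, 0)
        return self._hits.get(model_id, 0) / total if total else None

    def below(self, threshold):
        """Model ids whose observed accuracy is under the threshold."""
        return sorted(
            m for m in self._total if self._hits.get(m, 0) / self._total[m] < threshold
        )


monitor = AccuracyMonitor()
monitor.record("user1", "GIT Support", "GIT Support")
monitor.record("user1", "GIT Support", "Network Support")
monitor.record("user2", "Network Support", "Network Support")
print(monitor.accuracy("user1"), monitor.below(0.7))
```

Keeping the counters per model id means that one poorly performing model is visible on its own instead of being averaged away across all users.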
How do we improve the system overall?
● AutoML?
● Federated Learning?

How Does ML1 Work?
ML1 uses the historical data from any set of permissions to automatically predict the value of any field.

ML1’s Server Under the Hood
● A single Docker container
● Solves multiclass and multilabel text classification
● Accepts training data right from the client
● Trains & serves a separate ML model for every target

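ML1’s Multimodel Server
[Diagram: Jira calls two endpoints on the ML Server, POST /train?modelId and POST /predict?modelId.
Training: the training data is saved to the Training Data Storage, the ML algorithm trains an ML model, the model is saved to the Model File Storage, and the Training Info DB is updated.
Prediction: the ML model is read from the Model File Storage through the cache and applied to the input data to produce predictions.]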
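The train/predict flow with per-model storage and a cache can be sketched in plain Python. Everything here is an illustrative assumption rather than ML1's implementation: the class `MultiModelServer`, the JSON file layout, and the placeholder "majority label" algorithm, which only stands in for the real text classifier so that the example has no dependencies. The HTTP layer is left out.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def fit_majority(texts, labels):
    """Placeholder 'ML algorithm': remembers the most frequent label."""
    return {"label": Counter(labels).most_common(1)[0][0]}


def predict_majority(model, texts):
    return [model["label"] for _ in texts]


class MultiModelServer:
    """Trains, stores, and serves one model per model id."""

    def __init__(self, storage_dir):
        self.storage_dir = Path(storage_dir)
        self.storage_dir.mkdir(parents=True, exist_ok=True)
        self.cache = {}  # model id -> loaded model

    def _path(self, model_id):
        # Reject ids that could escape the storage directory.
        if not str(model_id).isalnum():
            raise ValueError("model id must be alphanumeric")
        return self.storage_dir / f"{model_id}.json"

    def train(self, model_id, texts, labels):
        """Counterpart of POST /train?modelId."""
        model = fit_majority(texts, labels)
        self._path(model_id).write_text(json.dumps(model))
        self.cache[model_id] = model  # replace any stale cached copy
        return {"model_id": model_id, "samples": len(labels)}

    def predict(self, model_id, texts):
        """Counterpart of POST /predict?modelId."""
        if model_id not in self.cache:
            path = self._path(model_id)
            if not path.exists():
                raise KeyError(f"model {model_id!r} has not been trained")
            self.cache[model_id] = json.loads(path.read_text())
        return predict_majority(self.cache[model_id], texts)


with tempfile.TemporaryDirectory() as tmp:
    server = MultiModelServer(tmp)
    server.train("user1", ["GIT help", "GIT access"], ["GIT Support", "GIT Support"])
    server.train("user2", ["VPN down"], ["Network Support"])
    print(server.predict("user1", ["I have an issue with GIT"]))
    print(server.predict("user2", ["I have an issue with GIT"]))
```

The same input yields different predictions depending on the model id, which is the point of the a-model-for-each design; the cache avoids re-reading the model file on every prediction request.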

Multimodel ML Server from Scratch
An article with code examples to help
you write a multimodel server using
software engineering best practices:

THANK YOU!
Want to know more about ML at Exadel? Connect to our Zoom session in 5 minutes: https://tinyurl.com/ExadelML (copy the link or scan the QR code)
CONTACT US: Siamion Karasik, ML Engineer, skarasik@exadel.com

How Do You Use ML1?
In just five simple steps, Jira administrators can have ML1 up and running:
Step 1. Install the ML1 Plugin for Jira in your organization
Step 2. Install the ML Server
Step 3. Enable field prediction in project settings and set configurations
Step 4. Train your model
Step 5. Autocomplete the selected Jira field
