AI Yellow Belt - Day 1 - case by Sagacify

Email classification

Published in: Data & Analytics

  1. Yellow Belt case study: Automatic Claim Email Classification. Amaury Beeckman, Machine Learning Engineer at Sagacify. 28 May 2019.
  2. We are Sagacify • Experts in Artificial Intelligence • Natural Language Processing • Computer vision • Predictive models • Experts in Software Development • Web & Mobile • R&D oriented • Strong collaboration with universities • Focused on moonshot ideas!
  3. Project’s scope
  4. Copyright Sagacify SPRL, Confidential – Do not share. Context of the project: automatic claim email classification in the insurance business. 1. Incoming emails arrive. 2. The ML model reads the emails' content; it has learned its own set of rules that associate the text of an email with a label. 3. The learned model predicts labels (Category 1, Category 2, Category 3, …).
  5. Main business problem: too many categories, about a thousand! This becomes difficult for the business: too many possibilities to memorize. It will also be complex for the ML model: there are many subtleties that the model will need to understand.
  6. Answer: clustering. Group closely related categories together: from thousands to fewer than a hundred. Allow a new set of labels, closely related to the business process. Complexity reduction for the ML model: fewer labels that make more sense.
  7. What about clustering? ◼ A machine learning algorithm ◼ Groups entries that are closely related ◼ Uses the mean Euclidean distance as its metric ◼ https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
  8. What about the dataset? ◼ One row represents one email ◼ One column represents one class ◼ We have ~25,000 emails and 339 classes ◼ One cell corresponds to the probability of an email being in a particular class
  9. It’s time for a Jupyter notebook: yellow_case_study.ipynb
  10. The whole process is more complex
  11. First step: deep learning. A deep-learning model takes the text input and outputs a probability for each category (category 1, category 2, category 3, …). The model has learned its own set of rules that associate the text of an email with a label.
  12. Second step: clustering algorithms ◼ Same idea as what we have already done ◼ Start with the output probabilities of our deep-learning model ◼ Cluster the emails into different groups ◼ Use graph theory to link closely related classes together
  13. Third step: validation with the business ◼ The results must be validated by the business ◼ We held several focus sessions to derive the ideal labelling ○ One that perfectly matches the company's process ○ One that makes sense algorithmically for our models
  14. “Just like electricity did 100 years ago, artificial intelligence will revolutionize all industries” – Andrew Ng. “The value of AI is not to be found in the models themselves, but in organizations’ abilities to harness them” – McKinsey Global Institute, April 2018
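
The clustering idea from slides 7 and 8 (group emails whose class-probability rows are close in Euclidean distance) can be sketched roughly as follows. This is only an illustration, not the notebook from the talk: it assumes plain k-means, uses a tiny made-up probability matrix in place of the ~25,000 x 339 dataset, and the names `euclidean`, `kmeans` and `probs` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(rows, k, iterations=20, seed=0):
    """Plain k-means (an assumption, not necessarily the talk's variant).

    Returns (centroids, labels), where labels[i] is the cluster of rows[i].
    """
    rng = random.Random(seed)
    # Start from k distinct rows picked at random.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Made-up data: one row per email, one column per class,
# each cell is the probability of that email belonging to that class.
probs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.15, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.80, 0.15],
]
centroids, labels = kmeans(probs, k=2)
print(labels)
```

On this toy matrix the first three emails end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random start. On real data one would more likely reach for a library implementation than for a hand-written loop.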

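
Slide 12 mentions using graph theory to link closely related classes, without saying how. One possible reading, sketched below purely as an assumption, is to build a graph whose nodes are classes, connect two classes when both are prominent (mean probability above a threshold) within the same email cluster, and treat each connected component as one merged label. The function `link_classes`, the `threshold` value and the data are all hypothetical.

```python
from collections import defaultdict


def link_classes(probs, cluster_labels, threshold=0.2):
    """Group class indices that are prominent in the same email cluster.

    Assumed linking rule (not from the slides): two classes are connected
    when both have a mean probability >= threshold inside one cluster.
    """
    n_classes = len(probs[0])

    # Collect the probability rows of each email cluster.
    by_cluster = defaultdict(list)
    for row, lab in zip(probs, cluster_labels):
        by_cluster[lab].append(row)

    # Undirected graph as adjacency sets; nodes are class indices.
    graph = {c: set() for c in range(n_classes)}
    for rows in by_cluster.values():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        prominent = [c for c, m in enumerate(means) if m >= threshold]
        for a in prominent:
            for b in prominent:
                if a != b:
                    graph[a].add(b)

    # Connected components via depth-first search.
    seen, groups = set(), []
    for start in range(n_classes):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component))
    return groups


# Made-up data: 4 emails, 4 classes, and a cluster index per email.
probs = [
    [0.50, 0.40, 0.05, 0.05],
    [0.45, 0.45, 0.05, 0.05],
    [0.05, 0.05, 0.60, 0.30],
    [0.05, 0.05, 0.35, 0.55],
]
cluster_labels = [0, 0, 1, 1]
groups = link_classes(probs, cluster_labels)
print(groups)
```

Here classes 0 and 1 are merged into one group and classes 2 and 3 into another. Whatever rule is used, the resulting groups are only candidates: as slide 13 says, they still have to be validated by the business.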