
BigML Fall 2015 Release


BigML is the first Machine Learning service offering Association Discovery on the cloud! With these slides you can learn how to use Association Discovery and other new features such as Partial Dependence Plots, Logistic Regression, Correlations, Statistical Tests and Flatline Editor.


  1. Introducing Association Discovery: BigML 2015 Fall Release
  2. Today's Webinar
     • Speaker: Poul Petersen, CIO
     • Moderator: Atakan Cetinsoy, VP Predictive Applications
     • Enter questions into the chat box; we'll answer some via text and others at the end of the session
     • Email: info@bigml.com
     • Twitter: @bigmlcom
  3. Association Discovery
     • Algorithm: "Magnum Opus" from Geoff Webb
     • Unsupervised learning: unlabelled data
     • Learning task: find "interesting" relations between variables
  4. BigML Workflow (diagram): SOURCE; DATASET; MODEL (Decision Trees, Bagging, Decision Forest); CLUSTER (K-Means, G-Means); ANOMALY (Isolation Forest); ASSOCIATION (Magnum Opus)
  5. Unsupervised Learning. The same transaction table is shown twice: under Clustering, which groups similar rows, and under Anomaly Detection, which flags unusual rows.

     date  customer  account  auth  class    zip    amount
     Mon   Bob       3421     pin   clothes  46140  135
     Tue   Bob       3421     sign  food     46140  401
     Tue   Alice     2456     pin   food     12222  234
     Wed   Sally     6788     pin   gas      26339  94
     Wed   Bob       3421     pin   tech     21350  2459
     Wed   Bob       3421     pin   gas      46140  83
     Thu   Sally     6788     sign  food     26339  51
  6. Association Rules

     date  customer  account  auth  class    zip    amount
     Mon   Bob       3421     pin   clothes  46140  135
     Tue   Bob       3421     sign  food     46140  401
     Tue   Alice     2456     pin   food     12222  234
     Wed   Sally     6788     pin   gas      26339  94
     Wed   Bob       3421     pin   tech     21350  2459
     Wed   Bob       3421     pin   gas      46140  83
     Thu   Sally     6788     sign  food     26339  51

     Rules:
     • {customer = Bob, account = 3421} → zip = 46140
     • {class = gas} → amount > 80
  7. Association Rules. The same transaction table and rules as the previous slide, with the two sides of each rule labelled:
     • Antecedent: {customer = Bob, account = 3421}; Consequent: zip = 46140
     • Antecedent: {class = gas}; Consequent: amount > 80
  8. Use Cases
     • Market Basket Analysis
     • Web usage patterns
     • Intrusion detection
     • Fraud detection
     • Bioinformatics
     • Medical risk factors
  9. Market Basket Analysis
     • Dataset of 9,834 grocery cart transactions
     • Each row is a list of all items in a cart at checkout
     GOAL: Discover "interesting" rules about what store items are typically purchased together.
  10. Association Metrics: Coverage. Percentage of instances which match the antecedent "A".
  11. Association Metrics: Support. Percentage of instances which match both the antecedent "A" and the consequent "C".
  12. Association Metrics: Confidence. Percentage of instances in the antecedent which also contain the consequent. Confidence = Support / Coverage
  13. Association Metrics: Confidence (diagram). On a scale from 0% to 100%: at 0%, A never implies C; in between, A sometimes implies C; at 100%, A always implies C.
  14. Association Metrics: Lift. Ratio of observed support to the support expected if A and C were statistically independent. Lift = Support / (p(A) * p(C)) = Confidence / p(C)
  15. Association Metrics: Lift (diagram comparing observed and independent overlap of A and C).
      • Lift < 1: negative correlation
      • Lift = 1: no association
      • Lift > 1: positive correlation
  16. Association Metrics: Leverage. Difference between observed support and the support expected if A and C were statistically independent. Leverage = Support - [ p(A) * p(C) ]
  17. Association Metrics: Leverage (diagram comparing observed and independent overlap of A and C), on a scale from -1 to 1.
      • Leverage < 0: negative correlation
      • Leverage = 0: no association
      • Leverage > 0: positive correlation
  18. Medical Risk
      • Dataset of diagnostic measurements of 768 patients
      • Each patient labelled True/False for diabetes
      GOAL: Find general rules that indicate diabetes.
  19. Medical Risk
      • Association Rule: if plasma glucose > 146 then diabetes = TRUE
      • Decision Tree: if plasma glucose > 155 and bmi > 29.32 and diabetes pedigree > 0.32 and insulin <= 629 and age <= 44 then diabetes = TRUE
  20. Partial Dependence Plots: Visualize Ensembles
  21. Flatline Editor: https://github.com/bigmlcom/flatline
  22. BigML Workflow (diagram): SOURCE; DATASET, with Flatline (Flatline Editor) producing a new DATASET; MODEL (Decision Trees, Bagging, Decision Forest); CLUSTER (K-Means, G-Means); ANOMALY (Isolation Forest); ASSOCIATION (Magnum Opus)
  23. Logistic Regression (DATASET to LOGISTIC REGRESSION)
      • Classification algorithm
      • Categorical: one-hot encoded
      • Text: mapped to token frequency
      • Bindings support local model
      • L1/L2 regularization
      • Currently API only
      https://bigml.com/developers/logisticregressions
  24. BigML Workflow (diagram): SOURCE; DATASET, with Flatline (Flatline Editor) producing a new DATASET; MODEL (Decision Trees, Bagging, Decision Forest, Logistic Regression); CLUSTER (K-Means, G-Means); ANOMALY (Isolation Forest); ASSOCIATION (Magnum Opus)
  25. BigML Classifiers

      Classifier            Advantages                                  Disadvantages
      Single Tree           easy to interpret; robust to missing data   overfitting
      Ensemble              top performer; robust to missing data       hard to interpret
      Logistic Regression   robust to noise; outputs probability        no missing data; hard to interpret
  26. BigML Workflow (diagram): SOURCE; DATASET, with Flatline (Flatline Editor) producing a new DATASET; MODEL (Decision Trees, Bagging, Decision Forest, Logistic Regression); CLUSTER (K-Means, G-Means); ANOMALY (Isolation Forest); ASSOCIATION (Magnum Opus); STATS (Statistical Tests, Correlations)
  27. Correlations (DATASET to CORRELATION)
      • Pearson Coefficient
      • Spearman Coefficient
      • Chi-Square
      • Cramér's V
      • Tschuprow's T
      • One-way ANOVA
      https://bigml.com/developers/correlations
  28. Statistical Tests (DATASET to STATISTICAL TESTS)
      • Benford's Law
      • Anderson-Darling
      • Jarque-Bera
      • Z-score
      • Grubbs
      https://bigml.com/developers/statisticaltests
  29. BigML Workflow (diagram): SOURCE; DATASET, with Flatline (Flatline Editor) producing a new DATASET; MODEL (Decision Trees, Bagging, Decision Forest, Logistic Regression); CLUSTER (K-Means, G-Means); ANOMALY (Isolation Forest); ASSOCIATION (Magnum Opus); STATS (Statistical Tests, Correlations)
  30. Q&A
      • Ask questions and get a free BigML T-shirt!
      • All demonstrated features are immediately available to all users, including:
        • All subscription plans
        • Virtual Private Cloud (VPC) customers
        • On-premise implementations
      • Documentation: https://bigml.com/releases
  31. Get Started Today!
      • FEEDBACK: info@bigml.com
      • TWITTER: @bigmlcom
      • RESOURCES: Join us for future webinars & hangouts
      • OFFICE HOURS: Every Wednesday, 9:30am Pacific Time
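
The association metrics defined on slides 10 to 17 can be sketched in plain Python against the example transaction table from slides 5 to 7. This is a minimal illustration, not BigML's implementation: the function name `rule_metrics`, the row dictionaries, and the use of fractions rather than percentages are all assumptions made for the example.

```python
# Transactions from the example table (date and auth columns omitted).
ROWS = [
    {"customer": "Bob", "account": 3421, "class": "clothes", "zip": 46140, "amount": 135},
    {"customer": "Bob", "account": 3421, "class": "food", "zip": 46140, "amount": 401},
    {"customer": "Alice", "account": 2456, "class": "food", "zip": 12222, "amount": 234},
    {"customer": "Sally", "account": 6788, "class": "gas", "zip": 26339, "amount": 94},
    {"customer": "Bob", "account": 3421, "class": "tech", "zip": 21350, "amount": 2459},
    {"customer": "Bob", "account": 3421, "class": "gas", "zip": 46140, "amount": 83},
    {"customer": "Sally", "account": 6788, "class": "food", "zip": 26339, "amount": 51},
]


def rule_metrics(rows, antecedent, consequent):
    """Hypothetical helper: metrics for the rule antecedent -> consequent.

    `antecedent` and `consequent` are predicates taking one row.
    All values are fractions in [0, 1] rather than percentages.
    """
    n = len(rows)
    n_a = sum(1 for r in rows if antecedent(r))
    n_c = sum(1 for r in rows if consequent(r))
    n_ac = sum(1 for r in rows if antecedent(r) and consequent(r))

    coverage = n_a / n                         # p(A)
    p_c = n_c / n                              # p(C)
    support = n_ac / n                         # p(A and C)
    confidence = n_ac / n_a if n_a else 0.0    # support / coverage
    lift = confidence / p_c if p_c else 0.0    # support / (p(A) * p(C))
    leverage = support - coverage * p_c        # support - p(A) * p(C)
    return {
        "coverage": coverage,
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
    }


# The two rules shown on the Association Rules slides.
gas_rule = rule_metrics(
    ROWS,
    antecedent=lambda r: r["class"] == "gas",
    consequent=lambda r: r["amount"] > 80,
)
bob_rule = rule_metrics(
    ROWS,
    antecedent=lambda r: r["customer"] == "Bob" and r["account"] == 3421,
    consequent=lambda r: r["zip"] == 46140,
)
```

For the rule {class = gas} → amount > 80, both gas rows exceed 80, so coverage and support are each 2/7 and confidence is 1. Because 6 of the 7 rows have amount > 80 anyway, lift is only 7/6 and leverage is 2/49, which shows why confidence alone can overstate how interesting a rule is.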
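
Slide 23 notes that logistic regression one-hot encodes categorical fields and supports L1/L2 regularization. The sketch below illustrates those three ideas in plain Python. It does not call or reproduce BigML's API: the helper names `one_hot`, `predict_proba`, and `penalty`, and the `strength` parameter, are hypothetical.

```python
import math


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector, one slot per category."""
    return [1.0 if value == c else 0.0 for c in categories]


def predict_proba(features, weights, bias):
    """Logistic model: probability of the positive class for one instance."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def penalty(weights, kind="l2", strength=1.0):
    """Regularization term added to the training loss (illustrative only)."""
    if kind == "l1":
        return strength * sum(abs(w) for w in weights)   # L1: sum of |w|
    if kind == "l2":
        return strength * sum(w * w for w in weights)    # L2: sum of w^2
    raise ValueError("kind must be 'l1' or 'l2'")


# The "class" field from the example transaction table.
CLASSES = ["clothes", "food", "gas", "tech"]
encoded_gas = one_hot("gas", CLASSES)
```

One-hot encoding turns the single `class` field into four numeric inputs, which is why a logistic regression over categorical data can end up with many more coefficients than the dataset has fields.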
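
The first two measures listed on slide 27, the Pearson and Spearman coefficients, can be sketched without any library. This is an illustrative implementation under simple assumptions (equal-length numeric lists with non-zero variance), not the code BigML runs; the function names are chosen for the example.

```python
import math


def pearson(xs, ys):
    """Pearson coefficient: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman coefficient: Pearson coefficient of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Pearson measures linear association, while Spearman only looks at ordering, so a monotonic but non-linear relationship such as y = x² on positive x scores a Spearman coefficient of 1 but a Pearson coefficient below 1.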
