Se ha denunciado esta presentación.
Se está descargando tu SlideShare. ×

Barga Data Science lecture 2

Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Próximo SlideShare
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
Cargando en…3
×

Eche un vistazo a continuación

1 de 137 Anuncio

Más Contenido Relacionado

Presentaciones para usted (20)

Anuncio

Similares a Barga Data Science lecture 2 (20)

Anuncio

Más reciente (20)

Barga Data Science lecture 2

  1. 1. Deriving Knowledge from Data at Scale
  2. 2. Deriving Knowledge from Data at Scale An Introduction to Statistical Learning with Applications in R The Elements of Statistical Learning A Programmer’s Guide to Data Mining Probabilistic Programming & Bayesian Methods for Hackers Think Bayes, Bayesian Statistics Made Simple
  3. 3. Deriving Knowledge from Data at Scale Data Mining and Analysis, Fundamental Concepts and Algorithms An Introduction to Data Science Machine Learning Machine Learning – The Complete Guide
  4. 4. Deriving Knowledge from Data at Scale Bayesian Reasoning and Machine Learning A Course in Machine Learning Information Theory, Inference and Learning Algorithms Modeling with Data Mining of Massive Datasets Information Theory, Inference and Learning Algorithms
  5. 5. Deriving Knowledge from Data at Scale
  6. 6. Deriving Knowledge from Data at Scale • Introduction to machine learning • Overview of the machine learning process • Introduction to select algorithms • Introduction to concepts we will examine later in course: bias/variance, generalization, underfitting, overfitting, ensemble methods, etc. • Elements of a time series • Functions to manipulate time series • Lay a foundation for Prediction and Forecasting…
  7. 7. Deriving Knowledge from Data at Scale http://www.cs.waikato.ac.nz/ml/weka/ install by next class, get the developer version…
  8. 8. Deriving Knowledge from Data at Scale • Class Discussions 20 minutes • Machine Learning Primer 60 minutes • Break 15 minutes • Time Series & Forecasting, 1/2 60 minutes • Closing discussion 15 minutes
  9. 9. Deriving Knowledge from Data at Scale so we need teams
  10. 10. Deriving Knowledge from Data at Scale Discussion on Assigned Reading… Week One
  11. 11. Deriving Knowledge from Data at Scale The Data Science Workflow Lecture 1 Review Stay in the immediate zone during exploratory modeling, to extent possible: • 5 to 10 minutes per experiment, results in 100’s per day; • Small, statistically sound/relevant samples; • Linear modelling during feature exploration; • Don’t write ML algorithms, use packages, do learn to write data manipulation code; Start with a sample of data, as soon as you can get it… • You will learn a considerable amount from that first data sample • Quality of the data, what fields are being collected, possible missing values and/or missing fields, and the questions will begin to flow (dialogue with customer will be at a much more meaningful level). If your customer can’t quantify it, you can’t change/improve it…
  12. 12. Deriving Knowledge from Data at Scale The Data Science Workflow Loops within loops…
  13. 13. Deriving Knowledge from Data at Scale The Data Science Workflow Define the Goal The first task in a data science project is to define a measurable & quantifiable goal. At this stage, learn all that you can about the context of your project. • Why do the project sponsors want the project in the first place? What do they lack, and what do they need? • What are they doing to solve the problem now, and why isn't that good enough? • What resources will you need: what kind of data, how much staff, will you have domain experts to collaborate with, what are the computational resources? • How do the project sponsors plan to deploy your results? What are the constraints that have to be met for successful deployment? Not We want to get better at finding bad loans. But We want to reduce our rate of loan charge-offs by at least 10%, using a model that predicts which loan applicants are likely to default.
  14. 14. Deriving Knowledge from Data at Scale A concrete goal begets concrete stopping conditions and concrete acceptance criteria. The less specific the goal, the likelier that the project will go unbounded, because no result will be "good enough." If you don't know what you want to achieve, you don't know when to stop trying – or even what to try. When the project eventually terminates – because either time or resources run out – no one will be happy with the outcome…
  15. 15. Deriving Knowledge from Data at Scale The Data Science Workflow Data Collection and Management This step encompasses identifying the data you need, exploring it, and conditioning it to be suitable for analysis. This stage is often the most time- consuming step in the process. It's also one of the most important. • What data is available to me? • Will it help me solve the problem? • Is it enough? • Is it of good enough quality? Rule First piece of data is very informative, data set utility is roughly logarithmic in size. Prefer Direct measurements, but if they are not available identify proxy variables.
  16. 16. Deriving Knowledge from Data at Scale
  17. 17. Deriving Knowledge from Data at Scale The Data Science Workflow Modeling We get to statistics and machine learning during the modeling stage. Here is where you try to extract useful insights from the data. Many modeling procedures make specific assumptions about data distribution and relationships, there will be back-and- forth between the modeling and data cleaning stages as you try to find the best way to represent and model the data. The most common data science modeling tasks are: • Classification: deciding if something belongs to one category or another. • Scoring: predicting or estimating a numeric value such as a price or probability. • Ranking: learning to order items by preferences. • Clustering: grouping items into most similar groups. • Finding Relations: finding correlations or potential causes of effects seen in the data. • Characterization: very general plotting and report generation from data.
  18. 18. Deriving Knowledge from Data at Scale The Data Science Workflow Model Evaluation and Critique Once you have a model, you need to determine if it meets your goals. • Is it accurate enough for your needs? • Does it generalize well? • Does it perform better than "the obvious guess"? Better than whatever estimate you currently use? • Do the results of the model (coefficients, clusters, rules) make sense in the context of the problem domain? If you've answered "no" to any of the above questions, it's time to loop back to the modeling step — or decide that the data doesn't support the goal you are trying to achieve.
  19. 19. Deriving Knowledge from Data at Scale The Data Science Workflow Model Deployment and Maintenance Finally, the model is put into operation. In many organizations this means the data scientist no longer has primary responsibility for the day-to-day operation of the model. However, you still should ensure that the model will run smoothly and will not make disastrous unsupervised decisions. You also want to make sure that the model can be updated as its environment changes. And in many situations, the model will initially be deployed in a small pilot program. The test might bring out issues that you didn't anticipate, and you will have to adjust the model appropriately.
  20. 20. Deriving Knowledge from Data at Scale The Data Science Workflow Presentation and Documentation Once you have a model that meets your success criteria, you will present your results to your project sponsor and other stakeholders. You must also document the model for those in the organization who are responsible for using, running, and maintaining the model once it has been deployed. Model interpretability may be an issue…
  21. 21. Deriving Knowledge from Data at Scale The Data Science Workflow Loops within loops…
  22. 22. Deriving Knowledge from Data at Scale • Class Discussions 30 minutes • Machine Learning Primer 60 minutes • Break 15 minutes • Time Series & Forecasting 60 minutes • Closing discussion 15 minutes
  23. 23. Deriving Knowledge from Data at Scale
  24. 24. 1 1 5 4 3 7 5 3 5 3 5 5 9 0 6 3 5 2 0 0
  25. 25. training data (expensive) synthetic training data (cheaper)
  26. 26. solve hard problems value from Big Data human intelligence
  27. 27. Machine learning enables nearly every value proposition of web search.
  28. 28. Hundreds of thousands of machines… Hundreds of metrics and signals per machine… Which signals correlate with the real cause of a problem? How can we extract effective repair actions?
  29. 29. solve hard problems value from Big Data human intelligence business analytics
  30. 30. Predicting future performance from historical data Recommenda- tion engines Advertising analysis Weather forecasting for business planning Social network analysis IT infrastructure and web app optimization Legal discovery and document archiving Pricing analysis Fraud detection Churn analysis Equipment monitoring Location-based tracking and services Personalized Insurance Predictive analytics address the likelihood of something happening in the future, even if it is just an instant later…
  31. 31. object encoded with features (think DB attributes/ OO member fields of primitive types) 𝑑 is the feature dimensionality. classifier prediction (response/dependent variable). Can be qualitative/quantitative (classification/regression). 𝑶𝒃𝒋𝒆𝒄𝒕 → 𝑶𝒖𝒕𝒄𝒐𝒎𝒆 Entity → Category Entity → PopularityComplex decision making: 𝑿 → 𝒀 𝑿 = (𝒙 𝟎, … , 𝒙 𝒅) → 𝒀 input/independent variable We may know the relation for certain values of 𝑋 and 𝑌: In fact, we may know the relation for many 𝒙s and 𝑦s: 𝒙, 𝑦 𝒙 𝟏 , 𝑦 𝟏 , … , 𝒙 𝑵 , 𝑦 𝑵 The 𝑖-th 𝑥 is: 𝒙(𝑖) = 𝑥0 (𝑖) , … , 𝑥 𝑑 (𝑖)
  32. 32. TRAINING Input 𝒙 = (𝑥 𝟎, … , 𝑥 𝒅) Online System object encoded with features classifier prediction (response/dependent variable) Final Output 𝑦 Model Offline Training Sub-system Training Data 𝒙 𝟏 , 𝑦 𝟏 , … , 𝒙 𝑵 , 𝑦 𝑵 where 𝒙(𝒊) = 𝑥 𝟎 (𝒊) , … , 𝑥 𝒅 (𝒊) 𝑓 𝑋 = 𝑌𝑋 → 𝑌 Task is very complex . Hard to construct good 𝑓. We construct an approximation to 𝑓: 𝑔(𝑋) ≈ 𝑌 Hypothesis space: 𝑔(𝑋) ∈ 𝐻.
  33. 33. Deriving Knowledge from Data at Scale The decision is driven by both the nature of your data and the question you are trying to answer
  34. 34. Deriving Knowledge from Data at Scale Out of Class Reading Week Two
  35. 35. gender age smoker eye color male 19 yes green female 44 yes gray male 49 yes blue male 12 no brown female 37 no brown female 60 no brown male 44 no blue female 27 yes brown female 51 yes green female 81 yes gray male 22 yes brown male 29 no blue lung cancer no yes yes no no yes no no yes no no no male 77 yes gray male 19 yes green female 44 no gray ? ? ?
  36. 36. gender age smoker eye color male 19 yes green female 44 yes gray male 49 yes blue male 12 no brown female 37 no brown female 60 no brown male 44 no blue female 27 yes brown female 51 yes green female 81 yes gray male 22 yes brown male 29 no blue lung cancer no yes yes no no yes no no yes no no no male 77 yes gray male 19 yes green female 44 no gray ? ? ? Train ML Model
  37. 37. gender age smoker eye color male 19 yes green female 44 yes gray male 49 yes blue male 12 no brown female 37 no brown female 60 no brown male 44 no blue female 27 yes brown female 51 yes green female 81 yes gray male 22 yes brown male 29 no blue lung cancer no yes yes no no yes no no yes no no no male 77 yes gray male 19 yes green female 44 no gray yes no no Train ML Model
  38. 38. Always focuses on a bias-variance decomposition We have a lecture dedicated to evaluation metrics for models.
  39. 39. key easily the most important factor
  40. 40. key CITY 1 LAT. CITY 1 LNG. CITY 2 LAT. CITY 2 LNG. DRIVABLE? 123.24 46.71 121.33 47.34 Yes 123.24 56.91 121.33 55.23 Yes 123.24 46.71 121.33 55.34 No 123.24 46.71 130.99 47.34 No
  41. 41. key Feature engineering
  42. 42. More data wins Once you’ve defined your input fields, there’s only so much analytic gymnastics you can do. Computer algorithms trying to learn models have only a relatively few tricks they can do efficiently, and many of them are not so very different. Performance differences between algorithms are typically not large. Thus, if you want better classifiers: 1. Engineer better features 2. Get your hands on more high-quality data
  43. 43. The power of ensembles models vote for the final prediction
  44. 44. Occam’s Razor smaller faster to fit more interpretable
  45. 45. representable could will
  46. 46. observational data can only show us that two variables are related, but it cannot tell us the “why” Freakonomics tended to have higher standardized test scores
  47. 47. Deriving Knowledge from Data at Scale Break, 5 minutes…
  48. 48. Deriving Knowledge from Data at Scale boosting)
  49. 49. Deriving Knowledge from Data at Scale
  50. 50. Deriving Knowledge from Data at Scale Ensemble Classification
  51. 51. Deriving Knowledge from Data at Scale Let’s look at the Netflix Prize Competition…
  52. 52. Deriving Knowledge from Data at Scale Began October 2006
  53. 53. Deriving Knowledge from Data at Scale
  54. 54. Deriving Knowledge from Data at Scale from http://www.research.att.com/~volinsky/netflix/ However, improvement slowed…
  55. 55. Deriving Knowledge from Data at Scale Today, the top team has posted a 8.5% improvement. Ensemble methods are the best performers…
  56. 56. Deriving Knowledge from Data at Scale “Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75” Rookies
  57. 57. Deriving Knowledge from Data at Scale “My approach is to combine the results of many methods (also two- way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression” Arek Paterek http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf
  58. 58. Deriving Knowledge from Data at Scale “When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.” U of Toronto http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
  59. 59. Deriving Knowledge from Data at Scale Gravity home.mit.bme.hu/~gtakacs/download/gravity.pdf
  60. 60. Deriving Knowledge from Data at Scale “Our common team blends the result of team Gravity and team Dinosaur Planet.” Might have guessed from the name… When Gravity and Dinosaurs Unite
  61. 61. Deriving Knowledge from Data at Scale And, yes, the top team which is from AT&T… “Our final solution (RMSE=0.8712) consists of blending 107 individual results. “ BellKor / KorBell
  62. 62. Deriving Knowledge from Data at Scale
  63. 63. Deriving Knowledge from Data at Scale • 83.7% majority vote accuracy • 99.9% majority vote accuracy Intuitions
  64. 64. Deriving Knowledge from Data at Scale Boosting Bagging
  65. 65. Deriving Knowledge from Data at Scale Leo Breiman
  66. 66. Deriving Knowledge from Data at Scale • Resampl few data • Resample the minority data skew
  67. 67. Deriving Knowledge from Data at Scale N=10
  68. 68. Deriving Knowledge from Data at Scale
  69. 69. Deriving Knowledge from Data at Scale
  70. 70. Deriving Knowledge from Data at Scale
  71. 71. Deriving Knowledge from Data at Scale Break, 5 minutes…
  72. 72. Deriving Knowledge from Data at Scale Time Series & Forecasting, 1/2
  73. 73. Deriving Knowledge from Data at Scale median filter, a windowing technique that moves through the data point-by-point, and replaces it with the median value calculated for the current window…
  74. 74. Deriving Knowledge from Data at Scale Visual inspection reveals smoothing of the outliers without dampening the naturally occurring peaks and troughs (no signal loss). Prior to smoothing, we could see no correlation in our data, but afterwards, Spearman’s Rho was ~0.5 for almost all parameters.
  75. 75. Deriving Knowledge from Data at Scale ? ? ?
  76. 76. Deriving Knowledge from Data at Scale 3.7 12.5 9.0
  77. 77. Deriving Knowledge from Data at Scale Forecasting Quantitative Causal Model Trend Time series Stationary Trend Trend + Seasonality Qualitative Expert Judgment Delphi Method Grassroots
  78. 78. Deriving Knowledge from Data at Scale Year 1 Year 2 Year 3 Year 4 Demandforproductorservice
  79. 79. Deriving Knowledge from Data at Scale Year 1 Year 2 Year 3 Year 4 Demandforproductorservice Trend component Actual demand line Seasonal peaks Random variation
  80. 80. Deriving Knowledge from Data at Scale87
  81. 81. Deriving Knowledge from Data at Scale88
  82. 82. Deriving Knowledge from Data at Scale • Naïve • Moving Average • Exponential Smoothing • Regression
  83. 83. Deriving Knowledge from Data at Scale
  84. 84. Deriving Knowledge from Data at Scale 91
  85. 85. Deriving Knowledge from Data at Scale92 smoothing time n A+...+A+A+A =F 1n-t2-t1-tt 1t   Ft+1 = Forecast for the upcoming period, t+1 n = Number of periods to be averaged A t = Actual occurrence in period t
  86. 86. Deriving Knowledge from Data at Scale 3 n A+...+A+A+A =F 1n-t2-t1-tt 1t   Month Sales (000) 1 4 2 6 3 5 4 ? 5 ? 6 ?
  87. 87. Deriving Knowledge from Data at Scale Month Sales (000) Moving Average (n=3) 1 4 NA 2 6 NA 3 5 NA 4 ? 5 ? (4+6+5)/3=5 6 ? n A+...+A+A+A =F 1n-t2-t1-tt 1t   You’re manager in Amazon’s electronics department. You want to forecast ipod sales for months 4-6 using a 3-period moving average.
  88. 88. Deriving Knowledge from Data at Scale Month Sales (000) Moving Average (n=3) 1 4 NA 2 6 NA 3 5 NA 4 3 5 ? 5 6 ? ?
  89. 89. Deriving Knowledge from Data at Scale Month Sales (000) Moving Average (n=3) 1 4 NA 2 6 NA 3 5 NA 4 3 5 ? 5 6 ? (6+5+3)/3=4.667
  90. 90. Deriving Knowledge from Data at Scale Month Sales (000) Moving Average (n=3) 1 4 NA 2 6 NA 3 5 NA 4 3 5 7 5 6 ? 4.667?
  91. 91. Deriving Knowledge from Data at Scale Month Sales (000) Moving Average (n=3) 1 4 NA 2 6 NA 3 5 NA 4 3 5 7 5 6 ? 4.667 (5+3+7)/3=5
  92. 92. Deriving Knowledge from Data at Scale99
  93. 93. Deriving Knowledge from Data at Scale 100
  94. 94. Deriving Knowledge from Data at Scale10
  95. 95. Deriving Knowledge from Data at Scale Month Weighted Moving Average 1 4 NA 2 6 NA 3 5 NA 4 31/6 = 5.167 5 6 ? ? ? 1n-tn2-t31-t2t11t Aw+...+Aw+Aw+Aw=F  Sales (000)
  96. 96. Deriving Knowledge from Data at Scale Month Sales (000) Weighted Moving Average 1 4 NA 2 6 NA 3 5 NA 4 3 31/6 = 5.167 5 7 6 25/6 = 4.167 32/6 = 5.333 1n-tn2-t31-t2t11t Aw+...+Aw+Aw+Aw=F 
  97. 97. Deriving Knowledge from Data at Scale10
  98. 98. Deriving Knowledge from Data at Scale10
  99. 99. Deriving Knowledge from Data at Scale 106 simple any • Previous time previous actual and forecasted value
  100. 100. Deriving Knowledge from Data at Scale • gives more weight to recent time periods Ft+1 = Ft + a(At - Ft) et Ft+1 = Forecast value for time t+1 At = Actual value at time t a = Smoothing constant Need initial forecast Ft to start.
  101. 101. Deriving Knowledge from Data at Scale Week Demand 1 820 2 775 3 680 4 655 5 750 6 802 7 798 8 689 9 775 10 Given the weekly demand data what are the exponential smoothing forecasts for periods 2-10 using a=0.10? Assume F1=D1 Ft+1 = Ft + a(At - Ft) i Ai
  102. 102. Deriving Knowledge from Data at Scale Week Demand 0.1 0.6 1 820 820.00 820.00 2 775 820.00 820.00 3 680 815.50 793.00 4 655 801.95 725.20 5 750 787.26 683.08 6 802 783.53 723.23 7 798 785.38 770.49 8 689 786.64 787.00 9 775 776.88 728.20 10 776.69 756.28 Ft+1 = Ft + a(At - Ft) a = F2 = F1+ a(A1–F1) =820+.1(820–820) =820 i Ai Fi
  103. 103. Deriving Knowledge from Data at Scale Week Demand 0.1 0.6 1 820 820.00 820.00 2 775 820.00 820.00 3 680 815.50 793.00 4 655 801.95 725.20 5 750 787.26 683.08 6 802 783.53 723.23 7 798 785.38 770.49 8 689 786.64 787.00 9 775 776.88 728.20 10 776.69 756.28 Ft+1 = Ft + a(At - Ft) a = F3 = F2+ a(A2–F2) =820+.1(775–820) =815.5 i Ai Fi
  104. 104. Deriving Knowledge from Data at Scale Week Demand 0.1 0.6 1 820 820.00 820.00 2 775 820.00 820.00 3 680 815.50 793.00 4 655 801.95 725.20 5 750 787.26 683.08 6 802 783.53 723.23 7 798 785.38 770.49 8 689 786.64 787.00 9 775 776.88 728.20 10 776.69 756.28 Ft+1 = Ft + a(At - Ft) This process continues through week 10 a = i Ai Fi
  105. 105. Deriving Knowledge from Data at Scale Week Demand 0.1 0.6 1 820 820.00 820.00 2 775 820.00 820.00 3 680 815.50 793.00 4 655 801.95 725.20 5 750 787.26 683.08 6 802 783.53 723.23 7 798 785.38 770.49 8 689 786.64 787.00 9 775 776.88 728.20 10 776.69 756.28 Ft+1 = Ft + a(At - Ft) What if the a constant equals 0.6 a = a = i Ai Fi
  106. 106. Deriving Knowledge from Data at Scale α • depends on the emphasis you want to place on the most recent data α
  107. 107. Deriving Knowledge from Data at Scale Ft+1 = a At + a(1- a) At - 1 + a(1- a)2At - 2 + ... Weights Prior Period a 2 periods ago a(1 - a) 3 periods ago a(1 - a)2 a= a= 0.10 a= 0.90 10% 9% 8.1% 90% 9% 0.9% Ft+1 = Ft + a (At - Ft) or w1 w2 w3
  108. 108. Deriving Knowledge from Data at Scale 115
  109. 109. Deriving Knowledge from Data at Scale Trend Seasonal
  110. 110. Deriving Knowledge from Data at Scale11
  111. 111. Deriving Knowledge from Data at Scale 0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10
  112. 112. Deriving Knowledge from Data at Scale b0 b1 0 2 4 6 8 10 12 10 11 12 13 14 15 16 17 18 19 20 Best line! Intercept
  113. 113. Deriving Knowledge from Data at Scale deseasonalize • Reseasonalize
  114. 114. Deriving Knowledge from Data at Scale Deseasonalize Forecast Reseasonalize Actual data Deseasonalized data Example (SI + Regression)
  115. 115. Deriving Knowledge from Data at Scale12
  116. 116. Deriving Knowledge from Data at Scale 123 Year Sales ($ mil.) 2002 7 2003 10 2004 9 2005 11 2006 13 Linear Trend – Using the Least Squares Method: An Example Year t Sales ($ mil.) 2002 1 7 2003 2 10 2004 3 9 2005 4 11 2006 5 13
  117. 117. Deriving Knowledge from Data at Scale 124
  118. 118. Deriving Knowledge from Data at Scale12
  119. 119. Deriving Knowledge from Data at Scale 126 (Excel function: =log(x) or log(x,10)
  120. 120. Deriving Knowledge from Data at Scale12
  121. 121. Deriving Knowledge from Data at Scale SI SI
  122. 122. Deriving Knowledge from Data at Scale year raw index quarter four
  123. 123. Deriving Knowledge from Data at Scale130 The table below shows the quarterly sales for Toys International for the years 2001 through 2006. The sales are reported in millions of dollars. Determine a quarterly seasonal index using the ratio-to-moving-average method. Seasonal Index – An Example
  124. 124. Deriving Knowledge from Data at Scale13
  125. 125. Deriving Knowledge from Data at Scale13
  126. 126. Deriving Knowledge from Data at Scale13
  127. 127. Deriving Knowledge from Data at Scale deseasonalize
  128. 128. Deriving Knowledge from Data at Scale
  129. 129. Deriving Knowledge from Data at Scale • Class Discussions 15 minutes • Forecasting, continued (2/2) 45 minutes • Weka Tutorial 30 minutes • Break 10 minutes • Decision Trees and Random Forests 60 minutes • Hands on, decision tree in Weka 15 minutes
  130. 130. Deriving Knowledge from Data at Scale That’s all for tonight….

×