
How Data Science Can Grow Your Business?



What is data science?
How is it used in the industry?
DS methodology and life cycle
Who are the Data-team members?
Limitations and caveats
(**Google slides upload didn't go well)


How Data Science Can Grow Your Business?

  1. How Data Science Can Grow Your Business? Le-Wagon talk, Tel-Aviv 2018
  2. Hi! I am Noam Cohen, Lead Data-Scientist
  3. Talk agenda ◎ What is data science? ◎ How is it used in the industry? ◎ DS methodology and life cycle ◎ Who are the Data-team members? ◎ Limitations and caveats
  4. 1. What is Data science?
  5. It is...
  6. It’s not!
  7. Data Science (Wikipedia): “An interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured”
  8. Data Science vs. Statistics ◎ The term data scientist was originally coined by a statistician, trying to rebrand statisticians (Chien-Fu 1998) ◎ Statistics vs. DS - Data models vs. Algorithmic modeling (Leo Breiman 2001) ◎ Data Science = Aggr(`stats`,`advanced computing`,`hacking`,`business logic`,`math`,`domain knowledge`,`data analysis`)
  9. Demystifying data science ◎ DS Purpose - achieving ‘Data Driven Decision making’ (basing decisions on data with certain confidence)
  10. Buzzwords terminology. Data Science (DS): The science of recognizing and utilizing patterns in data in order to develop actionable insight and confidence for decisions. Artificial Intelligence (AI): Any technique which enables computers to mimic human behaviour. Machine Learning (ML): Subset of AI techniques which use statistical methods to enable machine tasks to improve with experience. Deep Learning (DL): Subset of ML which, under certain conditions, can model the data with less human intervention.
  11. 2. How is it used in the industry? Typical use cases and market overview
  12. Why should I use DS in my business? ◎ Derive insights on business challenges ○ Sales ○ Pricing ○ Marketing ○ Churn ◎ Improve user experience ○ Faster ○ Personalized ○ Accurate ◎ Automate cumbersome routines involving human labor
  13. Data science business - where and how
  14. Marketing ◎ Advertisement targeting ◎ User profiling ◎ Targeting direct marketing ◎ Churn ◎ Causal modeling ◎ Optimized viral marketing
  15. Sales ◎ Discount offering ◎ Demand forecasting ◎ Dynamic pricing ◎ Product bundling ◎ Sales monitoring and investigation ◎ Leads discovery and prioritization ◎ Upselling
  16. Transportation ◎ Customer wait time estimation ◎ Recommending driver location via heatmap ◎ Surge pricing (Geosurge) ◎ Traffic and demand visualization ◎ Drive duration estimation
  17. Israeli companies overview
  18. Israeli companies overview
  19. Israeli companies overview
  20. 3. How should I use it? Methodology and DS lifecycle
  21. Methodology - preliminaries: Digital service with growing user community
  22. Methodology - preliminaries: Basic analytics (no need for AI/ML); database instrumentation and data structuring ◎ ** If needed, create a rule-based system of expert-defined thresholds as the ‘AI’ backend and continue to gather data
  23. Methodology - CRISP-DM: Cross Industry Standard Process for Data Mining
  24. Methodology - Business understanding: Problem definition & business understanding ◎ Define business targets and qualitative success metrics ◎ Assess risks, costs, benefits, data resources ◎ Map the project onto data science subtasks and identify the class of the problems ◎ Plan the project - estimate requirements, timeline and budget
  25. Methodology - Data understanding ◎ Refine initial data and enrich it if needed ◎ Match data to business problem ◎ Describe and explore the data ○ Spot anomalies ○ Basic amounts and value types ◎ Verify data quality ○ Missing data ○ Collection errors/biases (anomalies? outliers?)
  26. Methodology - Data Preparation ◎ Clean data ○ Correct errors ○ Fill missing data ◎ Select right data ○ Representative ○ Data partitioning - train/test/hold-out ◎ Format data ◎ Beware of “leaks” (Source: KDnuggets Poll 2003)
  27. Methodology - Modeling ◎ Build cost/risk target to optimize ◎ Understand model assumptions and check data compatibility ◎ Build model and optimize parameters ◎ Generate test design ◎ Assess model on provided data
  28. Methodology - Evaluation ◎ Analyze model performance and summarize results ○ New insights ○ A/B testing ○ Validation cases ◎ Error analysis ◎ Prediction interpretability ◎ Robustness and maintainability of model ◎ Business related performance - cumulative response and lift curves
  29. Methodology - Deployment ◎ Integrate prototype into production system ◎ Implement software features inspired by the data-mining process ◎ Plan model maintenance and support
  30. Methodology - CRISP-DM
  31. 4. Who should I recruit? Building your data team
  32. Unicorn fairytale: Data science is actually comprised of multiple disciplines. Typically, a single creature cannot manage the engineering process, lead modeling efforts, coordinate the product roadmap, and articulate results to stakeholders.
  33. The magnificent data warriors - AI Analyst ★ Descriptive and conditional statistics ★ Error analysis ★ Finding sense in results and monitoring production model performance ★ Feature engineering and formalization of prior knowledge ★ Domain expertise ★ Validation ★ Excel, SQL, DB, R (Scripting), Statistics
  34. The magnificent data warriors - Data Scientist ★ Machine learning and statistical analysis ★ Experiment design and research ★ Familiar with Big Data technologies ★ Dev foundations - Pipelines, testing, performance optimization ★ Storytelling and visualization ★ Feature engineering ★ Bias and leakages discovery ★ Generalization and overfitting ★ Python, R, Matlab, SQL, OOP, Spark, Pig, Hive
  35. The magnificent data warriors - Data Engineer ★ Data orchestration and system architecture ★ Scaling with Big Data technologies ★ Database maintenance and data storage ★ Production processes - code deployment, optimization and testing ★ OOP and functional programming, Python/Java/Scala/Ruby/Clojure, Spark, Hadoop, Pig, Hive, DB & SQL, Jenkins, Luigi/Airflow
  36. The magnificent data warriors - Manager ★ Setting goals ★ Tracking progress ★ Coordinating between team members ★ Strong understanding of data mining, evaluation metrics and statistics ★ Delivering results to stakeholders ★ Leader
  37. Who & where?
  38. Other options: Let your developers carefully integrate licensed data science APIs for predictive modeling in the product. Outsource DS tasks to a consulting company.
  39. 5. Limitations and caveats: Setting realistic expectations
  40. Limitations ◎ No magic - when there is no predictive information in data ◎ No 100% ◎ No hidden golden feature ◎ For tomorrow, it is impossible ◎ Tasks with subjective nature are hard ◎ Outdated data and outdated models ◎ Train and test data discrepancies
  41. Thanks! Any questions? You can find me at:
  42. References ◎ Foster Provost and Tom Fawcett. 2013. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking (1st ed.). O'Reilly Media, Inc. ◎ artificial-intelligence-and-machine-learning ◎ presentations-on-slideshare ◎ team-498041b88dae

Editor's notes

  • In data models you assume that you know, in some sense, the form of the prediction function and the types of interactions between the predictor variables. You then only need to find the optimal settings (parameters) for the model to fit the data.

    In algorithmic modeling you treat the function as an unknown box and let an algorithm and the data find the prediction function and the relevant variables.
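The distinction above can be sketched on toy data. This is only an illustration under assumptions that are not in the talk: the data are generated from y = x², the "data model" is a straight line fitted by least squares, and the "algorithmic model" is a small nearest-neighbour averager chosen purely for brevity. All function names are hypothetical.

```python
# Toy data from y = x^2; neither approach is told this.
train_xs = [i / 10 for i in range(-20, 21)]
train_ys = [x * x for x in train_xs]
# Held-out points that sit between the training points.
test_xs = [i / 10 + 0.05 for i in range(-20, 20)]
test_ys = [x * x for x in test_xs]


def fit_line(xs, ys):
    """Data model: assume y = a*x + b and only estimate a and b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_knn(xs, ys, k=3):
    """Algorithmic model: no assumed form, average the k nearest points."""
    points = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line_error = mse(fit_line(train_xs, train_ys), test_xs, test_ys)
knn_error = mse(fit_knn(train_xs, train_ys), test_xs, test_ys)
print(f"line MSE: {line_error:.3f}, k-NN MSE: {knn_error:.3f}")
```

Here the assumed linear form is wrong for the data, so the flexible method does better. When the assumed form is right, the data model typically needs less data and is easier to interpret.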
  • Optimove’s Customer Marketing Cloud automatically schedules, executes and evaluates highly individualized marketing campaigns. Helps marketers retarget ads only to the website visitors most likely to make a purchase on the site.

    Datorama: maps new sources of marketing information to generate enhanced insights for decision-makers.

    Predictive advertisement targeting
    What’s predicted: which female customers will have a baby in the coming months, and which ad each customer is most likely to click
    What’s done about it: suggest relevant offers for soon-to-be parents, display the best ad

    Targeting direct marketing
    What’s predicted: which customers will respond to marketing contact
    What’s done about it: contact customers more likely to respond

    Churn
    What’s predicted: which customers will leave
    What’s done about it: retention efforts targeting at-risk customers

    Causal modeling
    predictive modeling to target advertisements to consumers.
    Was this because the advertisements influenced the consumers to purchase? Or did the predictive models simply do a good job

    Viral marketing
    Recognize influencers and seed them with free products;
    they will increase the likelihood that the people they know will purchase the product.

  • Gong: “shining the light on their sales conversations.” Automatically records, transcribes and analyzes all “sales calls, demos, and meetings so sales teams can scale the effectiveness of their sales conversations.”

    Conversica uses AI to automate “routine business conversations in a human way.” They sell an automated sales assistant that “engages, qualifies and follows-up with sales leads via human-like, two-way email conversations.” The idea is that salespeople can talk to the right people at the right time, while AI does the heavy lifting the rest of the time.

    Demand forecasting (strawberry Pop-Tarts and beer before a hurricane; NY Times, 2004)
    What’s predicted: products to be consumed before an event (such as a hurricane)
    What’s done about it: pricing, supply

    Upselling and cross-selling
    What’s predicted: which of your existing clients are more likely to buy a better version of what they currently own (up-sell). The net effect is an increase in revenue and a drop in marketing costs.

    Leads prioritization
    What’s predicted: which leads are most likely to be converted into a deal, considering everything from geography, company size, and titles to engagement such as signing up for a trial or downloading a white paper.

  • Then - Uber originally started as a black car-hailing service, UberCab, in San Francisco.
    Now - it closely monitors which features of the service are used most in order to analyze usage patterns. It predicts everything from the customer’s wait time to where drivers should place themselves (recommended via heatmap) in order to take advantage of the best fares and most passengers.
    Dynamic pricing is similar to the pricing strategy used by hotels and flights for their weekend or holiday fares and rates, except that Uber leverages predictive modeling in real time based on traffic patterns, supply and demand.
  • AI - Brain inspired programming
    ML - data driven optimization
  • Simple business questions -
    User profile (age, gender, background etc.)
    Who pays more, and for which product
  • An iterative and very difficult step

    Be able to tell what is unrealistic or ill-defined

    If the data is good, be patient with vaguely defined problems
  • Do not economize on this phase.
    The earlier you discover issues with your data the better (yes, your data will have issues!).

    Data understanding leads to domain understanding; it will pay off in the modelling phase.

    Do not trust data quality estimates provided by your customer.
    Verify, as far as you can, that your data is correct, complete, coherent, deduplicated, representative, independent, up-to-date, and stationary.

    Investigate what sort of processing was applied to the raw data.

    Understand anomalies and outliers.
  • Data understanding and preparation will usually consume half or more of your project time!

    Converting data to tabular format
    Removing or inferring missing values
    Converting data to different types
    Scaling and normalizing
    Some data mining techniques are designed for symbolic and categorical data, while others handle only numeric values.
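The preparation steps listed above can be sketched in a few lines. This is a minimal illustration, assuming a made-up customer table: the column names, values, and helper functions are all hypothetical, and mean imputation and z-score scaling are just one common choice for each step.

```python
from statistics import mean, pstdev

# Hypothetical raw records: everything arrives as strings, some values missing.
raw_rows = [
    {"age": "34", "plan": "basic", "monthly_spend": "20.5"},
    {"age": "", "plan": "pro", "monthly_spend": "99.0"},
    {"age": "51", "plan": "basic", "monthly_spend": ""},
    {"age": "29", "plan": "pro", "monthly_spend": "80.0"},
]


def to_float(value):
    """Type conversion: raw string to float, empty string to None (missing)."""
    return float(value) if value != "" else None


def impute_mean(values):
    """Infer missing values as the mean of the observed ones."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Scale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Turn a categorical column into numeric indicator columns."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


ages = standardize(impute_mean([to_float(r["age"]) for r in raw_rows]))
spend = standardize(impute_mean([to_float(r["monthly_spend"]) for r in raw_rows]))
plans = one_hot([r["plan"] for r in raw_rows])

# Final tabular, all-numeric format: one row per customer.
table = [[a, s, *p] for a, s, p in zip(ages, spend, plans)]
for row in table:
    print([round(v, 2) for v in row])
```

In a real project the fill values and scaling statistics would be computed on the training partition only and then reused on the test and hold-out data. Computing them on the full data set, as this sketch does for brevity, is one of the "leaks" the Data Preparation slide warns about.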

  • Whenever possible, peek inside your model and review it with a domain expert
    • Assess feature importance
    • Run your model on simulated data

  • Cumulative response curves: rank the instances by model score and plot the hit rate (true positive rate, i.e. the percentage of positives correctly classified; y axis) as a function of the percentage of the population that is targeted (x axis). Conceptually, as we move down the list of instances ranked by the model, we target increasingly larger proportions of all the instances.

    Intuitively, the lift of a classifier represents the advantage it provides over random guessing. The lift is the degree to which it “pushes up” the positive instances in a list above the negative instances.
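Both curves can be computed directly from model scores and true labels. The sketch below is illustrative only: the scores and labels are invented and the function names are hypothetical. Lift is taken here as the cumulative response divided by the fraction targeted, which is what random targeting would be expected to capture.

```python
def cumulative_response(scores, labels):
    """Return (fraction targeted, fraction of all positives captured) points.

    Instances are ranked by descending model score; labels are 1 (positive)
    or 0 (negative).
    """
    ranked = [
        label
        for _, label in sorted(
            zip(scores, labels), key=lambda pair: pair[0], reverse=True
        )
    ]
    total_positives = sum(ranked)
    n = len(ranked)
    points = []
    hits = 0
    for i, label in enumerate(ranked, start=1):
        hits += label
        points.append((i / n, hits / total_positives))
    return points


def lift(points):
    """Lift at each cutoff: cumulative response over the random baseline."""
    return [(x, y / x) for x, y in points]


# Invented example: 10 customers, 4 of whom actually responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

curve = cumulative_response(scores, labels)
lift_curve = lift(curve)

for (x, y), (_, l) in zip(curve, lift_curve):
    print(f"target top {x:.0%}: captured {y:.0%} of positives, lift {l:.2f}")
```

A lift above 1 at a given cutoff means the ranking finds more positives than targeting the same share of the population at random. When the whole population is targeted, the lift is 1 by construction.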
  • Analysts monitor processes, evaluate data quality, and monitor production model performance. These steps seem relatively routine but when you realize the fact that a model is never “complete” and will always require some oversight then appointing an analyst to manage the process makes sense. This allows your more senior assets to focus on innovation instead of maintenance.
  • The Data Scientist then owns the modeling process. Generally, they take input parameters from product or other team leads in order to understand the model’s business objective. They then work to articulate requirements to the engineers and other stakeholders. Once these criteria have been defined, the process of building tests, models, and evaluating performance begins.
  • Data Engineers are responsible for building and maintaining the technical infrastructure required for modeling, predictions, and analysis. The engineers create and maintain databases, machine learning pipelines, and production processes. Without properly stored data, modeling processes, and the ability to serve predictions in production, a Data Scientist is essentially useless.
  • As the data team and number of models grows, the need for a Data Science Manager appears. This person coordinates the quants, devs, and analysts as well as manages external demand of the data science team. The Data Science Manager essentially guides the process, allocates resources, and occasionally shields the team from ad hoc requests so they are able to achieve their primary objectives.
  • Ignoring methodology and overlooking phases lead to fragile insights and unreliable products