Embracing data science

  1. Department of Statistics, The Maharaja Sayajirao University of Baroda
  2. Agenda
     - What is Data Science?
     - What does Data Science promise for your business?
     - Investment in Data Science and ROI
     - Data Science Process
     - Data Science Roles
     - Infrastructure Requirements
     - Data Science Tools and Techniques
     - Where do I begin?
     - Developing Data Science Culture
     - Questions
  3. What is Data Science?
     Everything concerning data is within the purview of Data Science.
  4. What is Data Science?
     Data science is a young interdisciplinary field that uses scientific principles, methods, processes, algorithms and systems to extract knowledge and insights from data.
     - Data science has statistics at its core.
     - Data science extends the field of statistics to incorporate advances in computing with data.
     - Apart from statistics, computer science is another major discipline, playing a major role in capturing, managing and sharing data.
     - Data science is a driving force behind innovations in almost all disciplines of science.
     - This new approach is termed data-driven science.
  5. Data Science Discipline
  6. Data Science Profession
  7. The Data Science promise
     Top objectives of successful businesses:
     - Increase profitability
     - Ensure customer satisfaction
     - Optimize productivity
     - Make your employees happy
     - Fulfill social and public responsibility
     Businesses traditionally rely on intuition, creativity and experience to fulfill these objectives. For decades, this has been reflected in the HiPPO (highest paid person's opinion) phenomenon.
  8. The Data Science promise
     "Without data, you are just another person with an opinion." - W. Edwards Deming
     Although intuition and experience are important, they work much better when supported by data. Data science helps you to:
     - Understand your customers better by learning about their needs, their struggles, their motivations, their habits and their relationships to your product or service.
     - Use this understanding to create a better product and/or service, and turn that into profit.
  9. The Data Science promise
     Data science helps you to:
     - See clearly how your business performs.
     - Understand the dynamics of your business.
     - Improve business processes.
     - Discover new opportunities, products and services that your customers need.
     - Discover new audiences for your current products and services.
     And much more...
  10. The Data Science promise
      If you manage to collect the right data and use it well:
      - You will be able to make better decisions more quickly and more easily.
      - That will lead to a better product, happier customers and, eventually, more revenue.
      That's what business data science is all about. If you are among the first in your domain to embrace data science, you can outsmart your competition.
  11. Signs that You Should Invest in Data Science
      - Your marketing budgets are growing, but your sales numbers are not.
      - Your company is struggling with personalization.
      - It's taking too long for the sales team to score leads.
      - You are unable to analyze your marketing ROI.
      - You want a competitive edge without significantly increasing your budget.
      - Your competitors are already investing in Data Science.
  12. Data Science Investments: Human Resources
      According to one estimate, good teams spend about 5% of their total working hours on data and quantitative research.
      - So, if you are working alone, that's around 2-3 hours a week.
      - If you are a team of 50, then ideally you should have one or two full-time people dedicated to Data Science projects.
      - As your business grows, you may set up a Data Science division.
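The 5% rule of thumb above can be checked with a little arithmetic. The sketch below assumes a 40-hour working week, which is an assumption and not a figure from the slide; the constant and function names are illustrative only.

```python
# Back-of-the-envelope sizing for the "about 5% of working hours" rule of thumb.
# Assumption (not from the slide): a 40-hour working week per person.

DATA_SHARE = 0.05      # share of total working hours spent on data work
HOURS_PER_WEEK = 40    # assumed hours per person per week


def weekly_data_hours(team_size: int, hours_per_week: float = HOURS_PER_WEEK) -> float:
    """Total hours per week the whole team should spend on data work."""
    return team_size * hours_per_week * DATA_SHARE


def full_time_equivalents(team_size: int, hours_per_week: float = HOURS_PER_WEEK) -> float:
    """Express those hours as a number of dedicated full-time people."""
    return weekly_data_hours(team_size, hours_per_week) / hours_per_week


if __name__ == "__main__":
    print(weekly_data_hours(1))        # hours a week for one person
    print(full_time_equivalents(50))   # full-time people for a team of 50
```

At 40 hours a week, one person comes to 2 hours (the slide's 2-3 hours corresponds to 40-60 hour weeks), and a team of 50 comes to 100 hours, or about 2.5 full-time equivalents, which is in the same range as the one or two dedicated people suggested above.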
  13. Data Science Investments: Data Infrastructure
      A data infrastructure is a digital infrastructure for promoting data sharing and consumption. It includes:
      - data assets, hardware, software and processes;
      - data ingestion and storage infrastructure;
      - data management, data security and data privacy.
  14. Data Science Investments: Analytics Infrastructure
      Much of data science work involves computationally intensive experiments.
      - Thus, data scientists should be able to access large machines or specialized hardware for running experiments or doing exploratory analysis.
      - They should also be able to easily use burst (elastic) compute on demand.
      - Data scientists need software support for communicating their findings to business stakeholders.
  15. Cloud Analytics
      On-premises analytics solutions have challenges:
      - Cost of infrastructure
      - Need for specialized skills
      - Time required to configure and maintain these systems
      - Lack of scalability
      Cloud analytics provides a solution. Some major players:
      - IBM Cognos Analytics
      - Microsoft Azure Stream Analytics
      - AWS Analytics
  16. Success Stories
      - Southwest Airlines saved $100 million by reducing the time its planes stood idle on the airstrip.
      - UPS, a logistics company, saved 38 million gallons of fuel by optimizing its fleet.
      - The Internal Revenue Service saved $2 billion in tax dollars by improving its ability to detect identity fraud and improper payments.
      - Croma, a subsidiary of Tata Sons, used data science to build a 360° view of its users and used it to give its online customers a personalized shopping experience; its conversions improved significantly.
      And many more…
  17. With data in your possession, you are sitting on a gold mine…
      However, if you don't know this fact, or don't know how to extract its value, you won't be able to benefit from it.
  18. Data Science Process
      The diagram shows the major phases of the data science process, following the CRISP-DM methodology.
  19. Data Science Process
      The six steps of a data science project:
      - Data Collection
      - Data Storage
      - Data Preparation
      - Data Utilization
        - Business Analytics
        - Predictive Analytics
        - Developing Data Product
      - Communication, data visualization
      - Data-driven Decision
  20. Data Collection
      This is where many businesses fail. Too many companies collect incomplete, unreliable data, and everything they do after that is compromised. Proper tracking and collection of data, and ensuring its quality, is crucial for every business doing data science.
      What to collect?
      - It is important to decide the details of the data that must be collected or captured.
      - The general idea is to collect everything you can, because the value of data may be realized at any time in the future.
      - However, the more data you capture, the more engineering time you need to allocate to implement it, the slower your business processes will be, the more complex your data infrastructure becomes, and so on…
      Also consider legal and ethical aspects!
  21. Data Wrangling
      Data wrangling is all about getting the data into the right form for feeding into the modeling and visualization stages. This activity involves a variety of tasks, from discovering and acquiring data to transforming it into a form that is ready to be processed. The tasks following data acquisition are also referred to by other terms, such as data munging or data preprocessing.
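A few typical wrangling tasks (trimming stray whitespace, normalising inconsistent spelling, typing values, flagging missing or unparseable entries and dropping duplicates) can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed pipeline; the field names, sample rows and the `wrangle` and `parse_date` helpers are hypothetical. In practice, a library such as pandas is the more common tool for this.

```python
# Minimal data-wrangling sketch using only the Python standard library.
# The field names, sample rows and helper names are hypothetical.
import csv
import io
from datetime import date

RAW_CSV = """customer_id,city,signup_date,spend
101, Vadodara ,2023-01-15,1200.50
102,MUMBAI,2023-02-01,
101, Vadodara ,2023-01-15,1200.50
103,pune,not a date,300
"""


def parse_date(text: str):
    """Return a date, or None if the value cannot be parsed as YYYY-MM-DD."""
    try:
        return date.fromisoformat(text.strip())
    except ValueError:
        return None


def wrangle(raw: str) -> list[dict]:
    """Turn raw CSV text into clean, typed, de-duplicated records."""
    seen = set()
    clean = []
    for row in csv.DictReader(io.StringIO(raw)):
        key = tuple(row.values())  # exact duplicate rows are dropped
        if key in seen:
            continue
        seen.add(key)
        clean.append({
            "customer_id": int(row["customer_id"]),
            "city": row["city"].strip().title(),  # normalise whitespace and case
            "signup_date": parse_date(row["signup_date"]),
            # A missing spend becomes None rather than a misleading 0.
            "spend": float(row["spend"]) if row["spend"].strip() else None,
        })
    return clean


if __name__ == "__main__":
    for record in wrangle(RAW_CSV):
        print(record)
```

Keeping missing or unparseable values as `None` (instead of silently filling them) leaves the decision about how to treat them to the later modeling stage.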
  22. Big Data
      "Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it." - Dan Ariely
  23. What is Big data?
      - Big data is a data set whose volume is beyond the ability of commonly used hardware and software tools to capture, manage and process the data within a tolerable execution time.
      - Such data sets are gathered by information-sensing mobile devices, remote sensing technologies, software logs, cameras, microphones, RFID readers and many similar devices.
      - As a result, such data sets are continuously growing in size.
      - By 2020, the world was projected to hold around 40 trillion gigabytes of data.
      - 90% of the data in the world today was created within just the past two years.
      - Internet users generate about 2.5 quintillion bytes (2.5 million terabytes) of data each day.
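The unit conversions in these figures can be verified directly. The sketch below assumes decimal (SI) units, where 1 TB is 10^12 bytes and 1 ZB is 10^21 bytes; binary units (TiB, ZiB) would give slightly different numbers. Integer arithmetic keeps the results exact.

```python
# Checking the unit conversions quoted above, using decimal (SI) units.
TB = 10**12   # bytes in a terabyte
GB = 10**9    # bytes in a gigabyte
ZB = 10**21   # bytes in a zettabyte

daily_bytes = 25 * 10**17        # 2.5 quintillion (2.5 x 10**18) bytes per day
print(daily_bytes // TB)         # 2500000 -> 2.5 million terabytes

total_2020 = 40 * 10**12 * GB    # 40 trillion gigabytes
print(total_2020 // ZB)          # 40 -> 40 zettabytes
```

So 40 trillion gigabytes is the same quantity often quoted as 40 zettabytes.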
  24. Some Examples
      - Twitter: 500 million tweets per day.
      - Facebook: generates 4 petabytes of data per day; users generate 4 million likes every minute; 350 million photos are uploaded per day.
      - Instagram: the Like button is hit an average of 4.2 billion times per day.
      - WhatsApp: in 2018, users sent 65 billion messages per day.
      - And almost every other field.
  25. Characteristics of big data (3 V's)
      In a 2001 research report, META Group (now Gartner) analyst Doug Laney defined data growth challenges (and opportunities) as being three-dimensional: increasing volume, velocity and variety.
      - Data volume: This is the primary attribute of big data. Most people define big data in multiple terabytes, sometimes petabytes.
      - Data variety: Big data is coming from a greater variety of sources than ever before. Many of the newer ones are Web sources, including logs, click-streams and social media.
      - Data velocity: Big data can be described by its velocity, or speed: the rate at which new data is generated.
  26. Data Analysis
      Data analysis is the process of extracting value from data. This is where data science gets exciting. It's a creative process.
      - Ask the right questions: It is important to ask the right questions. They usually come from management or other colleagues, who may already have suspicions based on their experience.
      - Do qualitative research: It's important to understand the business and its customers in detail. This can be achieved through qualitative research, which in turn gives direction to useful investigations of the data.
  27. Three Major Business Applications
      - Business Analytics: It answers the questions "What has happened in the past?" and "Where are we now?" E.g. reporting, measuring retention, finding the right user segments, funnel analysis, etc.
      - Predictive Analytics: It answers the question "What will happen in the future?" E.g. early warning, predicting the marketing budget you will need in the next quarter, etc.
      - Data (Based) Product: A product that is built on, and works using, your data. E.g. recommendation systems, image recognition, voice recognition, etc.
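The difference between the first two applications can be illustrated with two tiny functions. `funnel_conversion` is a backward-looking business-analytics calculation (step-to-step conversion in a funnel), and `linear_forecast` is a deliberately simple stand-in for predictive analytics: it fits a straight-line trend by ordinary least squares and extrapolates one period ahead. Both function names and all numbers are hypothetical; real predictive work would usually use richer models and validate them.

```python
# Hypothetical numbers throughout; they only illustrate the two kinds of analysis.


def funnel_conversion(stages: dict[str, int]) -> dict[str, float]:
    """Business analytics: step-to-step conversion rate in a sales funnel."""
    names = list(stages)
    return {
        f"{prev} -> {curr}": stages[curr] / stages[prev]
        for prev, curr in zip(names, names[1:])
    }


def linear_forecast(history: list[float]) -> float:
    """Predictive analytics: fit y = a + b*t by ordinary least squares
    over t = 0..n-1 and extrapolate to t = n. Needs at least two points."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


if __name__ == "__main__":
    funnel = {"visited": 10_000, "signed_up": 1_200, "purchased": 300}
    for step, rate in funnel_conversion(funnel).items():
        print(f"{step}: {rate:.1%}")  # 12.0% then 25.0%
    # Quarterly marketing spend (hypothetical units); forecast the next quarter.
    print(linear_forecast([100.0, 110.0, 120.0, 130.0]))  # 140.0
```

The funnel answers "where are we now?", while the forecast answers "what will happen next quarter?" under the strong assumption that the recent linear trend continues.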
  28. SafetiPin is a map-based mobile phone application that leverages the power of big data to make our communities and cities safer for women.
      - It provides safety-related information collected through crowdsourcing.
      - The app captures data on 9 parameters (lighting, openness, visibility, people density, security in the area, walk path, transportation, gender diversity, feeling in the area) and uses it to compute a safety score, indicating personal vulnerability to crime, for every pocket of the city.
      - The app uses this score and integrates with big data sources such as Google Maps to recommend the Safest Route, the best possible route in terms of safety.
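To make the idea of a composite score concrete, here is a hypothetical sketch of how ratings on the nine parameters could be combined and used to compare routes. This is not SafetiPin's actual model. It assumes each parameter is rated from 0 (poor) to 3 (good), that all parameters carry equal weight, and that the score is scaled to 0-100; the route rule (prefer the route whose least safe segment scores highest) is likewise an illustrative choice.

```python
# Hypothetical sketch: NOT SafetiPin's actual scoring model.
# Assumptions: each parameter is rated 0 (poor) to 3 (good), all nine are
# weighted equally, and the score is scaled to 0-100.

PARAMETERS = (
    "lighting", "openness", "visibility", "people_density", "security",
    "walk_path", "transportation", "gender_diversity", "feeling",
)
MAX_RATING = 3


def safety_score(ratings: dict[str, int]) -> float:
    """Average the nine ratings and scale the result to 0-100."""
    missing = set(PARAMETERS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    mean = sum(ratings[p] for p in PARAMETERS) / len(PARAMETERS)
    return 100 * mean / MAX_RATING


def safest_route(routes: dict[str, list[float]]) -> str:
    """Pick the route whose least safe segment scores highest (max-min rule)."""
    return max(routes, key=lambda name: min(routes[name]))


if __name__ == "__main__":
    well_rated_street = dict.fromkeys(PARAMETERS, 3)
    print(safety_score(well_rated_street))  # 100.0
    routes = {
        "route_a": [90.0, 85.0, 20.0],  # mostly safe, but one poorly rated stretch
        "route_b": [70.0, 65.0, 60.0],  # consistently moderate
    }
    print(safest_route(routes))  # route_b
```

The max-min rule reflects the intuition that a single unsafe stretch can make an otherwise pleasant route unsafe; averaging segment scores instead would have favoured `route_a`.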
  29. Data Communication
      This is the step where most data science projects fail. To reap the benefits of data science, effective communication of the findings is crucial.
      - It is necessary to build a culture where people can communicate and use data. For this, everyone at your company needs to be involved.
      - Business people should also educate data scientists by helping them create and deliver better presentations.
      - Communication should be as simple as possible:
        - No fancy scientific words
        - No complicated charts
  30. What people do you need in your team?
      Your data science team should feature:
      - the best data engineers,
      - the best software developers, and
      - the best statisticians.
      They also need domain knowledge, to understand the actual business applications of their data projects.
  31. Data Science Roles: Data Engineer
      The data engineer is someone who develops, constructs, tests and maintains data architectures, such as databases, data warehouses, data lakes and large-scale processing systems. Data engineers manage data of all sizes and types. They develop, deploy, manage and optimize data pipelines and infrastructure to transform and transfer data to data scientists for querying.
      Skills needed: SQL, databases, data warehousing, ETL, big data tools, building APIs
  32. Data Science Roles: Data Analyst
      Data analysts perform the following tasks:
      - Data wrangling
      - Creating data visualizations and dashboards
      - Analyzing data to discover interesting trends
      - Presenting the results of analysis to business clients or internal teams
      - Helping other stakeholders to optimize their data utilization
      Skills needed: programming skills (SAS, R, Python), statistical and mathematical skills, data wrangling, data visualization tools like Tableau or Power BI
  33. Data Science Roles: Data Scientist
      A data scientist is a specialist with expertise in statistics and in developing models, including predictive models and machine learning models.
      - Data scientists can tackle more open-ended questions by leveraging their knowledge of advanced statistics.
      - Data scientists bring an entirely new approach and perspective to understanding data.
      Skills needed: programming skills (SAS, R, Python), statistical and mathematical skills, storytelling and data visualization, Hadoop, SQL, machine learning, big data analytics
  34. Data Science projects can fail
      Yes, that's true! Here are some of the reasons:
      - Not every manager is ready for this change. Even a very well-executed data project can fail just because someone's feelings or ego are hurt.
      - Answering the wrong question
      - Failure to integrate into business operations
      - Disengaged stakeholders
      - Benefits that don't justify the costs
  35. Developing Data Science culture
      Failures can be prevented by establishing a data-driven company culture early on. As the company grows, it becomes harder to make the organization data-driven.
      - It's important that managers develop the right mindset.
      - It's important that everyone in the organization understands the importance of data science.
      Data professionals should hold frequent presentations about their recent findings.
  36. Data Strategy
      Why a data strategy? If you don't have a data strategy, you won't have enough information to make the right decisions. Having a data strategy is crucial to becoming a data-driven organization. Without it:
      - you will waste money on the wrong marketing campaigns;
      - you will have the wrong product development plans.
  37. Where do I begin?
      It is recommended to start by developing a data strategy. For this, the following questions need to be answered:
      - What are the right metrics to focus on, and how do you figure them out?
      - How should you collect and store the data, and which tools should you use?
      - Can you trust your data, and how can you make it trustworthy?
      - How can you communicate data efficiently within your organization?
      Start with a simple data project that answers the basic questions about your business. Subsequently, as you recognize your customers' needs, you may initiate other projects, such as predictive modelling and machine learning.
  38. Pick your first data project
      Develop and use a prioritization matrix.
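A prioritization matrix places each candidate project on two axes, here business value and feasibility, and splits them into four quadrants. The sketch below is one hypothetical way to encode that; the 1-10 scales, the threshold of 5, the quadrant labels and the example projects are all illustrative assumptions, not part of the original slide.

```python
# Hypothetical sketch of a 2x2 prioritization matrix. The 1-10 scales, the
# threshold, the quadrant labels and the example projects are assumptions.
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    business_value: int  # 1 (low) .. 10 (high)
    feasibility: int     # 1 (hard) .. 10 (easy)


QUADRANT_ORDER = ["do first", "plan for later", "low priority", "avoid"]


def quadrant(p: Project, threshold: int = 5) -> str:
    """Place a project in one of the four quadrants of the matrix."""
    high_value = p.business_value > threshold
    feasible = p.feasibility > threshold
    if high_value and feasible:
        return "do first"
    if high_value:
        return "plan for later"
    if feasible:
        return "low priority"
    return "avoid"


def rank(projects: list[Project]) -> list[Project]:
    """Order candidates by quadrant, then by combined score (highest first)."""
    return sorted(
        projects,
        key=lambda p: (QUADRANT_ORDER.index(quadrant(p)),
                       -(p.business_value + p.feasibility)),
    )


if __name__ == "__main__":
    candidates = [
        Project("Customer churn model", business_value=9, feasibility=3),
        Project("Weekly sales report", business_value=7, feasibility=9),
        Project("Office temperature dashboard", business_value=2, feasibility=8),
    ]
    for p in rank(candidates):
        print(f"{p.name}: {quadrant(p)}")
```

With these made-up scores, the simple sales report lands in the "do first" quadrant, while the more ambitious churn model is valuable but better planned for later.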
  39. Your first data project
      Your first data project should be a simple (feasible) project that aims to help you understand your own business and your customers better (high business value). In other words, start by investing in business analytics and simple reports. This project answers the basic questions about your business, such as:
      - Who prefers what, and why?
      - How can you win customer loyalty?
      - Why did a particular product fail?
      And so on…
  40. Questions?
  41. You can write to me
  42. Thanks!