This talk is an introduction to Data Science. It explains Data Science from two perspectives - as a profession and as a discipline. While covering the benefits of Data Science for business, it explains how to get started with embracing data science in business.
2. Agenda
What is Data Science?
What does Data Science promise for your business?
Investment in Data Science and ROI
Data Science Process
Data Science Roles
Infrastructure Requirements
Data Science Tools and Techniques
Where do I begin?
Developing Data Science Culture
Questions
3. What is Data Science?
Everything concerning Data
is in the purview of Data Science
4. What is Data Science?
Data science is a young inter-disciplinary field that uses
scientific principles, methods, processes, algorithms and
systems to extract knowledge and insights from data.
Data science involves Statistics at its core.
Data Science extends the field of statistics to
incorporate advances in computing with data.
Apart from Statistics, Computer Science is another
major discipline; it plays a key role in capturing,
managing and sharing data.
Data science is a driving force behind innovations in almost all
disciplines of Science.
This new approach is termed data-driven science.
7. The Data Science promise
Top Objectives of Successful Businesses
Increase profitability
Ensure customer satisfaction
Optimize productivity
Make your employees happy
Social and public responsibility
Businesses traditionally rely on intuition, creativity and
experience to fulfill these objectives.
This has been reflected by the HiPPO (highest paid person's
opinion) phenomenon for decades.
8. The Data Science promise
Without Data, you are just another person with an opinion
– Edwards Deming
Although intuition, experience, etc. are important, they work
much better when supported with data.
Data Science helps you to
Understand your customers better by
Learning about their needs
Their struggles, their motivations, their habits and their
relationships to your product or service.
Use this understanding to create a better product and/or
service and turn that into profit.
9. The Data Science promise
Data science helps you to
See clearly how your business performs.
Understand dynamics of your business
Improve business processes
Discover new opportunities / products / services that
your customers need.
Discover new audiences for your current products /
services.
and much more...
10. The Data Science promise
If you manage to collect the right data and use it well,
you will be able to make better decisions more quickly
and more easily.
That will lead to a better product, happier
customers and eventually more revenue.
That’s what business data science is all about.
If you are among the first in your domain to embrace
data science, you can outsmart your competition.
11. Signs that You Should Invest in Data
Science
Your marketing budgets are growing, but your sales
numbers are not.
Your company is struggling with personalization
It’s taking too long for the sales team to score leads
You are unable to analyze your marketing ROI
You want the competitive edge without significantly
increasing your budget
Your competitors are already investing in Data Science
12. Data Science Investments
Human Resource
According to an estimate, good teams spend about 5% of
their total working hours with data and quantitative
research.
So, if you are working alone, that's around 2-3 hours a
week.
If you are a team of 50, then ideally you should have
one or two full-time dedicated people for Data Science
projects.
As your business grows, you may set up a Data Science
division.
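As a rough illustration of the 5% rule of thumb above, the arithmetic can be sketched as follows. The 40-hour working week is an assumption, not a figure from the estimate; with a 50-hour week a single person lands at 2.5 hours, which matches the 2-3 hour range. For a team of 50 the same share comes to about 2.5 full-time equivalents, in the same ballpark as the dedicated people suggested above.

```python
# Rule-of-thumb sketch: about 5% of total working hours go to data work.
DATA_SHARE_PERCENT = 5  # the estimate quoted on the slide


def weekly_data_hours(team_size: int, hours_per_week: int = 40) -> float:
    """Hours per week the whole team would spend on data work.

    hours_per_week defaults to 40, which is an assumption.
    """
    total_hours = team_size * hours_per_week
    return total_hours * DATA_SHARE_PERCENT / 100


def full_time_equivalents(team_size: int) -> float:
    """The same share expressed as a number of full-time people."""
    return team_size * DATA_SHARE_PERCENT / 100


if __name__ == "__main__":
    for size in (1, 10, 50):
        print(size, weekly_data_hours(size), full_time_equivalents(size))
```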
13. Data Science Investments
Data Infrastructure
A data infrastructure is a digital infrastructure for
promoting data sharing and consumption.
It includes data assets, hardware, software and
processes.
It includes data ingestion and storage infrastructure
It includes data management, data security and data
privacy.
14. Data Science Investments
Analytics Infrastructure
Much of data science work involves computationally
intensive experiments.
Thus, Data scientists should be able to access large
machines/ specialized hardware for running
experiments or doing exploratory analysis.
They should also be able to easily use burst/elastic
compute on demand.
Data Scientists need software support for
communicating their findings to business
stakeholders.
15. Cloud Analytics
On-premises analytics solutions have challenges
Cost of infrastructure
Need for specialized skills
Time required to configure and maintain these
systems
Limited scalability
Cloud analytics provides a solution. Some major players:
IBM Cognos Analytics
Microsoft Azure Stream Analytics
AWS Analytics
16. Success Stories
Southwest Airlines saved $100 million by reducing the
time its planes stood idle on the airstrip.
UPS, a logistics company, saved 38 million gallons of
fuel by optimizing its fleet.
The Internal Revenue Service saved $2 billion in tax dollars
by improving its ability to detect identity fraud
and improper payments.
Croma, a subsidiary of Tata Sons, used data science to
build a 360° view of its users and give its online
customers a personalized shopping experience; its
conversions have improved significantly.
And many more…
17. With Data in your possession,
You are sitting on a gold mine…
However, if you don't know this fact OR don’t know how
to extract it, you won't be able to benefit from it.
18. Data Science Process
The diagram shows the major phases of the data science
process, following the CRISP-DM methodology.
19. Data Science Process
The six steps of a data science project
Data Collection
Data Storage
Data Preparation
Data Utilization (Business Analytics, Predictive Analytics, Developing Data Product)
Communication and Data Visualization
Data-driven Decision
20. Data Collection
This is where many businesses fail. Too many companies collect
incomplete, unreliable data and everything they do after that is just
messed up.
Proper tracking and collection of data, and ensuring its quality, are
crucial for every business doing data science.
What to collect?
It is important to decide the details of the data that must be
collected/ captured.
The general idea is to collect everything you can, because the
value of data can be realized at any time in the future.
However, the more data you capture, the more engineering time
you need to allocate to implement it, the slower your business
processes will be, the more complex your data infrastructure
becomes, and so on…
Also consider legal and ethical aspects!
21. Data Wrangling
Data wrangling is all about getting the data into the right
form that is suitable for feeding into the modeling and
visualization stages.
This activity involves a variety of tasks, from discovering
data to acquiring it and transforming it into a form
that is ready to be processed.
The tasks following the data acquisition are also referred
to by different terms such as Data Munging or Data
Preprocessing.
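A minimal sketch of what data wrangling can look like in practice, using only the Python standard library. The raw export, its column names and the cleaning rules (trim whitespace, normalize names, drop rows with a missing value, convert types) are all hypothetical and chosen only for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export: stray spaces, inconsistent casing, a missing value.
RAW_CSV = """customer, signup_date, spend
 Alice ,2021-03-01, 120.50
BOB,2021-03-05,
carol,2021-03-09,80
"""


def wrangle(raw_text: str) -> list:
    """Turn raw CSV text into clean, typed records ready for analysis."""
    reader = csv.DictReader(io.StringIO(raw_text))
    cleaned = []
    for row in reader:
        # Strip stray whitespace from both column names and values.
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        if not row["spend"]:
            continue  # drop incomplete records
        cleaned.append({
            "customer": row["customer"].title(),
            "signup_date": datetime.strptime(row["signup_date"], "%Y-%m-%d").date(),
            "spend": float(row["spend"]),
        })
    return cleaned


if __name__ == "__main__":
    for record in wrangle(RAW_CSV):
        print(record)
```

Dropping incomplete rows is only one possible policy; depending on the business question, filling in a default or flagging the row for review may be more appropriate.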
22. Big Data
Big data is like teenage sex: everyone talks about it,
nobody really knows how to do it, everyone thinks
everyone else is doing it, so everyone claims they are
doing it.
- Dan Ariely
23. What is Big data?
Big data is a data set whose volume is beyond the ability of
commonly used hardware and software tools to capture, manage,
and process the data within a tolerable execution time.
Such data sets are gathered by information-sensing mobile devices,
remote sensing technologies, software logs, cameras,
microphones, RFID readers, and many such devices.
As a result, such datasets are continuously growing in size.
By 2020, there will be around 40 trillion gigabytes of data.
90% of the data in the world today was created within just the
past two years.
Internet users generate about 2.5 quintillion bytes (2.5 million
terabytes) of data each day.
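The unit conversions behind these figures can be checked with a few lines of arithmetic. This sketch assumes decimal (SI) units, where a terabyte is 10^12 bytes, and the short-scale meaning of quintillion (10^18).

```python
# Decimal (SI) units are assumed throughout.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21

# 2.5 quintillion bytes per day, written as an exact integer.
daily_bytes = 25 * 10**17
daily_terabytes = daily_bytes // TERABYTE

# 40 trillion gigabytes, expressed in zettabytes.
total_bytes = 40 * 10**12 * GIGABYTE
total_zettabytes = total_bytes // ZETTABYTE

print(daily_terabytes)   # 2500000, i.e. 2.5 million terabytes
print(total_zettabytes)  # 40
```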
24. Some Examples
Twitter
500 million tweets per day
Facebook
Facebook generates 4 petabytes of data per day.
Users generate 4 million likes every minute.
350 million photos are uploaded per day.
Instagram
The Like button is hit an average of 4.2 billion times per day.
WhatsApp
In 2018, WhatsApp users sent 65 billion messages per
day.
Almost every field
25. Characteristics of big data (3V’s)
In a 2001 research report, Gartner analyst Doug Laney
defined data growth challenges (and opportunities) as being
three-dimensional - increasing volume, velocity, and variety.
Data volume
This is the primary attribute of big data. Most people
define big data in terabytes, sometimes petabytes.
Data variety
Big data is coming from a greater variety of sources than
ever before. Many of the newer ones are Web sources,
including logs, click-streams, and social media.
Data velocity
Big data can also be described by its velocity or speed: the rate
at which new data is generated.
26. Data Analysis
Data Analysis is the process of extracting value from data.
This is where data science gets exciting. It’s a creative process.
Ask the right questions
It is important to ask the right questions. They usually come
from the management or other colleagues, who may
already have suspicions based on their experience.
Do Qualitative research
It’s important to understand the things concerning
business and its customers in detail. This can be achieved
through qualitative research, which in turn gives direction
to the useful investigations through data.
27. Three Major Business Applications
Business Analytics
It answers the questions of “what has happened in the
past?” and “where are we now?”
E.g. reporting, measuring retention, finding the right user
segments, funnel analysis, etc.
Predictive Analytics
It answers the question, “what will happen in the future?”
E.g. early warning, predicting the marketing budget you will
need in the next quarter, etc.
Data (Based) Product
A product that is built, and works using your data.
E.g. recommendation systems, image recognition, voice
recognition, etc.
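To make the business analytics case concrete, here is a minimal sketch of a funnel analysis. The step names and the event log are hypothetical; a real funnel would be built from your own tracking data, and this simplified version counts unique users per step without checking that they completed the earlier steps.

```python
# Ordered funnel steps; the names are hypothetical.
FUNNEL_STEPS = ["visit", "signup", "add_to_cart", "purchase"]

# Hypothetical event log of (user_id, event_name) pairs.
EVENTS = [
    (1, "visit"), (1, "signup"), (1, "add_to_cart"), (1, "purchase"),
    (2, "visit"), (2, "signup"),
    (3, "visit"), (3, "signup"), (3, "add_to_cart"),
    (4, "visit"),
]


def funnel_report(events, steps):
    """Return (step, unique users, conversion from previous step) per step."""
    users_per_step = {step: set() for step in steps}
    for user_id, event in events:
        if event in users_per_step:
            users_per_step[event].add(user_id)

    report = []
    previous = None
    for step in steps:
        count = len(users_per_step[step])
        # No rate for the first step, or when the previous step had no users.
        rate = count / previous if previous else None
        report.append((step, count, rate))
        previous = count
    return report


if __name__ == "__main__":
    for step, count, rate in funnel_report(EVENTS, FUNNEL_STEPS):
        shown = "n/a" if rate is None else f"{rate:.0%}"
        print(f"{step:12} {count:3} {shown}")
```

The step with the lowest conversion rate is usually the first place to look for product or process improvements.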
28. SafetiPin is a map-based mobile phone application, which
leverages the power of big data to make our communities
and cities safer for women.
It provides safety-related information collected through
crowdsourcing.
The app captures data on 9 parameters (Lighting,
openness, visibility, people density, security in the area,
walk path, transportation, gender diversity, feeling in the
area), and uses it to compute and provide safety score, the
information on personal vulnerability to crime, in every
pocket of the city.
The app uses this score and integrates with big data sources
such as Google Maps to recommend the safest route, that is,
the best possible route in terms of safety.
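The idea of a per-location score feeding a route recommendation can be sketched as below. This is not SafetiPin's actual formula: the 0-3 rating scale, the unweighted average and the "weakest segment" rule for comparing routes are all assumptions made for illustration.

```python
from statistics import mean

# The nine audit parameters listed above.
PARAMETERS = [
    "lighting", "openness", "visibility", "people_density", "security",
    "walk_path", "transportation", "gender_diversity", "feeling",
]


def safety_score(ratings: dict) -> float:
    """Unweighted mean of the nine ratings (assumed 0-3 scale, hypothetical)."""
    missing = [name for name in PARAMETERS if name not in ratings]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    return mean(ratings[name] for name in PARAMETERS)


def safest_route(routes: dict) -> str:
    """Pick the route whose least safe segment has the highest score.

    routes maps a route name to the safety scores of its segments.
    """
    return max(routes, key=lambda name: min(routes[name]))


if __name__ == "__main__":
    routes = {
        "main_road": [2.5, 2.0, 2.2],
        "shortcut": [3.0, 0.5, 2.8],
    }
    print(safest_route(routes))
```

Comparing routes by their weakest segment reflects the intuition that one very unsafe stretch matters more than a good average; other rules are equally possible.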
29. Data Communication
This is the step where most data science projects fail.
To reap the benefits of Data Science, effective
communication of the findings is crucial.
It is necessary to build a culture where people can
communicate and use data. For this, everyone at your
company needs to be involved.
Business people should also educate data scientists by
helping them to create and deliver better presentations.
Communication should be as simple as it can be.
No fancy scientific words
No complicated charts
30. What People Do You Need in Your Team?
Your data science team should feature
Best Data Engineers,
Best software developers, and
Best statisticians
They need to have domain knowledge to know the actual
business application of their data projects.
31. Data Science Roles: Data Engineer
The data engineer is someone who develops, constructs,
tests and maintains data architectures, such as
databases, data warehouses, data lakes and large-scale
processing systems.
Data engineers manage data of all sizes, and types. They
develop, deploy, manage, and optimize data pipelines
and infrastructure to transform and transfer data to data
scientists for querying.
Skills needed: SQL, databases, data warehousing,
ETL, Big data tools, building APIs
32. Data Science Roles: Data Analyst
Data analysts perform the following tasks
Data wrangling
Create data visualizations and dashboards
Analyze data to discover interesting trends
Present the results of analysis to business clients or
internal teams
Help other stakeholders to optimize their data utilization
Skills needed: Programming skills (SAS, R, Python),
statistical and mathematical skills, data wrangling, data
visualization tools like Tableau / Power BI
33. Data Science Roles: Data Scientist
A data scientist is a specialist having expertise in
Statistics and developing models, including predictive
models and machine learning models.
Data scientists can tackle more open-ended questions
by leveraging their knowledge of advanced statistics.
Data scientists bring an entirely new approach and
perspective to understanding data
Skills needed: Programming skills (SAS, R, Python),
statistical and mathematical skills, storytelling and data
visualization, Hadoop, SQL, machine learning, Big data
analytics.
34. Data Science projects can fail
Yes, that’s true!
Here are some of the reasons.
Not every manager is ready for this change.
Even a very well-executed data project can fail, just
because someone’s feelings or ego is hurt.
Answering the wrong question
Failure to integrate into business operations
Stakeholders disengaged
Benefits don’t justify the costs
35. Developing Data Science culture
Failures can be prevented by establishing a data-driven
company culture early on. As the company size
increases, it becomes harder to make the organization
data-driven.
It’s important that the managers develop the right
mindset.
It is important that everyone in the organization
understands the importance of data science.
Data professionals should hold frequent presentations
about their recent findings.
36. Data Strategy
Why Data Strategy?
If you don't have a data strategy, you won't have enough
information to make the right decisions. Having a data
strategy is crucial to becoming a data-driven organization.
Without it
you will waste money on the wrong marketing
campaigns
you will make the wrong product development plans
37. Where do I begin?
It is recommended to start by developing a Data Strategy. For
this, the following questions need to be answered:
What are the right metrics to focus on? And how do you figure them out?
How should you collect and store the data? Which tools should you use?
Can you trust your data? And how can you make it trustworthy?
How do you communicate the data in your organization efficiently?
Start with a simple data project that answers the basic questions
about your business.
Subsequently, as you recognize your customers’ needs, you may
initiate other projects such as Predictive modelling, and Machine
learning
38. Pick your first data project
Develop and use the Prioritization matrix.
39. Your first data project
Your first data project should be a simple project (feasible)
with the aim of understanding your own business and your
customers better (high business value).
In other words, start with investing in business analytics and
simple reports.
This project answers the basic questions about your business,
such as
Who prefers what and why?
How to win customer loyalty?
Why did a particular product fail?
And so on …
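One way to build a prioritization matrix from the two criteria above (feasibility and business value) is sketched here. The 1-5 scales, the threshold of 3, the quadrant labels and the candidate projects with their scores are all hypothetical; in practice the scores would come from a discussion with your stakeholders.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    feasibility: int     # 1 (hard) to 5 (easy); assumed scale
    business_value: int  # 1 (low) to 5 (high); assumed scale


def quadrant(project: Project, threshold: int = 3) -> str:
    """Place a project in one of four quadrants of the matrix."""
    feasible = project.feasibility >= threshold
    valuable = project.business_value >= threshold
    if feasible and valuable:
        return "do first"
    if valuable:
        return "plan for later"
    if feasible:
        return "nice to have"
    return "avoid"


def prioritize(projects):
    """Rank projects by feasibility times business value, best first."""
    return sorted(
        projects,
        key=lambda p: p.feasibility * p.business_value,
        reverse=True,
    )


# Hypothetical candidates and scores.
CANDIDATES = [
    Project("Churn prediction model", feasibility=2, business_value=5),
    Project("Customer segment report", feasibility=5, business_value=4),
    Project("Image recognition product", feasibility=1, business_value=3),
]

if __name__ == "__main__":
    for project in prioritize(CANDIDATES):
        print(f"{project.name}: {quadrant(project)}")
```

With these example scores, the simple reporting project ranks first, which is consistent with the advice to begin with business analytics.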