The Data Unicorns
1. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 1
STKI Summit 2019
2.
Main Themes 2019 for Data and Analytics
• DATA-CENTRIC: Applications, processes & decisions becoming data-centric
• THE DATA DEBT: Data catalogs proliferate, but lack of data ownership and strategy remains
• REAL PROBLEMS: Use of Design Thinking and Empathy concepts to solve REAL problems
• DATA LITERACY: The data “language” in organizations will increase
• DATA SCIENCE FOR ALL: AI, ML and automation will empower citizen DS
• DATA PRODUCTS: “Data product managers” will manage the entire lifecycle
• DATA TEAMS: Agile-like teams will collaborate around data “products”
• AUTOMATION: Automation in data management and data science processes
3.
ARE WE READY FOR A DATA-CENTRIC REALITY?
Intelligent Automation
Seamless Experiences
AI-fueled processes
4.
[Diagram: three eras of computing]
• Application Centric Computing (systems of transactions): separate silos for Payroll, Sales and Call center, each with its own software, infra, developers and users
• Customer Facing Computing (systems of engagement): Digital (forced) Transformation
• DATA Centric Computing (systems of decisions): Automation Revolution (Preemptive)
Other labels in the diagram: AI/ML/DL, Data Science, Intelligence Systems, Human/Machine Workforce, IoT, Process Engineering, Channels, APIs, AGILE, Customer Journey, UX, Marketing Automation, RPA, Data Analytics
5.
The race for AI
• […]% of organizations have adopted or have plans to adopt AI in the next 5 years (IDC)
• AI-driven companies will steal $1.2 trillion from competitors by 2020
6.
The race for AI
63% of CEOs think AI will have a greater impact than the internet (Source: PwC)
7.
Like it or not, the AGE of AUTOMATION is here
[Chart: ratio of human-machine working hours, 2018 vs. 2022]
8.
Processes and business operations rely on data
This means future businesses will be
DATA CENTRIC
9.
WHAT DO CEOs SAY ABOUT THEIR OWN DATA GAP?
10-year-old gap!
Source: PwC
10.
3 reasons for this gap:
• Lack of analytical mindset
• Data siloing
• Poor data reliability
11.
DATA DRIVEN is more of a cultural thing
Being data-driven means that people’s decisions & actions rely on data
12.
DATA LITERACY* is a new language, and we all need to be fluent in it
*Data literacy: the ability to read, write and communicate data in context
13.
Source: The data literacy project
14.
POOR DATA LITERACY IS A MAJOR ROAD BLOCK FOR CDOs
Source: Gartner CDO Survey
15.
thedataliteracyproject.org
A global community dedicated to building a data-literate culture for all
16.
The rise of CDOs
• 29 FTEs reporting directly to CDOs
• 25% increase in CDOs’ funding
• Will be a mission-critical function in 75% of orgs.
Source: Gartner CDO Survey
17.
CDOs’ time allocation:
• Risk mitigation: 27%
• Cost cutting: 28%
• Value creation: 45%
Source: Gartner CDO Survey
18.
DO YOU HAVE A CDO (DATA) IN PLACE?
YES: 63% / 28% (one figure per survey)
Sources: STKI DATA Survey, 2019; Gartner CDO Study
19.
CDO survey: Israel
STKI CX Survey 2019
21.
ONE CDO STRUCTURE >DOESN’T< FIT ALL
25.
WE WANT DATA DEMOCRACY
WHO ARE WE? DATA SCIENTISTS!
WHAT DO WE WANT? SELF SERVICE!
WHEN DO WE WANT IT? NOW!!!
26.
…and then came the DATA LAKE
27.
MYTH: Data lakes are the answer to data democratization and self service. Let’s upload a lot of data into the lake as quickly as possible.
28.
REALITY: This actually created a big problem. Data is not harmonized, and data lakes are full of isolated data islands. Organizations widened their data debt.
29.
The DATA DEBT
• 64% duplicate data
• Missing data: fields that should contain values, but do not
• 25% data entry errors
• No single version of the truth
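The data debt items above can be measured before they are fixed. The following is a minimal sketch, not a method from the presentation: the function name `profile_data_debt`, the sample customer records, and the rule that two rows are duplicates when their key fields match after trimming and lower-casing are all assumptions made for illustration.

```python
from collections import Counter


def profile_data_debt(records, key_fields, required_fields):
    """Return the share of duplicate rows and of rows with missing required values."""
    total = len(records)
    if total == 0:
        return {"duplicate_rate": 0.0, "missing_rate": 0.0}

    # Assumption: rows are duplicates when all key fields match after normalization.
    keys = Counter(
        tuple(str(r.get(f, "")).strip().lower() for f in key_fields)
        for r in records
    )
    duplicates = sum(count - 1 for count in keys.values())

    # A row has missing data when a required field is absent, None, or blank.
    missing = sum(
        1
        for r in records
        if any(r.get(f) is None or str(r.get(f)).strip() == "" for f in required_fields)
    )
    return {"duplicate_rate": duplicates / total, "missing_rate": missing / total}


# Hypothetical sample data: two spellings of the same customer, two missing phones.
customers = [
    {"name": "Dana Levi", "email": "dana@example.com", "phone": "555-0100"},
    {"name": "dana levi ", "email": "DANA@example.com", "phone": ""},
    {"name": "Omer Cohen", "email": "omer@example.com", "phone": None},
    {"name": "Noa Katz", "email": "noa@example.com", "phone": "555-0199"},
]
report = profile_data_debt(customers, key_fields=["name", "email"], required_fields=["phone"])
print(report)
```

Tracking these two rates per data set over time is one simple way to see whether the data debt is shrinking or growing.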
30.
WHAT ARE DATA SCIENTISTS’ MAIN CHALLENGES?
1. Dirty data
2. No access
3. Privacy issues
Source: Kaggle State of Data Science Survey
“80% of data science is cleaning the data, 20% is complaining about cleaning the data”
32.
BAD DATA = BAD DECISIONS
• 20% of the average data set is dirty
• $600B is the annual cost to the U.S. economy due to bad data
• 33% of company projects fail because of weak data
• 12% is the average annual revenue loss
(Sources: Springer Link; IBM)
33.
DATA CENTRIC ARCHITECTURE
1. ACCESS
2. INGEST
3. STORE: store the data in a DW / DL / Data Mart / Logical DW
4. PREPARE: transform, clean, understand
5. MODEL: model, learn, train
6. DEPLOY: run code in operational processes (Systems of Decisions)
GOVERN: Data Dictionary
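The six stages of the data centric architecture can be read as a chain of small functions, each feeding the next. The sketch below only illustrates that flow: the in-memory list standing in for a warehouse, the "customer,spend" row format and the average-spend "model" are assumptions, not anything taken from the slide.

```python
from statistics import mean


def access(source):
    # 1. ACCESS: obtain raw rows from a source system (here, an in-memory list).
    return list(source)


def ingest(rows):
    # 2. INGEST: parse raw "customer,spend" text rows into records.
    return [dict(zip(("customer", "spend"), row.split(","))) for row in rows]


def store(records, warehouse):
    # 3. STORE: persist records (a list stands in for a DW / data lake / data mart).
    warehouse.extend(records)
    return warehouse


def prepare(warehouse):
    # 4. PREPARE: clean and transform; drop rows with an empty spend, cast to float.
    return [float(r["spend"]) for r in warehouse if r["spend"].strip()]


def model(values):
    # 5. MODEL: "train" a trivial model, the average spend used as a threshold.
    return {"threshold": mean(values)}


def deploy(trained, spend):
    # 6. DEPLOY: run the model inside an operational decision.
    return "high-value" if spend > trained["threshold"] else "standard"


raw = ["a,100", "b,", "c,300"]
warehouse = []
trained = model(prepare(store(ingest(access(raw)), warehouse)))
decision = deploy(trained, 250.0)
print(trained, decision)
```

Governance is deliberately left out of the chain; in practice it would apply to every stage rather than sit at one point in the flow.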
34.
From Waterfall to Agile, Iterative Processes
35.
WHAT’S THE RIGHT BALANCE
for a DATA-CENTRIC-READY BUSINESS?
36.
1. HARMONIZED DATA PLATFORM
• Define a data governance strategy
• Enforce a “data catalog” policy
• Define harmonized data definitions
• Create a central COE for DS teams
2. AGILE TEAM
• Create data teams/squads
• Product owner
3. AUTOMATION (DATAOPS)
• Automate as many processes as possible in the DS value chain
• Use DataOps/MLOps as reference
38.
What is stopping you from becoming data centric?
Source: Atscale Big Data Maturity Report
39.
Do you need a data catalog?
Yes.
40.
WHY DO YOU NEED A DATA CATALOG?
• DISCOVERY: Easily search and browse data
• ENABLES: Self service to DS and analysts
• TAGGING: Data is described in technical & business terms
• CURATION: Self service to DS and analysts
• FEEDBACK: Rating and reviews by users
• BALANCE: Between the need to control and to consume
• AWARENESS: Be informed of relevant and available data
• HARMONIZE: Enable single version of the truth
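To make the discovery, tagging and feedback functions of a data catalog concrete, here is a minimal in-memory sketch. It is not a real catalog product: the `CatalogEntry` and `DataCatalog` classes, their fields, and the sample data sets are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str                               # business description
    owner: str
    tags: set = field(default_factory=set)         # technical and business tags
    ratings: list = field(default_factory=list)    # user feedback scores, 1 to 5

    def average_rating(self):
        # None signals that nobody has reviewed the data set yet.
        return sum(self.ratings) / len(self.ratings) if self.ratings else None


class DataCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def get(self, name):
        return self._entries[name]

    def search(self, term):
        # Discovery: match the term against names, descriptions and tags.
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term in tag.lower() for tag in e.tags)
        )

    def rate(self, name, score):
        # Feedback: users rate a data set after working with it.
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self._entries[name].ratings.append(score)


catalog = DataCatalog()
catalog.register(CatalogEntry("crm_customers", "Customer master records", "sales-ops", {"customer", "pii"}))
catalog.register(CatalogEntry("web_clicks", "Raw clickstream events", "marketing", {"behavior"}))
catalog.rate("crm_customers", 4)
catalog.rate("crm_customers", 5)
hits = catalog.search("customer")
print(hits, catalog.get("crm_customers").average_rating())
```

Keeping an owner on every entry is one way to address the lack of data ownership that the deck names as part of the data debt.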
41.
AUTOMATION in Data Management
Cleaning, wrangling, transforming, and loading
Through to the end of 2022, manual tasks in data management will be cut by 45% thanks to ML and automated service-level management (Gartner)
43.
WANTED: Analytics Engineer
[Diagram: overlapping skills of the DATA ENGINEER, BUSINESS ANALYST and DATA SCIENTIST roles]
• Tasks: research tasks; build/plan models; prototype ML models; ingestion, storage, transformation, preparation, virtualization, enrichment; business logic; understand the impact to the business
• Tools and technologies: statistical languages; R, Python; Hadoop, Spark, Kafka; ML; SQL, NoSQL; data pipeline; data architecture; data integration, ETL, APIs; DB administration; storage; data visualization; unstructured data
• Soft skills: business understanding; communication skills; storytelling skills
44.
Data Engineer
Business Analyst Operations Data Scientist
46.
THE GOAL:
Managing data products
47.
DATAOPS IS NOW A “THING”
48.
The next evolution: MLOps (“A/B Testing” for DS)
• ML training (a.k.a. model generation, model build or model fit) generates the model.
• ML inference (a.k.a. prediction, scoring or model serving) generates the insights.
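The split between training and inference can be shown with a very small example. This sketch assumes a one-variable linear model fitted by ordinary least squares; the function names `train` and `infer` and the toy data are illustrative and do not come from the presentation.

```python
def train(xs, ys):
    """ML training: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    # The returned dictionary is the "model": the artifact that training generates.
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def infer(model, x):
    """ML inference: score a new input with an already trained model."""
    return model["slope"] * x + model["intercept"]


# Training happens once, on historical data.
model = train([1, 2, 3, 4], [3, 5, 7, 9])

# Inference happens many times, on new inputs, without retraining.
prediction = infer(model, 10)
print(model, prediction)
```

Because the model is a separate artifact, two versions can be served side by side and compared on live traffic, which is the “A/B testing” idea in the slide title.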
50.
[Figure: combinations of automation technologies. Source: EY]
• Chatbots, NLP/NLG and RPA
• Chatbots, NLP/NLG
• IPA, ML, NLP/NLG, RPA
• IPA (intelligent process automation), ML and RPA
• Deep learning, ML and IPA (intelligent process automation)
51.
AI & ML USE CASE EXAMPLES
GNS HEALTH: DISCOVERING CAUSAL LINKS
GNS applies ML to find overlooked relationships in patients’ health records. It creates hypotheses to explain them and then suggests which are most likely.
Result: GNS uncovered a new drug interaction hidden in unstructured patient notes.
AGRICULTURE: FARMERS RECOMMENDATIONS
AI system provides real-time recommendations for farmers on how to increase productivity (which crops to plant, where to grow, nitrogen in soil…)
Result: farmers happy about the crop yields obtained with AI’s guidance.
DANSKE BANK: AI FOR FRAUD DETECTION
AI solution that improves accuracy of fraud detection. Monitors millions of transactions daily, purchase location, customer behavior, IP addresses… to identify patterns that signal possible fraud.
Result: Improved fraud detection rate by 50%, decreased false positives by 60%. Investigators can concentrate efforts on flagged transactions.
52.
OCEAN MEDALLION: A “LOVE BOAT” EXPERIENCE
Instead of just alleviating the “friction” of typical travel experiences (lines, room keys, paying for things), it will use data to anticipate what you want to do, eat, and see.
The medallion can be used to pay, to unlock the door to your room as you approach, and on the ship’s gambling platform; it also provides recommendations based on preferences.
53.
• 35% consider machine learning models to be ‘black boxes’ (but feel the models can be explained by experts, the “explainers”).
• 10% of the participants are confident of explaining most or all models.
58.
“Give me your data”
• 42% say that lack of trust prevents them from using digital services (Source: Sitra’s 2018 four-country survey; Europe: Finland, Netherlands, France, Germany)
• Trust is built by having the power to influence how your data is used
• In a survey for IBM, 75 percent of respondents said they will not buy a product from a company, no matter how great the product, if they don’t trust that company to protect their data
59.
• Build a data catalog as a “data-lake gatekeeper”
• Tackle point-specific data quality projects
• Assign mixed data teams and an “agile way of working” for specific dynamic analytic
• Automate as much as possible!
• Define key business questions: “Start with the problem, not the data”
• Design a data governance strategy
• Establish CDO-IT-LOBs collaborative processes
• Focus on promoting data literacy
• Implement DevOps/DataOps principles
60.
Einat Shimoni
EVP & Senior Analyst
STKI