1. 1
The Road Ahead
Artificial Intelligence, Machine Learning, Data
Analytics, and Visualization
Bonnie K. Holub, Ph.D., Principal Data Scientist
Midwest Geo Data Science Lead
December 3, 2018
2. 2
Introductions: Bonnie Holub, PhD
Created over $1B value for
companies
PhD Artificial Intelligence
Career: correlating disparate sets of
Big Data for actionable results
Entrepreneur: Baby-Time.com, KPMI,
Adventium Labs, ArcLight Inc.
Researcher & Academic:
University of Minnesota,
University of St. Thomas Graduate
Programs in Software,
Carnegie Mellon University
Business Professional:
Teradata, Cognizant,
Honeywell, PwC, Korn Ferry,
Ucare, Object Partners
3. 3 ©2018 Teradata
The road ahead: where are we coming from?
4. 4
Abstract
“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it,
everyone thinks everyone else is doing it, so everyone claims they are doing it...”
Professor Dan Ariely (Duke University)
This talk will discuss working examples of how some of the 12 million worldwide
Teradata users are contributing to $10 trillion worth of revenue through 11 trillion
annual queries utilizing just under one zettabyte of data. She will cover
hyper-segmentation of large data sets, fraud detection, preventive maintenance, and
the techniques working systems employ to dig deep, aim high, and manage operations at
scale.
14. 14
Exponential Growth
Source: O’Keefe, Brian, “The Smartest (or the Nuttiest) Futurist on Earth,”
Fortune, 2007/05/14, http://fortune.com/2007/05/14/ray-kurzweil-innovation-artificial-intelligence/.
15. 15
Exponential Growth Detailed
• Today’s smartphone has the same
computing power as the whole US
government in 1983.
• 3D printing is the only technology where a
more complex object doesn’t cost more to
make.
• 6 US states now have licensed
autonomous vehicles (cars that drive
themselves).
• The average lifespan of an S&P 500
company has gone from 67 years in the
1920s to 12 years today.
• The shift to autonomous cars means that
a current 3-year-old will not get a driver’s
license (as cars will drive themselves in 14
years), and it will prevent 30,000 road
deaths in the US.
Source: https://www.evolutionpartners.com.au/exponential-growth-vs-linear-thinking-in-management-teams.html,
downloaded 2018/02/09
16. 16
Oxford Study - The Future of Employment: How
Susceptible Are Jobs to Computerisation?
Source: Frey, Carl Benedikt, and Osborne, Michael A., “The Future of Employment: How Susceptible
Are Jobs to Computerisation?”, Oxford Martin School, 2013/09/17.
FIGURE III. The distribution of BLS 2010
occupational employment over the probability of
computerisation, along with the share in low,
medium and high probability categories. Note
that the total area under all curves is equal to
total US employment.
19. 19
Context Summarized
• There is a lot of hype.
• There are various (data based) predictions.
• There will be vast business and social upheaval due to accelerating change.
Our challenge: ride the wave of change, don’t be swamped by it.
(I promise you it will be thrilling!)
© 2018 Bonnie K. Holub
20. 20 ©2018 Teradata
The road ahead: where are we coming from?
The road ahead: where are we headed?
The road ahead: how can we get there first (and safely)?
21. 21
Practical Steps to Take Themes:
• Hiring/Onboarding/
Retaining Talent
• Business Intimacy
• Balance
• Outsourcing
• Attitudes
• Leadership
• Problem Choice
• Open source
Alexander Linden, Carlie Idoine, Peter Krensky, Neil Chandler, “15 Insights for
Managing Data Science Teams,” Gartner, 2018/10/19.
22. 22
Data Science Discipline Needs & Expertise Required
Statistical Skill
Usability
Knowledge (HCI)
Technology Skills
(including a
Historical
Perspective)
Business
Understanding
Implementation
Expertise
(Ops Savvy)
© 2018 Bonnie K. Holub
Arun Batchu & Bonnie Holub, Private Conversations, 2018/11/05.
Discipline needs:
• Best Practices
/Playbooks
• Implementation
Patterns
• Explainable AI
(DARPA)
23. 23
MSP Leaders to Follow
• Arun Batchu
• Sona Maniyan
• Patrick Sanchez
• Stephen Thompson, League of Extraordinary Algorithms Meetup
© 2018 Bonnie K. Holub
24. 24
Tools the Market is Choosing
“KDnuggets Analytics/Data Science 2016 Software Poll: top 10 most popular
tools in 2016,” https://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html,
downloaded 2018/11/13.
25. 25
Animations
• Hans Rosling
• Time Magazine Top 100 in 2012.
• Swedish public health researcher.
• TED.com star (you should watch his talks…)
• 200 Countries, 200 Years, 4 Minutes - The Joy of Stats - BBC Four
• https://www.youtube.com/watch?v=jbkSRLYSojo (4 minutes)
• Gapminder.com “Wealth and Health of Nations”
© 2011-2012 ArcLight, Inc.
26. 26
Data Science SQUADs
Monthly Subscription
An agile cross-functional team that
executes to achieve a customer business
outcome. Right person for the job as
project and customer needs shift.
AI & ML Solutions
Outcome Focused
Detailed scoped Data Science, Artificial
Intelligence or Machine Learning
projects focused on delivering value tied
to outcome.
Accelerators
Library of IP
We draw upon our ever increasing
portfolio of accelerators, algorithms, code
and frameworks to accelerate delivery.
Data Science
Data Foundation
BI & Cognitive Design
Analytics Software Dev
Architecture
Our World-Class
Technologists
Cross-Functional
Squads
To Accelerate
Delivery
SQUADs
27. 27 © 2018 Teradata
Data Science Accelerators
28. 28 © 2018 Teradata
Hyper Segmentation
Business Challenge What We Did?
By leveraging the long tail of customer uniqueness,
tailored offers and messages can drive incremental
revenue through marketing channels.
Intuition of human markets supported by data and
evidence.
• A self-contained Most Valuable Persona
application that identifies HyperSegments and
describes these persona segments in a
human-interpretable and actionable format.
• The personas described can be ingested into a
data warehouse and/or marketing campaign tools.
• Built using Aster/Teradata Analytic Platform, Spark
and Python.
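One way to picture hyper-segmentation is as grouping buyers by a combination of discretized behaviors and then giving each group a readable name. The sketch below is only an illustration under assumed inputs: the attribute names (`top_category`, `spend_band`, `channel`), the sample records, and the naming rule are all hypothetical, and the slide does not describe the actual algorithm used.

```python
from collections import defaultdict

# Hypothetical buyer records; field names and values are illustrative assumptions.
buyers = [
    {"id": 1, "top_category": "tablets", "spend_band": "high", "channel": "mobile"},
    {"id": 2, "top_category": "tablets", "spend_band": "high", "channel": "mobile"},
    {"id": 3, "top_category": "networking", "spend_band": "mid", "channel": "web"},
    {"id": 4, "top_category": "tablets", "spend_band": "mid", "channel": "web"},
]

# Attributes whose combination defines a segment (assumed for this sketch).
SEGMENT_KEYS = ("top_category", "spend_band", "channel")


def hyper_segments(records, keys=SEGMENT_KEYS):
    """Group buyer ids by their combination of attribute values."""
    segments = defaultdict(list)
    for rec in records:
        segments[tuple(rec[k] for k in keys)].append(rec["id"])
    return dict(segments)


def persona_name(segment_key):
    """Turn a segment key into a human-readable persona label."""
    category, spend, channel = segment_key
    return f"{spend}-spend {category} buyer via {channel}"


segments = hyper_segments(buyers)
for key, ids in segments.items():
    print(persona_name(key), ids)
```

The more attributes go into the key, the finer (and more numerous) the segments become, which is the "long tail" effect the slide refers to.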
Organizations with large numbers of
customers do not have an algorithmic
capability to create named personas
for their top customers.
Without refined segments or
hypersegments, organizations are
limited to coarse offers across their
customer base and limited
optimization techniques.
Bringing human interpretable actionable insights for better marketing and customer personalization
Proof: Large Online Retailer
Enable an algorithmic capability to
create named personas for 100K most
valuable buyers in Computers, Tablets,
& Networking utilizing buying behaviors
of last 2 years and their account
attributes.
100K
MVBs
8K
Hyper-
segments
Enable fine grained behavioral and
demographic segmentation based on
extended customer and purchase history
data, which resulted in improved
personalization and context to drive
revenue
What’s The Customer Value?
29. 29 © 2018 Teradata
Customer Complaints
Business Challenge What We Did?
By leveraging the Customer Complaints application,
businesses gain a complete understanding of all
emerging customer issues and are able to dramatically
increase the effectiveness of the analyst.
• Developed a Customer Complaints Analysis
Application that helps the analyst quickly resolve
complaints and deliver insights to business
leaders
• AI/ML used to prioritize complaints and prescribe
resolution methods
• Ability to detect emerging issues and drill down to
determine root cause
• Available on Teradata Analytics Platform and
open source technologies (Spark, Python, NLTK)
Organizations with large numbers of
customers are often inundated with
thousands of complaints per day.
In highly regulated industries, like
consumer financial, you must respond
and resolve every single complaint in
a timely manner.
Complaints may contain early warning
signs for systemic problems.
Organizations must be able to detect
and understand these critical insights
to better manage risk. Addressing an
issue before it becomes a news
headline will help improve customer
satisfaction and save millions in
regulatory fines.
Bringing human interpretable actionable insights for better marketing and customer personalization
Proof: Large US Consumer Bank
Proven test cases on historical data to
measure analytic effectiveness: Wells
Fargo fake accounts, Citibank false
introductory offer advertisement, Equifax
hack.
3
Emerging
issues
3x
Productivity
Dramatic increase in time saved by
identifying and solving global issues
instead of resolving case by case.
Recommended resolutions decrease
redundant work to determine root cause.
What’s The Customer Value?
30. 30 © 2018 Teradata
Transportation Model Optimization
Business Challenge What We Did?
Teradata was engaged to develop a detailed transportation
optimization model and user interface tool. The
optimization tool takes future forecasted volumes and
determines the optimal transportation mode and
routes to deliver the product to the customer. Capacities,
capability constraints and various business rules are all
utilized to maximize results.
• Optimization models that select the lowest-cost
solutions for each material/customer pair
• A User Interface Tool (UIT) that graphically
presents results and “what-if” analysis of potential
scenarios to aid in further reduction of total costs
• Built using Teradata Database and Open Source
R libraries
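The core selection step, choosing the lowest-cost mode for each material/customer pair subject to business rules, can be sketched in a few lines. This is a simplified illustration in Python (the slide says the actual solution used R), and it ignores shared capacity constraints, which a real model would need a proper solver for. All names, costs, and rules below are hypothetical.

```python
# Hypothetical cost per unit for each (material, customer) pair and transport mode.
costs = {
    ("resin", "plant_a"): {"rail": 4.0, "truck": 6.5, "barge": 3.2},
    ("resin", "plant_b"): {"rail": 5.1, "truck": 4.8},
    ("pellets", "plant_a"): {"truck": 7.0, "rail": 7.4},
}

# Modes excluded for a pair by a business rule (assumed example).
disallowed = {("resin", "plant_a"): {"barge"}}

# Hypothetical forecasted volume per pair, in units.
forecast_volume = {
    ("resin", "plant_a"): 100,
    ("resin", "plant_b"): 200,
    ("pellets", "plant_a"): 50,
}


def cheapest_modes(costs, disallowed):
    """Pick the lowest-cost allowed mode for every material/customer pair."""
    plan = {}
    for pair, by_mode in costs.items():
        banned = disallowed.get(pair, set())
        allowed = {m: c for m, c in by_mode.items() if m not in banned}
        mode = min(allowed, key=allowed.get)
        plan[pair] = (mode, allowed[mode])
    return plan


def total_cost(plan, volume):
    """Total transportation cost of a plan for the forecasted volumes."""
    return sum(unit_cost * volume[pair] for pair, (_, unit_cost) in plan.items())


plan = cheapest_modes(costs, disallowed)
total = total_cost(plan, forecast_volume)
print(plan)
print(total)
```

A "what-if" scenario then amounts to editing `costs`, `disallowed`, or `forecast_volume` and recomputing the total.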
Global manufacturing companies
often struggle to ensure that their
logistics network delivers the lowest
cost possible.
Today’s complex manufacturing
distribution networks may include:
• A network of contract manufacturing
centers and storage facilities
• Constantly changing freight rates
that make optimal mode selection
challenging
• Lowest cost logistics solution
difficult to achieve with manual
calculations and analysis
Helping manufacturing improve on-time delivery and lower costs of their supply chain
Optimized network is estimated to
save the facility over $6.3M or 10%
of transportation costs
The UIT has further identified cost
savings attributed to potential
relocation of processing centers
The business users can now quickly
assess and estimate changes to
transportation costs and plan
accordingly
Proof: Commodity Processing
$6.3M
Transportation
Cost Savings
What’s The Customer Value?
31. 31 © 2018 Teradata
Data Science Project Portfolio
32. 32 © 2018 Teradata
Cell Tower Coverage Optimization
Business Challenge What We Did?
• New state-of-the-art algorithm to identify call quality
issues for providers before their customers see them.
• Significant improvement in Net Promoter Score via
increased customer satisfaction and lower customer
churn.
• Developed a method for identifying undershooting cell
towers that could be uptilted to improve cell quality
• Combined geospatial and statistical techniques
• Identified overshooting towers that have the
highest negative impact on neighboring cells
• Automated antenna tilt changes based on the
algorithm
• Built using Teradata and Tableau
• Cell tower and antenna position can
have a dramatic impact on the
call quality of a mobile service
provider, despite having good
coverage
• Misconfigured towers can lead to
dropped calls, poor customer
experience, and customer churn.
Improving customer satisfaction via better call quality
Teradata Consulting expert team
worked with a leading telecom
provider to identify problem spots and
improve cell service quality.
In a single region better service is
estimated to provide $10M in savings.
Proof: Large Telecom Provider
$10M
Savings
6 deg: Contained; 4 deg: Overshooting
What’s The Customer Value?
33. 33 © 2018 Teradata
Fraud Detection using Deep Learning
Business Challenge What We Did? What’s The Customer Value?
• State-of-the-art experimentation framework for
testing and deploying new fraud detection models.
• Lower false-positive rates mean lower review costs
and better customer experiences
• Higher detection rates mean lower loss rates.
• Developed Machine Learning and Deep Learning
algorithms to detect fraudulent transactions
• Explored LSTMs, auto-encoders and ConvNets to
find better fraud detection at scale.
• Built using GPUs, CUDA, Python and TensorFlow
• In 2016, a large international bank
set a strategic goal to use
enterprise data insights and AI to
help detect fraud in business
transactions.
• The bank’s ‘human-written’ rule
engine was outdated - fraud
detection rates were as low as
30% and non-fraud cases up
around 99.5%, which was losing
them millions a month
Adapting the best in Artificial Intelligence, Deep Learning and GPU computing from Computer Vision into Financial Fraud
Teradata Consulting expert team used
deep learning in real-time to
accelerate fraud detection, reducing
false positives by 50% and increasing
the detection rate by 60%, thus
saving the bank millions.
Proof: Large International Bank
50%
False Positive
60%
Detection Rate
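The two headline metrics can be made concrete with a small calculation. The sketch below assumes the 60% and 50% figures are relative changes against the old rule engine; the slide does not say whether they are relative or absolute, and the transaction counts are hypothetical numbers chosen only so that the baseline detection rate matches the 30% quoted above.

```python
def detection_rate(true_pos, false_neg):
    """Share of actual fraud cases that were flagged (also called recall)."""
    return true_pos / (true_pos + false_neg)


def relative_change(before, after):
    """Change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Hypothetical counts per 100 fraud cases; not taken from the slide.
baseline = {"tp": 30, "fn": 70, "fp": 1000}  # rule engine
improved = {"tp": 48, "fn": 52, "fp": 500}   # deep learning model

base_rate = detection_rate(baseline["tp"], baseline["fn"])
new_rate = detection_rate(improved["tp"], improved["fn"])

detection_uplift = relative_change(base_rate, new_rate)
false_positive_change = relative_change(baseline["fp"], improved["fp"])

print(f"detection rate: {base_rate:.0%} -> {new_rate:.0%} ({detection_uplift:+.0%})")
print(f"false positives: {false_positive_change:+.0%}")
```

Under this reading, a 60% relative uplift moves a 30% detection rate to 48%, not to 90%.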
34. 34 © 2018 Teradata
Communications Compliance
Business Challenge What We Did?
By leveraging the Communications Compliance IP
accelerator, customers with regulatory burdens on
communication can reduce millions of dollars in fines
per year, dramatically increase the effectiveness of
their staff, and promote a better customer experience.
Developed ML based noise reduction techniques to
clear the junk out of email and messaging,
dramatically reducing false positives
Created NLP based workflow to predict risk and
categorize messages correctly with dramatic
increases in performance over current process
UI application delivered for compliance analyst to
assist with case management and resolution of
compliance violations
Available on Teradata Analytics Platform and open
source technologies (Spark, Python, NLTK)
In Consumer Financial and many
other regulated industries companies
are required to monitor
communications to pre-emptively
resolve financial crimes or prevent
misleading information from affecting
customers. Failure to do this properly
leads to millions of dollars in fines per
year.
The task of screening thousands of
employee-to-customer
communications can be monumental,
requiring many intelligent resources
and computational horsepower. The
benefit of saving millions in fines and
keeping your name out of the news
headlines is very much worth the
effort.
Stopping Financial Crimes before they happen
Proof: Large US Consumer Bank
Sophisticated noise reduction
techniques and NLP/AI models
dramatically reduce the false positives
created by rule-based systems
40x
Fewer False
Positives
3x
Productivity
Compliance analysts are given far
fewer false positives and a much greater
level of intelligence to make informed
decisions regarding emerging threats to
compliance
What’s The Customer Value?
35. 35 © 2018 Teradata
Predictive Asset Maintenance
Business Challenge What We Did?
Reducing the number of outages by
even a small percentage will result in
large savings to the utility. Although
pole failures are rare, they have huge
impact. A 2007 fire, attributed to a
downed pole caused by Santa Ana
winds, cost the utility $351M (above
insurance payments).
Proof: Utility Client
• Highly scalable asset maintenance and asset
survivability models applied to both existing data sources
and components as well as deployed IoT sensor data
• Identify assets in most need of maintenance
integrated with repair parts identification,
engineering and repair scheduling all on a single
platform.
• Models to assess the probability of failure, used to
represent risk. Our risk numbers may be factored
into a client’s composite risk model to generate a
composite number.
• Output of the model was a probability of failure,
which was combined with impact of failure to
represent risk and prioritize maintenance.
• Built using Aster/Teradata Analytic Platform and
Open Source Python libraries
• You have frequent unplanned down
time impacting operations.
• You are not sure which assets
should have maintenance to avoid
problems.
• Non-routine maintenance impacts
operations.
• You are not sure how to schedule
maintenance resources and to be
pro-active rather than re-active to
incidents.
$350M
Cost Avoidance
Using machine learning and AI to prevent unplanned downtime, as well as optimize costs, scheduling and resources.
What’s The Customer Value?
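The risk-scoring step on this slide, probability of failure combined with impact of failure to prioritize maintenance, can be sketched as a simple expected-loss ranking. Multiplying the two is one common way to combine them and is an assumption here, as are the asset ids, probabilities, and dollar impacts.

```python
# Hypothetical assets: failure probabilities would come from a model,
# impact estimates from the business. None of these values are from the slide.
assets = [
    {"id": "pole-17", "p_fail": 0.02, "impact_usd": 5_000_000},
    {"id": "pole-42", "p_fail": 0.10, "impact_usd": 200_000},
    {"id": "xfmr-03", "p_fail": 0.01, "impact_usd": 50_000_000},
]


def risk(asset):
    """Expected loss: probability of failure times impact of failure."""
    return asset["p_fail"] * asset["impact_usd"]


# Highest-risk assets first, i.e. the maintenance priority order.
ranked = sorted(assets, key=risk, reverse=True)

for asset in ranked:
    print(f'{asset["id"]}: expected loss ${risk(asset):,.0f}')
```

Note that the asset most likely to fail (`pole-42`) ranks last here, because rare but very costly failures dominate the expected loss.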
36. 36 © 2018 Teradata
DBA.dl
Business Challenge What We Are Doing?
Drives millions in OpEx reduction and reduces risk management exposure.
• Efficiencies: Find and consolidate duplicate datasets and ETL jobs.
• Compliance: Audit data usage, and enforce standards and best practices.
• Faster onboarding: Find matching source datasets and recommend known
transformation jobs.
• Data Self-Service: Given a sample dataset, help business analysts and data
scientists find the data they need without an ETL expert.
Developing novel Deep Learning and Artificial
Intelligence-driven data signature generation to
automatically develop a data catalogue that enables:
• Automated assimilation of new data sources
• Automated identification of duplicate data or
improper data usage
• Automated recommendation of data for usage
by analysts & data scientists for wrangling and
analysis.
A typical enterprise spends 80% of the
time integrating, wrangling, and
managing data. In addition, 60% of
data-driven projects are an exact
replica of what has been built 5 times
before.
There is an opportunity to save tens
of millions annually on DBA, ETL,
data management, and other data
assimilation related activities.
Leading edge innovation in Artificial Intelligence and Deep Learning to address a 40 year old data management problem.
Currently in a joint Research and
Development project with
a large financial institution.
Proof: In-Progress
Y
Coming Soon
Data
What’s The Customer Value?
37. 37 © 2018 Teradata
Product Portfolio Bundle Targeting
Business Challenge
What We Are Doing?
What’s The Customer Value?
• Fast time to identify problem product groups.
• ~$1.5M yearly savings due to better visibility of at risk
customers for targeted intervention.
• Provide insights driven dashboard to show problem
groups and provide propensity score for at-risk
customers.
• Allow business executives to see at a glance where
the high impact areas are through consolidated
cohort groups.
• Allow the segment attributes and features to speak
for the specific
• All businesses have a set of traditional
problems including churn, revenue
diminishment, new business, cross-sell
and upsell.
AnalyticOps and insight driven interfaces to improve product performance visibility and enable targeted interventions.
Proof:
• 25% more effective targeting of
at-risk customers vs. traditional
solutions.
125%
38. 38 © 2018 Teradata
Product Recommendation
Business Challenge What We Did? What’s The Customer Value?
By leveraging the product recommendation solution,
businesses are able to make more personalized offers
from better recommendations. This leads to increased
revenue from more completed offers.
• Developed product and customer affinity
recommendation models using collaborative
filtering to identify most frequently purchased
items
• Most recommended products are offered to
customers with similar shopping behavior
• Compatible with Teradata Analytic Platform or R in
Database on Teradata
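The bullets above describe item-based collaborative filtering in outline: products bought together by similar shoppers are recommended to each other's buyers. A minimal sketch using plain co-purchase counts follows; the baskets and the scoring rule are hypothetical, and a production model would typically normalize the counts and handle seasonality.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase history: customer id -> set of products bought.
baskets = {
    "c1": {"laptop", "mouse", "dock"},
    "c2": {"laptop", "mouse"},
    "c3": {"laptop", "dock", "monitor"},
    "c4": {"mouse", "keyboard"},
}


def co_purchase_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = defaultdict(Counter)
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(customer, baskets, counts, k=2):
    """Score products the customer lacks by co-purchase with products they own."""
    owned = baskets[customer]
    scores = Counter()
    for item in owned:
        for other, n in counts[item].items():
            if other not in owned:
                scores[other] += n
    # Sort by score (descending), then name, so ties are deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered[:k]]


counts = co_purchase_counts(baskets)
print(recommend("c2", baskets, counts))
```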
Retailers have billions of transaction
details about their customers. The
ability to deliver relevant personalized
offers at scale could lead to significant
increases in revenue.
Retailers also have to account for item
seasonality, different customer
segmentations and too many products
to effectively analyze and market.
Improving customer personalization using analytics on product purchasing behavior
If a large retailer sends an average of
½ billion offers per month, then a
$0.01 improvement per offer over 1 billion offers
(two months of volume) is worth $10 million.
Proof: Large Retailer
$10M
Additional
Revenue
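The $10M figure is straightforward arithmetic, shown here in integer cents to avoid rounding. The two-month window is an inference from the slide's own numbers (½ billion offers per month versus 1 billion offers), not something the slide states.

```python
offers_per_month = 500_000_000  # ½ billion offers per month, from the slide
months = 2                      # assumed: 1 billion offers is two months of volume
uplift_cents_per_offer = 1      # $0.01 improvement per offer, from the slide

total_offers = offers_per_month * months
added_revenue_usd = total_offers * uplift_cents_per_offer // 100

print(f"{total_offers:,} offers -> ${added_revenue_usd:,} additional revenue")
```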
39. 39 © 2018 Teradata
Hard Disk Failure Prediction
Business Challenge What We Did?
Hard disk manufacturers can identify underlying issues
with their products as they ramp-up production early in
the scale out phase.
Proactively offering customer service creates a high-value
personalized experience that translates into deeper
brand loyalty.
Developed a data aggregation and modeling system
that:
• Collected 100s of time series hard disk sensor and
system data daily from 1000s of laptops and
desktops.
• Used Correlation, PCA, Symbolic Aggregate
Approximation (SAX), Naïve Bayes to model and
predict likelihood of disk failure.
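Of the techniques listed, Symbolic Aggregate Approximation (SAX) is the least familiar: it turns a numeric time series into a short string that simpler models such as Naïve Bayes can consume. The sketch below follows the standard SAX recipe (z-normalize, average over equal segments, map to letters using Gaussian breakpoints); the sensor readings, segment count, and alphabet are hypothetical, and it does not reflect how the original system was configured.

```python
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    if len(series) % n_segments:
        raise ValueError("series length must be divisible by n_segments")
    size = len(series) // n_segments
    return [mean(series[i:i + size]) for i in range(0, len(series), size)]


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word over the given alphabet."""
    k = len(alphabet)
    # Breakpoints split the standard normal curve into k equal-probability bins.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    word = []
    for value in paa(znormalize(series), n_segments):
        index = sum(value > b for b in breakpoints)
        word.append(alphabet[index])
    return "".join(word)


# Hypothetical daily readings from one disk sensor.
readings = [1, 1, 1, 1, 5, 5, 9, 9]
word = sax(readings, n_segments=4)
print(word)
```

Each letter summarizes one stretch of the series, so a rising sensor value shows up as letters that move later in the alphabet.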
Hard disk failures can have a significant
productivity impact and in some
cases be catastrophic for customers’
businesses or personal memories.
Predicting imminent hard disk failure
allows manufacturers to ship a new hard
disk to the customer before it fails.
Allows customers to back up their data
and minimize disruption.
Improving customer experience by predicting imminent disk failure.
Developed as an integrated solution
for one of the leading global PC
manufacturers to predict disk failure
1 week out with an F1 score of 70%.
This allowed the PC manufacturer to
offer premium services for their high-
value customers and develop better
brand experience.
Proof: Large PC Manufacturer
Predict disk
failure
1-week
ahead
What’s The Customer Value?
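The F1 score quoted on this slide is the harmonic mean of precision and recall. A short sketch of the calculation follows; the counts are hypothetical and chosen only so that the result comes out at 70%.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall; 0.0 if nothing was predicted or present."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical: 70 failures predicted correctly, 30 false alarms, 30 missed failures.
score = f1_score(true_pos=70, false_pos=30, false_neg=30)
print(f"F1 = {score:.2f}")
```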
40. 40 © 2018 Teradata
Identify Manufacturing Issues
Business Challenge What We Did?
By identifying product issues early in the product life cycle,
manufacturers can reduce the negative impact on the
brand, as well as reduce the cost of recalls by taking
corrective actions on their production line.
We use Natural Language Processing (NLP) and
Topic modeling on customer support notes from
customer complaints to identify systematic emerging
patterns.
Connecting the issues with Bill of Materials using
Graph Analytics (nPath, nTree, etc.), we were able to
identify the manufacturer and plant that was causing
the faults.
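The tracing step can be illustrated without a graph engine. The sketch below swaps the graph analytics functions named above for a plain lookup and count: each complaint is joined to the bill of materials for that device, and component sources are tallied. The serial numbers, components, suppliers, and plants are all hypothetical.

```python
from collections import Counter

# Hypothetical bill of materials: device serial -> component -> (supplier, plant).
bill_of_materials = {
    "SN001": {"battery": ("SupplierA", "Plant1"), "screen": ("SupplierC", "Plant7")},
    "SN002": {"battery": ("SupplierA", "Plant1"), "screen": ("SupplierD", "Plant8")},
    "SN003": {"battery": ("SupplierB", "Plant2"), "screen": ("SupplierC", "Plant7")},
    "SN004": {"battery": ("SupplierA", "Plant1"), "screen": ("SupplierC", "Plant7")},
}

# Hypothetical (serial, component) pairs, e.g. extracted from support notes by topic modeling.
complaints = [
    ("SN001", "battery"),
    ("SN002", "battery"),
    ("SN004", "battery"),
    ("SN003", "screen"),
]


def suspect_sources(complaints, bom):
    """Tally complaints by the (supplier, plant) that produced the faulty component."""
    tally = Counter()
    for serial, component in complaints:
        source = bom.get(serial, {}).get(component)
        if source is not None:
            tally[source] += 1
    return tally.most_common()


suspects = suspect_sources(complaints, bill_of_materials)
print(suspects)
```

A real analysis would also compare each tally with the number of units that source shipped, so that high-volume suppliers are not flagged simply for being large.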
Manufacturing issues that make their
way to shipped products can have a
disastrous impact on the brand.
Identifying emerging issues early
enough allows manufacturers to take
corrective actions to reduce recalls,
warranty claims and recognize higher
margins.
The challenge is getting ahead of the
public perception before it damages
the brand.
Combining Natural Language Processing and Graph Analytics for early detection of product issues.
Proof: US Phone Manufacturer
We engaged with a leading US mobile
phone manufacturer to analyze freeform
customer support notes for emerging
issues for their latest product launch.
We identified 4 critical defects that
impacted part of their inventory
before they were seen in consumer or
media reports.
Manufacturer was able to identify the root
cause and shut off the faulty production line
and protect the brand.
Millions
saved in
recalls
What’s The Customer Value?