Natural Sparksmanship
The Art of Making an Analytics Enterprise Cross the Chasm
Rajesh Krishnan
AIMIA Inc.
Our Business
Communication
Personalisation
Insights
Loyalty
Our Business - notes
• AIMIA is a data-driven marketing & loyalty analytics business, regarded as a leader by Forrester in its loyalty management Wave.
– We manage the UK’s largest coalition loyalty programme and also manage proprietary loyalty programmes for customers across verticals and geographies.
– We provide insights through advanced analytics to retailers & CPGs across the globe.
– We manage marketing & smart customer communications through campaigns for our clients, using the analytics to drive loyalty.
• Personalisation underpins the three focus areas.
• The huge influx of data today means there is an opportunity to know the customer better. Personalisation can be done like never before.
Our Analytics
Data
Analytics
Products
Diagnostic
Predictive
Prescriptive
Descriptive
Our Analytics - notes
• Analytics has been the backbone of the business.
• We perform the full spectrum of analytics
– What happened (Descriptive)
– Why it happened (Diagnostic)
– What will happen (Predictive)
– What should we do (Prescriptive)
• Our barn has different breeds of horses that are trained to win different forms of the sport, be it show jumping, endurance racing or dressage.
The Scenario
The challenge
Survival instinct Winning habit
The Challenge - notes
• We have a smart custom analytics solution called Offer Engine. It:
– Crunches ~20 billion rows of transaction data.
– Calculates probabilities for and ranks 30 billion customer-product combinations.
– Is proven to produce impressive marketing RoI for our clients.
• It is an algorithm built using SQL running on an in-memory MPP database, plus shell scripts.
• We want to productise it & implement it across our customer base.
• There are major technical hurdles in development: little flexibility to use different data sources with different velocities, and a prolonged time to take code from development to production.
• In short, we wanted to find a new breed of horse with more speed, agility & flexibility, so that it can win this personalisation game anywhere & not just in one field.
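The core idea in the notes above, scoring customer-product combinations and ranking them, can be sketched in a few lines. This is a minimal pure-Python illustration and not the Offer Engine itself: the function name `rank_offers`, the `top_n` parameter and the scores are all made up for the example, and the slides do not describe the real probability model.

```python
from collections import defaultdict

def rank_offers(scores, top_n=2):
    """Rank products per customer by descending probability.

    `scores` is an iterable of (customer, product, probability) tuples.
    Returns {customer: [(product, probability), ...]} with at most top_n entries.
    """
    by_customer = defaultdict(list)
    for customer, product, probability in scores:
        by_customer[customer].append((product, probability))
    return {
        # Sort by probability (highest first); break ties by product name.
        customer: sorted(items, key=lambda item: (-item[1], item[0]))[:top_n]
        for customer, items in by_customer.items()
    }

# Hypothetical scores, purely illustrative.
example_scores = [
    ("c1", "newspaper", 0.62),
    ("c1", "coffee", 0.80),
    ("c1", "milk", 0.35),
    ("c2", "milk", 0.55),
    ("c2", "newspaper", 0.10),
]
ranked = rank_offers(example_scores)
print(ranked)
```

At the scale quoted in the notes this grouping and sorting would have to be distributed; the sketch only shows the shape of the computation.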
The Objective
• Rewrite the award-winning concept
“Best Use of Customer Analytics/Data in a Loyalty Programme” – 2014 Loyalty awards
• Create a flexible framework
• Turn custom code into a configurable product
• Better performance at a lower cost
The Journey
The schematic
Metadata
Customer
Products
Transactions
Offers
Audience
Algorithm 1
Algorithm 2
Algorithm N
Pre-Processing
Splitter
Splitter
Join
Join
Join
Ensemble
Output / Channels
The Schematic - notes
• The transaction & campaign data from the Enterprise Data Warehouse is pre-processed to feed the algorithms.
• The algorithms are trained on the training data produced by this step.
• When a campaign audience & the current in-store offers are available, they are sent to different algorithms based on the configuration.
• An Ensemble aggregates the results & produces the final ranking.
• After what we heard at Spark meetups, it was obvious that we should rebuild the key components of the pipeline in Spark to make it a successful product.
• The best machine learning implementations using Spark presented at these meetups, from domains ranging from banking to e-commerce, were convincing.
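The flow in the schematic (route an audience to algorithms by configuration, then let an ensemble combine the results into one ranking) might look roughly like the sketch below. Everything here is an assumption for illustration: the two scorers, the `CONFIG` mapping, the segment names and the choice of averaging scores are hypothetical, since the slides do not say which algorithms or ensemble method were used.

```python
from statistics import mean

# Hypothetical stand-in scorers; each returns {offer: score} for one customer.
def recency_scores(customer, offers):
    return {o: 1.0 if o in customer["recent"] else 0.0 for o in offers}

def affinity_scores(customer, offers):
    return {o: customer["affinity"].get(o, 0.0) for o in offers}

ALGORITHMS = {"recency": recency_scores, "affinity": affinity_scores}

# Configuration decides which algorithms each audience segment is sent to.
CONFIG = {"loyal": ["recency", "affinity"], "new": ["affinity"]}

def rank_for_customer(customer, offers):
    names = CONFIG[customer["segment"]]
    results = [ALGORITHMS[name](customer, offers) for name in names]
    # Ensemble: average the scores from every algorithm that ran.
    combined = {o: mean(r[o] for r in results) for o in offers}
    # Highest combined score first; ties broken by offer name.
    return sorted(offers, key=lambda o: (-combined[o], o))

customer = {
    "segment": "loyal",
    "recent": {"coffee"},
    "affinity": {"coffee": 0.4, "newspaper": 0.9, "milk": 0.1},
}
offers = ["coffee", "milk", "newspaper"]
ranking = rank_for_customer(customer, offers)
print(ranking)
```

Keeping the routing in a plain mapping is one way to get the "custom code to configurable product" property: adding an algorithm or changing a segment's mix becomes a configuration change rather than a code change.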
The Roles
The Owner / The Sponsor
The Trainer / The Sparksman
The Jockey / The field expert
The bettor / The client
The Roles – notes (1)
• The challenge in enterprise adoption of Spark is different. It is a high-stakes game.
• Understanding the roles is a key first step to becoming a great Sparksman.
• The Owner / The Sponsor:
– Probably the most important person in making your project a success.
– There may be a partnership or a single owner. It is really nice to have a single owner to start with, for ease of communication.
– You need to convince them that the identified technology is the best suited.
– You had better have the perfect statistics & proof that your new breed of horse is best suited to win, even before asking to buy one to test it.
• The Trainer / The Sparksman:
– We, the people who make this tech work for our product, with thorough knowledge of the market.
– The horseman who truly understands the breed & prepares it to win the specific sport.
The Roles – notes (2)
• The Jockey / The field expert:
– The operations team that makes the product run at scale & deals with issues.
– The marketing / sales team that understands the pulse of the client exactly.
– They know the pluses & minuses of the product.
– They need to believe this new tech makes their life easier & better.
– The jockey plays a key role in making the horse win the race; a perfect partnership between the horseman & the jockey is required to make any breed win.
• The bettor / The client:
– The client collects the market info, purchases the best product & expects it to help them win their customers. They want your product to be flexible, performant & secure.
– The bettor analyses the horse info and bets on it for a few races. He wants to win big money. Bettors want your horse to be the best of breed, fit & showing potential.
The stages
Fear
Trust
Comfort
The Stages – notes
• Like it or not, we need to go through the following three stages to succeed.
• Fear
– There is an established technology, with experts available who can bring out the best in it. Why change?
– It is not easy to make the new tech work on-prem, especially with a lack of expertise.
– When you train a young horse, it is natural to fear not understanding what goes on in its head and not finding the best environment to make it shine.
• Trust
– This comes with well-designed experiments on your different queries.
– Be prepared for early-stage failures when you set up Spark on your laptop, or on a cluster of on-prem VMs, without any expertise.
– Try leading Spark distribution vendors to make the ramp-up easy and painless.
– In the right environment, with a natural horseman, the horse gains trust & shows its prowess quickly.
• Comfort
– You need to simulate real scenarios & benchmark to get comfortable, e.g. understand the effect of data skewness on the full data.
– The horse needs a few real race experiences to get used to the other horses & the audience.
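One simple way to "understand the effect of data skewness" before a full-scale run is to look at how rows are distributed over the key you join or group on. The sketch below is a hedged, pure-Python illustration; the function name `skew_report`, the ratio it reports and the sample keys are assumptions for the example, not something taken from the slides.

```python
from collections import Counter

def skew_report(keys, top=3):
    """Summarise how unevenly rows are spread across key values."""
    counts = Counter(keys)
    if not counts:
        raise ValueError("no keys to analyse")
    total = sum(counts.values())
    average = total / len(counts)
    heaviest = counts.most_common(top)
    # Ratio of the largest key's row count to the average rows per key.
    ratio = heaviest[0][1] / average
    return {
        "keys": len(counts),
        "rows": total,
        "max_over_mean": ratio,
        "heaviest": heaviest,
    }

# Hypothetical customer ids: one catch-all id dominates the transactions.
keys = ["anonymous"] * 90 + ["c1"] * 4 + ["c2"] * 3 + ["c3"] * 3
report = skew_report(keys)
print(report)
```

A ratio far above 1 suggests that a single key would concentrate work in one place once the data is grouped by that key, which is the kind of effect that tends to show up only on the full data.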
The Approach characteristics
Patience
Strong Focus
Unconventional
Positivity
Timely course correction
The characteristics – notes (1)
• Here are the top 5 characteristics of a Sparksman.
• Strong Focus
– We all know focus is important for any project, but it needs greater emphasis for Spark projects. Why?
– Spark, as a generic execution framework, can solve a lot of different pieces of the analytics pipeline (ETL, data science, streaming) and brings associated benefits such as dynamic & linear scalability.
– It is easy to get distracted by the capabilities of Spark. So define the key problem and work to produce a solution for it.
– You pick the sport you are preparing the horse for and teach the essential skills needed for it. Although the horse inherently has the nature to run, jump & dance, we should not try to train the same horse for racing, show jumping & dressage all at the same time.
• Unconventional Approach
– If you come from a traditional RDBMS / SQL background in data manipulation, be prepared to unlearn.
– If you would like to use machine learning concepts, taking the same approach as you took with a SQL engine on an MPP database will not work, or will not give the benefits you expect from Spark.
– E.g. we relied on predicate pushdown wherever possible to make DataFrame operations faster & more efficient.
– You may not always figure out what is happening, but if you don’t try new options with your horse, you will not know what works & what does not. Great trainers have always tried unorthodox things.
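Predicate pushdown, mentioned in the notes above, means the filter is applied where the data is read, so rows that do not match never reach the rest of the pipeline. The toy below illustrates only the concept in plain Python and does not use Spark; `CountingSource` and the sample rows are hypothetical. In Spark itself the optimiser decides whether a filter can be pushed to a given data source.

```python
class CountingSource:
    """Toy data source that records how many rows it hands to the engine."""

    def __init__(self, rows):
        self._rows = rows
        self.rows_emitted = 0

    def scan(self, predicate=None):
        for row in self._rows:
            # With a pushed-down predicate, non-matching rows never leave the source.
            if predicate is None or predicate(row):
                self.rows_emitted += 1
                yield row

rows = [
    {"store": store, "amount": amount}
    for store, amount in [("uk", 10), ("uk", 5), ("fr", 7), ("de", 3)]
]

def is_uk(row):
    return row["store"] == "uk"

# Without pushdown: scan everything, filter afterwards.
late = CountingSource(rows)
late_total = sum(r["amount"] for r in late.scan() if is_uk(r))

# With pushdown: the source applies the filter while scanning.
early = CountingSource(rows)
early_total = sum(r["amount"] for r in early.scan(predicate=is_uk))

print(late_total, late.rows_emitted, early_total, early.rows_emitted)
```

Both variants compute the same total; the difference is how many rows had to be moved out of the source to get there.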
The characteristics – notes (2)
• Patience & Positivity
– Do not give up when it does not work, although it is incredibly boring at times to do the same thing again.
– Repeat the task, tuning one parameter or one aspect of the code at a time. Repetition is key.
– There are times when it won’t work because of things you don’t know about or don’t control.
– E.g. many times our code worked for reasonably sized data & scaled linearly, but broke down at some point. GC times were suddenly high.
– Various approaches over many days did not help, until we figured out that the issue was not the code or Spark but data skewness. So giving the benefit of the doubt to the tech worked.
– Horses resist tasks, and sometimes their behaviour bemuses and frustrates you. You cannot give up at the first sign of resistance; you need patience. Great trainers also give the horse the benefit of the doubt and stay positive.
• Timely course correction
– Keep the emotion out of the solution. Don’t be rigid about your solution.
– It is important to know when to stop re-running with tweaked parameters & take a new approach by rewriting the code.
– E.g. when our code tweaking didn’t work, we figured out that some aggregations fail within a window function on the full data set. So we needed to avoid the window function & change approach when the DataFrame is massive.
– Great trainers keep their emotion out & never take their frustration out on the horse, but make corrections to their approach, say by using a stronger bridle or a different crop.
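The slides do not say what replaced the window function. One common alternative, shown here purely as an assumption, is to aggregate first and then join the result back onto the rows. A windowed aggregate needs all rows for a key gathered in one place, whereas a plain sum can be built from small per-partition summaries. The pure-Python sketch below mimics that with lists standing in for partitions; all names and data are hypothetical.

```python
from collections import Counter

def partial_totals(partition):
    """Sum amounts per customer inside one partition (no data movement needed)."""
    totals = Counter()
    for customer, amount in partition:
        totals[customer] += amount
    return totals

def share_of_total(partitions):
    # Step 1: combine the small per-partition summaries into global totals.
    totals = Counter()
    for partition in partitions:
        totals.update(partial_totals(partition))
    # Step 2: join the totals back onto each row.
    return [
        (customer, amount, amount / totals[customer])
        for partition in partitions
        for customer, amount in partition
    ]

# Two hypothetical partitions of (customer, amount) rows.
partitions = [
    [("c1", 10), ("c1", 30), ("c2", 5)],
    [("c1", 60), ("c2", 15)],
]
rows_with_share = share_of_total(partitions)
print(rows_with_share)
```

Only the per-customer totals cross partition boundaries in step 1, which is why this shape tends to cope better with a heavily skewed key than gathering every row for that key.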
The set-up
Complete Data
Enterprise DWH
Masked Essential Data
Linux Server
The Engine
AWS Cloud
Source
The Results
The Achievement
80 - 85% reduction in code base
50% less Memory & 25% less Compute used
20% performance gain (with intermediate stages on disk)
Huge savings on cost per run
The enabler
Deep Spark Expertise
Dev tool with Integrated Visualisation & Collaboration
Large scale optimised Spark clusters
Unified web interface for Dev, Test & Operations
Language choice
MLlib
Collect 200 bonus points when you buy any newspaper at the store today
Customer picks up newspaper
One key takeaway
“There is no mysticism, no magic, or
only one method in the realm of good
horsemanship.
It’s knowing that everything you think
you know about horses may change
with the very next horse.”
-Buck Brannaman
Final few slides - notes
• We will start using real-time data, such as the weather or the location of the customer, in our personalisation.
• Based on our experiments, we know it is a lot of data to consume & analyse in real time.
• This makes Spark very appealing, given that Spark 2.0 introduces Structured Streaming.
• Also, the end-to-end security roadmap from Databricks might remove our need for data masking.
• Spark projects vary in size, shape & complexity, and Spark as a technology is evolving at great speed.
• So be prepared to unlearn & re-learn. If you are willing, you can make your analytics enterprise shine with Spark.
Credits
Musa Bilal for the inspiration
Prasad Deshpande for sharing the passion
Stuart Pearson for true Laissez-faire leadership
John Menhinick & Simon Hawkes for unparalleled support
THANK YOU.
@krajeshiyer
www.linkedin.com/in/rajkri

Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 

Último (20)

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 

Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the Chasm

  • 1. Natural Sparksmanship: Art of Making an Analytics Enterprise Cross the Chasm. Rajesh Krishnan, AIMIA Inc.
  • 3. Our Business - notes
    • AIMIA is a data-driven marketing and loyalty analytics business, regarded as a leader by Forrester in their loyalty management Wave.
      – We manage the UK’s largest coalition loyalty program and also manage proprietary loyalty programs for customers across verticals and geographies.
      – We provide insights through advanced analytics to retailers and CPGs across the globe.
      – We manage marketing and smart customer communications through campaigns for our clients, using analytics to drive loyalty.
    • Personalisation underpins the three focus areas.
    • The huge influx of data today means there is an opportunity to know the customer better. Personalisation can be done like never before.
  • 5. Our Analytics - notes
    • Analytics has been the backbone of the business.
    • We perform the full spectrum of analytics:
      – What happened (Descriptive)
      – Why it happened (Diagnostic)
      – What will happen (Predictive)
      – What should we do (Prescriptive)
    • Our barn has different breeds of horses, trained to win different forms of the sport, be it show jumping, endurance racing or dressage.
  • 8. The Challenge - notes
    • We have a smart custom analytics solution called Offer Engine.
      – Crunches ~20 billion rows of transaction data.
      – Calculates probabilities for, and ranks, 30 billion customer-product combinations.
      – Proven to produce impressive marketing RoI for our clients.
    • It is an algorithm built using SQL running on an in-memory MPP database, plus shell scripts.
    • We want to productise it and implement it across our customer base.
    • There are major technical hurdles in development: little flexibility to use different data sources with different velocities, and a prolonged time to take code from development to production.
    • In short, we wanted to find a new breed of horse with more speed, agility and flexibility, so that it can win this personalisation game anywhere and not just in one field.
  • 9. The Objective
    • Rewrite the award-winning concept (“Best Use of Customer Analytics/Data in a Loyalty Programme”, 2014 Loyalty awards)
    • Create a flexible framework
    • Move from custom code to a configurable product
    • Better performance at a lower cost
  • 11. The schematic (diagram labels): Metadata; Customer; Products; Transactions; Offers; Audience; Pre-Processing; Splitter (×2); Algorithm 1, Algorithm 2, Algorithm N; Join (×3); Ensemble; Output / Channels
  • 12. The Schematic - notes
    • The transaction and campaign data from the Enterprise Data Warehouse is pre-processed to feed the algorithms.
    • The algorithms are trained on the resulting training data.
    • When a campaign audience and the current in-store offers are available, they are sent to different algorithms based on the configuration.
    • An ensemble aggregates the results and produces the final ranking.
    • After what we heard at Spark meetups, rebuilding the key components of the pipeline in Spark was the obvious way to make it a successful product.
    • The best machine learning implementations using Spark presented at these meetups, from domains ranging from banking to e-commerce, were convincing.
  • 13. The Roles: The Owner / The Sponsor; The Trainer / The Sparksman; The Jockey / The field expert; The bettor / The client
  • 14. The Roles – notes (1)
    • The challenge in enterprise adoption of Spark is different. It is a high-stakes game.
    • Understanding the roles is a key first step to becoming a great Sparksman.
    • The Owner / The Sponsor:
      – Probably the most important person in making your project a success.
      – This may be a partnership or a single owner. A single owner is preferable at the start, for ease of communication.
      – You need to convince them that the identified technology is the best suited.
      – You had better have the statistics and proof that your new breed of horse is best suited to win, even before asking to buy one to test it.
    • The Trainer / The Sparksman:
      – We, who make this tech work for our product with a thorough knowledge of the market.
      – The horseman who truly understands the breed and prepares it to win the specific sport.
  • 15. The Roles – notes (2)
    • The Jockey / The field expert:
      – The operations team that makes the product run at scale and deals with issues.
      – The marketing/sales team that understands the pulse of the client exactly.
      – They know the pluses and minuses of the product.
      – They need to believe this new tech makes their life easier and better.
      – The jockey plays a key role in making the horse win the race; a perfect partnership between the horseman and the jockey is required to make any breed win.
    • The bettor / The client:
      – The client collects market information, purchases the best product and expects it to help them win their customers. They want your product to be flexible, performant and secure.
      – The bettor analyses the horse’s information and bets on it for a few races, wanting to win big money. They want your horse to be best of breed, fit and showing potential.
  • 17. The Stage – notes
    • Like it or not, we need to go through the following three stages to succeed.
    • Fear
      – The established technology has expertise available that can bring out the best in it. Why change?
      – It is not easy to make the tech work on-prem, especially with a lack of expertise.
      – When you train a young horse, it is natural to fear not understanding what goes on in its head, and to worry about finding the best environment to make it shine.
    • Trust
      – This comes with well-designed experiments on your different queries.
      – Be prepared for early-stage failures when you set up Spark on your laptop, or on a cluster of on-prem VMs, without any expertise.
      – Try leading Spark distribution vendors to make the ramp-up easy and painless.
      – In the right environment, with a natural horseman, the horse gains trust and shows its prowess quickly.
    • Comfort
      – You need to simulate the real scenario and benchmark to get comfortable, e.g. understand the effect of data skewness on the full data.
      – The horse needs a few real races to get used to other horses and the audience.
  • 18. The Approach characteristics: Patience; Strong Focus; Unconventional; Positivity; Timely course correction
  • 19. The characteristics – notes (1)
    • Here are the top 5 characteristics of a Sparksman.
    • Strong Focus
      – We all know it is important for any project, but it needs greater emphasis for Spark projects. Why?
      – Spark, as a generic execution framework, can solve many different pieces of the analytics pipeline (ETL, data science, streaming), with associated benefits such as dynamic and linear scalability.
      – It is easy to get distracted by the capabilities of Spark. So define the key problem and work to produce a solution for it.
      – Pick the sport you are preparing the horse for and teach the essential skills needed for it. Although the horse inherently has the nature to run, jump and dance, we should not try to train the same horse for racing, show jumping and dressage all at the same time.
    • Unconventional Approach
      – If you come from a traditional RDBMS/SQL background of manipulating data, be prepared to unlearn.
      – If you would like to use machine learning concepts, taking the same approach as you took in a SQL engine on an MPP database will not work, or will not give the benefits you expect from Spark.
      – E.g. we used the predicate pushdown concept to make DataFrames faster and more efficient wherever possible.
      – You may not always figure out what is happening, but if you don’t try new options with your horse, you will not know what works and what does not. Great trainers have always tried unorthodox things.
  • 20. The characteristics – notes (2)
    • Patience & Positivity
      – Do not give up when it does not work, although it is incredibly boring at times to do the same thing again.
      – Repeat the task, tuning one parameter or one aspect of the code at a time. Repetition is key.
      – There are times when it won’t work because of things you don’t know or don’t control.
      – E.g. many times our code worked for reasonably sized data and scaled linearly, but broke down at some point. GC times were suddenly high.
      – Various approaches over many days did not help, until we figured out that the issue was not the code or Spark but data skewness. So giving the benefit of the doubt to the tech worked.
      – Horses resist doing tasks, and sometimes their behaviour bemuses and frustrates you. You cannot give up at the first sign of resistance; you need patience. Great trainers also give the horse the benefit of the doubt and stay positive.
    • Timely course correction
      – Keep emotion out of the solution. Don’t be rigid about your solution.
      – It is important to know when to stop re-running with tweaked parameters and take a new approach by rewriting the code.
      – E.g. when our code tweaking didn’t work, we figured out that some aggregation fails within a window function on the full data set. So we needed to avoid the window function and change approach when the DataFrame is massive.
      – Great trainers keep their emotion out and never take their frustration out on the horse, but correct their approach, say by using a stronger bridle or a different crop.
  • 21. The set-up (diagram with steps numbered 1 to 5; labels): Complete Data; Enterprise DWH; Masked Essential Data; Linux Server; The Engine; AWS Cloud; Source
  • 23. The Achievement
    • 80-85% reduction in code base
    • 50% less memory and 25% less compute used
    • 20% performance gain (with intermediate stages on disk)
    • Huge savings on cost per run
  • 24. The enabler
    • Deep Spark expertise
    • Dev tool with integrated visualisation and collaboration
    • Large-scale optimised Spark clusters
    • Unified web interface for Dev, Test & Operations
    • Language choice
    • MLlib
  • 25. Collect 200 bonus points when you buy any newspaper at the store today. Customer picks up newspaper.
  • 26. One key takeaway: “There is no mysticism, no magic, or only one method in the realm of good horsemanship. It’s knowing that everything you think you know about horses may change with the very next horse.” - Buck Brannaman
  • 27. Final few slides - notes
    • We will start using real-time data, such as the weather or the customer’s location, in our personalisation.
    • Based on our experiments, we know it is a lot of data to consume and analyse in real time.
    • This makes Spark very appealing, given that 2.0 introduces Structured Streaming.
    • Also, the end-to-end security roadmap from Databricks might remove our need for data masking.
    • Spark projects vary in size, shape and complexity. Spark as a technology is evolving at great speed.
    • So be prepared to unlearn and re-learn; be willing, and you can make your analytics enterprise shine with Spark.
  • 28. Credits
    • Musa Bilal for the inspiration
    • Prasad Deshpande for sharing the passion
    • Stuart Pearson for true laissez-faire leadership
    • John Menhinick & Simon Hawkes for unparalleled support
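
The data-skew problem described in the notes for slide 20 (code that scales linearly, then breaks down with high GC times) can be illustrated without a cluster. The following is a minimal plain-Python sketch, not the Offer Engine code: the customer keys, the 90% hot-key share, the partition count and the helper names `partition_sizes`, `skew_ratio` and `salted` are all hypothetical. Key salting is shown as one common mitigation; the talk does not say which fix the team used.

```python
import zlib
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition under hash partitioning."""
    sizes = Counter()
    for key in keys:
        # crc32 is used instead of hash() so results are stable across runs.
        sizes[zlib.crc32(str(key).encode()) % num_partitions] += 1
    return [sizes.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size (1.0 means even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


def salted(keys, num_salts):
    """Append a rotating salt so that one hot key maps to several distinct keys."""
    return [f"{key}#{i % num_salts}" for i, key in enumerate(keys)]


# Hypothetical data: a single customer accounts for 90% of the rows.
keys = ["hot_customer"] * 9000 + [f"cust_{i}" for i in range(1000)]

before = skew_ratio(partition_sizes(keys, 8))
after = skew_ratio(partition_sizes(salted(keys, 16), 8))

print(f"skew ratio before salting: {before:.2f}")
print(f"skew ratio after salting:  {after:.2f}")
```

A ratio near 1.0 means rows are spread evenly across partitions. Salted keys need a second aggregation step to recombine the per-salt partial results, so salting trades one overloaded task for an extra, much smaller, shuffle.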