Fraud Detection Using Cloudera EDH
Kevin O’Dell, Field Engineer
© 2014 Cloudera, Inc. All rights reserved.
CONFIDENTIAL—DO NOT
DISTRIBUTE
Agenda
•  Overview of Problem
•  Offline Fraud
•  Online Fraud
•  Discussion
Problem Statement
•  About $15B annually (card fraud only)
•  1.3B credit, debit and pre-paid cards, about 5 per adult
•  Omni-channel: debit, credit, online, PoS, deposits
•  Increasing regulatory / compliance requirements
•  How do we integrate multiple systems and sources?
Fraud Systems
Key Requirements
•  No data loss is acceptable
•  Stream processing must complete as fast as possible, in under 500 ms
•  Support approximately 400M transactions per day in aggregate
•  Highest-volume flow:
   •  Current: 1.8k transactions/s
   •  Projected: 10k transactions/s
•  Each flow has at least three steps
   •  Adapter, Persistence, Hadoop Persistence
   •  The most complex flow has approximately seven steps
•  Avoid massive code refactoring
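The figures above can be turned into rough capacity and latency numbers. The following is a minimal sketch, not part of the original deck: it assumes the 500 ms budget is split evenly across the steps of a flow (the slides do not say how it is divided), and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(events_per_day: int) -> float:
    """Average events per second implied by a daily volume."""
    return events_per_day / SECONDS_PER_DAY


def per_step_budget_ms(total_budget_ms: float, steps: int) -> float:
    """Latency budget per step, ASSUMING an even split across steps."""
    return total_budget_ms / steps


if __name__ == "__main__":
    # 400M/day in aggregate, averaged over the whole day.
    print(f"aggregate average: {average_rate(400_000_000):,.0f} events/s")
    # Growth the highest-volume flow has to absorb (1.8k/s to 10k/s).
    print(f"projected growth factor: {10_000 / 1_800:.1f}x")
    # A three-step flow versus the most complex seven-step flow.
    for steps in (3, 7):
        print(f"{steps} steps: {per_step_budget_ms(500, steps):.1f} ms per step")
```

Under the even-split assumption, a seven-step flow leaves each step well under 100 ms, which is one way to see why the deck treats sub-second processing as a hard design constraint.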
Fraud System Categories
•  Online
   •  Ingest
   •  Enrichment (Profiles, feature selection, etc.)
   •  Early warning / detection (model serving / model application)
   •  Persistence
•  Offline (Human activities)
   •  Model building / discovery
   •  Case management
   •  Forensics
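The online categories can be read as stages of a single flow. The sketch below is illustrative only and assumes a very simple in-memory setup: the `Event` type, the `PROFILES` table, the `amount_ratio` feature and the threshold of 5.0 are all hypothetical stand-ins, not anything described in the deck.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    card_id: str
    amount: float
    features: Dict[str, float] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)


# Hypothetical per-card profiles; a real system would look these up in a store.
PROFILES: Dict[str, Dict[str, float]] = {"card-1": {"avg_amount": 40.0}}
# Hypothetical persistence target (stands in for a database or HDFS write).
STORE: List[Event] = []


def enrich(event: Event) -> Event:
    """Enrichment: attach profile-derived features to the event."""
    avg = PROFILES.get(event.card_id, {}).get("avg_amount")
    if avg:
        event.features["amount_ratio"] = event.amount / avg
    return event


def detect(event: Event) -> Event:
    """Early warning: apply a toy threshold in place of a served model."""
    if event.features.get("amount_ratio", 0.0) > 5.0:
        event.alerts.append("amount_far_above_profile")
    return event


def persist(event: Event) -> Event:
    """Persistence: keep the enriched, scored event."""
    STORE.append(event)
    return event


ONLINE_FLOW: List[Callable[[Event], Event]] = [enrich, detect, persist]


def run_flow(event: Event) -> Event:
    """Ingest an event and push it through each online stage in order."""
    for step in ONLINE_FLOW:
        event = step(event)
    return event
```

Calling `run_flow(Event("card-1", 400.0))` enriches the event with a ratio of 10.0 against the profile, raises one alert and appends the event to `STORE`. The offline categories (model building, case management, forensics) would then work from what was persisted.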
Some Numbers
•  Approximately 400M events (not necessarily transactions) per day
•  Analysts get data within 5 minutes (Approx 90M/day)
•  Over 100 Source Systems
•  Offline system rolls files every 5 minutes
•  Online system processes transaction authorization flows in < 500ms.
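The five-minute file roll can be illustrated with a little arithmetic. This is a hedged sketch: it assumes rolls are aligned to fixed five-minute boundaries in UTC and that the 90M/day figure is spread evenly over the day, neither of which the slides state. The function names are hypothetical.

```python
from datetime import datetime, timezone

ROLL_SECONDS = 5 * 60  # files roll every 5 minutes


def roll_window_start(ts: datetime) -> datetime:
    """Start of the 5-minute window that a timestamp falls into (UTC-aligned)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % ROLL_SECONDS, tz=timezone.utc)


def rolls_per_day() -> int:
    """Number of file rolls in one day."""
    return 24 * 60 * 60 // ROLL_SECONDS


def avg_events_per_roll(events_per_day: int) -> float:
    """Average events per rolled file, ASSUMING an even spread over the day."""
    return events_per_day / rolls_per_day()


if __name__ == "__main__":
    ts = datetime(2014, 1, 1, 12, 7, 30, tzinfo=timezone.utc)
    print(roll_window_start(ts).isoformat())
    print(rolls_per_day(), avg_events_per_roll(90_000_000))
```

With these assumptions there are 288 rolls per day, so each rolled file would carry a few hundred thousand events on average.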
Architecture
Fraud Architecture
[Architecture diagram; only its labels survive in this transcript.]
•  Clients: Incoming Events, Outgoing Events
•  Kafka Cluster: Topic A, Topic B, Topic C
•  Cloudera EDH
   •  Storage: HDFS, Solr, HBase
   •  Processing: Impala, Map/Reduce, Spark, 3rd Party
   •  Event Processing
   •  Interactivity: HBase, Search
   •  Serving Layer, Speed Layer, Batch Layer
   •  Rules Engine, Model Building
   •  Automated & Manual Analytical Adjustments and Pattern Detection
•  EDH: Model Building, Automated Alerting, Profile Persistence Layer, Forensics, Pattern Detection, Discovery Analytics
•  Event Processing: Alerting, Enrichment, Business Rules
•  Spark Streaming
•  Case Management and Alerting
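The diagram names a Serving Layer, a Speed Layer and a Batch Layer but the transcript does not show how they interact. One common reading is the lambda-style pattern in which a query combines a periodically recomputed batch view with a small real-time view. The sketch below shows that general pattern under that assumption; the `ServingLayer` class and its methods are hypothetical and not taken from the deck.

```python
from collections import Counter
from typing import Dict


class ServingLayer:
    """Answers queries by merging a batch view with a speed (real-time) view."""

    def __init__(self) -> None:
        self.batch_view: Counter = Counter()  # recomputed periodically
        self.speed_view: Counter = Counter()  # updated per incoming event

    def record_event(self, card_id: str) -> None:
        """Speed layer: count an event as soon as it arrives."""
        self.speed_view[card_id] += 1

    def load_batch_view(self, counts: Dict[str, int]) -> None:
        """Batch layer: swap in a fresh view and reset the speed view.

        ASSUMES the new batch view already covers every event seen so far.
        """
        self.batch_view = Counter(counts)
        self.speed_view.clear()

    def txn_count(self, card_id: str) -> int:
        """Serving layer: batch result plus whatever arrived since."""
        return self.batch_view[card_id] + self.speed_view[card_id]
```

For example, after `load_batch_view({"card-1": 10})` and two calls to `record_event("card-1")`, `txn_count("card-1")` returns 12, and it still returns 12 once a later batch run that includes those two events is loaded.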
Online Systems
•  Can be incorporated into the authorization pipeline
   •  Rules Engine incorporation
   •  Application of models
•  Must deliver results sub-second
•  Must scale to spikes in transaction volume
•  Historically outside of Hadoop
•  Zero data loss tolerance and tight SLA requirements
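A rough sketch of what rules plus model application inside an authorization step might look like, with the latency checked against the 500 ms budget quoted earlier in the deck. Everything here is an assumption for illustration: the two rules, the `score` stand-in model and the `authorize` function are hypothetical, and a real deployment would use an actual rules engine and a served model.

```python
import time
from typing import Callable, Dict, List, Tuple

Txn = Dict[str, float]
Rule = Tuple[str, Callable[[Txn], bool]]

# Hypothetical business rules: (name, predicate over the transaction).
RULES: List[Rule] = [
    ("large_amount", lambda t: t["amount"] > 5_000),
    ("high_model_score", lambda t: t.get("model_score", 0.0) > 0.9),
]


def score(txn: Txn) -> float:
    """Stand-in for a real model: scales the amount into [0, 1]."""
    return min(1.0, txn["amount"] / 10_000)


def authorize(
    txn: Txn,
    budget_ms: float = 500.0,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, object]:
    """Apply the model, then the rules, and report whether the SLA was met."""
    start = clock()
    scored = dict(txn, model_score=score(txn))
    fired = [name for name, predicate in RULES if predicate(scored)]
    elapsed_ms = (clock() - start) * 1000.0
    return {"fired": fired, "elapsed_ms": elapsed_ms, "within_sla": elapsed_ms < budget_ms}
```

The `clock` parameter exists only so the timing check can be exercised with a fake clock; in normal use the default `time.perf_counter` is enough.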
Online System Advantages
•  Enriching the record in real time allows us to apply any number of algorithms
   •  Travel Scoring
   •  Anomaly Detection Models (Clustering)
   •  Commercial ML model application
   •  All with sub-second latency
•  Integration into EDH allows easy deployment, monitoring and integration with offline/batch activities
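The deck does not define "Travel Scoring". One plausible interpretation, assumed here, is a check on the speed implied by two consecutive transactions on the same card: if the card would have had to travel faster than a commercial flight, the second transaction is suspicious. The function names and the 900 km/h threshold below are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0

# (latitude in degrees, longitude in degrees, time in epoch seconds)
Sighting = Tuple[float, float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev: Sighting, curr: Sighting) -> float:
    """Speed needed to get from the previous sighting to the current one."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    if distance == 0:
        return 0.0
    hours = (curr[2] - prev[2]) / 3600.0
    if hours <= 0:
        return math.inf  # different places at the same instant
    return distance / hours


def travel_flag(prev: Sighting, curr: Sighting, max_kmh: float = 900.0) -> bool:
    """True when the implied speed exceeds the (assumed) plausible maximum."""
    return implied_speed_kmh(prev, curr) > max_kmh
```

A card used in London and then in New York one hour later is flagged, while the same pair twelve hours apart is not. Because the check needs only the previous sighting from the card's profile, it is the kind of enrichment-driven score that can fit inside a sub-second budget.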
Multi-cluster Fraud Architecture
[Architecture diagram; only its labels survive in this transcript.]
•  Clients: Incoming Events, Outgoing Events
•  Security / Transport: Service Activator (appears twice in the diagram)
•  Kafka Cluster: Topic A, Topic B, Topic C
•  Operational Cluster (6 months): Model Building, Automated Alerting, Profile Persistence Layer
•  HULC (13 months)
   •  Storage: HDFS
   •  Processing: Impala, Map/Reduce, Spark
•  Discovery Cluster: Batch model updates, Discovery Analytics, Pattern Detection
•  Real Time Cluster: Event Processing, Alerting, Enrichment, Business Rules
•  Other labels
   •  Storage: HDFS, Solr, HBase
   •  Processing: Impala, Map/Reduce, Spark, 3rd Party
   •  Event Processing
   •  Interactivity: HBase, Search
   •  Serving Layer, Speed Layer, Batch Layer
   •  Rules System, Model Building
   •  Automated & Manual Analytical Adjustments and Pattern Detection
   •  Business Users
   •  HDFS Replication
Thank you.

Kevin O'Dell - Fraud and event detection using the Enterprise Data Hub
