td-ameritrades-journey-from-data-warehouses-to-data-lakes_237777


  1. 1. TD Ameritrade’s Journey from Data Warehouses to Data Lakes January 31, 2017 Informatica Architecture Series
  2. 2. Today’s Speakers: Krishna Sarma, Director, Data Development, Data Warehouse, BI & Big Data, TD Ameritrade; David Lyle, VP Business Transformation Services, Informatica; Amit Kara, Big Data Solutions Expert, Informatica
  3. 3. David Lyle
  4. 4. By 2017, Marketing will spend more on technology than IT - Gartner
  5. 5. But CMOs do not want to be CIOs
  6. 6. Reduce Customer Churn Better Marketing ad hoc Analysis Better Up-Sell / Cross-Sell Increase Revenue Intelligence: Next best step Understand Marketing Attribution Who is ready to buy now? Better Lead Conversion Increased Wallet Share Marketing Business Outcomes Acquire New Customers Increase Return on Marketing Investment Build Customer Database
  7. 7. Data is the #1 technical bottleneck! Example Problem: Analytics 86% surveyed: “At best only somewhat effective at meeting the primary objective of the data and analytics program.”
  8. 8. The CMO View “Data is our competitive advantage!” “Everything in Marketing has analytics.” “IT is just too slow to deliver the data.” “Marketing needs data self-service to succeed!” “Sometimes fast is more important than perfect.”
  9. 9. The CIO View “My Data Warehouse is rock solid, but inflexible and costly for new Marketing requirements.” “Big Data is interesting, but we need to show business value to Marketing.” “Need to enable Marketing to self-serve data.” “Need to deliver new data at the pace & quality that Marketing requires.” “The organization wants cloud analytics but data will be even harder to manage.”
  10. 10. Analytics: Data Challenges. Challenges: Must leverage existing investment; Marketing expects fast IT data delivery; Data locked in application silos; Data volume; Data complexity (50% external); Lack of trust in the data. Newer Requirements: Want to leverage new analytics technology; Want real-time data updates & decisions; Moving to hybrid/cloud deployment; Moving from reporting to predictive; Business self-service for data; Need business-led data governance. Business Impact: Unable to deliver clean, trusted & timely data in the timeframe required for marketing initiatives
  11. 11. The Data Warehouse is the Beginning of a Journey Data Warehouse: Strengths • Standardized data • “Bet your career” Business decisions • Centralized reporting • High reliability • Stability Data Warehouse: Limitations • Slow to adapt / change • May not handle new data types • Not suitable for ad hoc analysis • Not suitable for self-service • May not handle larger volumes / streaming data • Does not support transactional
  12. 12. Everybody’s Journey Will Vary Data Warehouse Data Warehouse Appliance Cloud Data Warehouse Cloud Data Lake On-premise Data Lake …NOTHING goes away!
  13. 13. "The need for increased agility and accessibility for data analysis is the primary driver for data lakes." Andrew White
  14. 14. An Example Customer Journey • ETL for DW & Applications • Added Realtime • Data Quality B2B • Cloud connectivity - SFDC • MDM • Big Data
  15. 15. Data Warehouse vs. Data Lake. Data Warehouse (High Quality / Controlled) compared with Data Lake (Flexibility / Innovation). Questions: “How many widgets did I sell yesterday?” vs. “Who should I sell to next and what should I offer?” Data Types: structured & processed data vs. any or no data structure. Data Level: summarized, consolidated data vs. atomic data. Processing: schema on write vs. schema on read. Agility: +++ adding data: 3-6 months vs. highly fluid for additions. Governance & Security: more mature (improving) vs. emergent.
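
The "schema on write" versus "schema on read" contrast on slide 15 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `SALES_SCHEMA` columns, the function names, and the in-memory lists standing in for a warehouse table and a lake are not taken from the presentation.

```python
import json

# Hypothetical warehouse-style schema: column name -> required type.
SALES_SCHEMA = {"order_id": int, "widget": str, "quantity": int}


def write_with_schema(table, record):
    """Schema on write: validate before storing and reject anything that does not fit."""
    if set(record) != set(SALES_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    for column, expected in SALES_SCHEMA.items():
        if not isinstance(record[column], expected):
            raise ValueError(f"{column} must be {expected.__name__}")
    table.append(record)


def land_raw(lake, payload):
    """Schema on read, step 1: store the raw text untouched."""
    lake.append(payload)


def read_with_schema(lake, fields):
    """Schema on read, step 2: apply structure only when a question is asked."""
    rows = []
    for payload in lake:
        try:
            doc = json.loads(payload)
        except json.JSONDecodeError:
            continue  # unparseable data stays in the lake but is skipped by this query
        if not isinstance(doc, dict):
            continue
        rows.append({name: doc.get(name) for name in fields})
    return rows


warehouse, lake = [], []
write_with_schema(warehouse, {"order_id": 1, "widget": "gear", "quantity": 3})
land_raw(lake, '{"order_id": 2, "widget": "cog", "channel": "mobile"}')
land_raw(lake, "free-text note from a chat session")
```

In this sketch the warehouse path refuses a record with a new column, while the lake path accepts anything and lets each reader choose the fields it cares about, which mirrors the agility trade-off listed on the slide.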
  16. 16. What Marketing Data Goes Where? Data Warehouse Marketo CRM ERP Log/Clickstream Industry Mobile / Geo Social/Online Sensor Image / Video Voice Trusted historical data Operationalized Insights Marketing Data Lake swamp pond lake
  17. 17. #IWT16 Informatica Data Lake Solution Data Warehouse Marketo CRM ERP Data Sources Marketing Data Lake swamp pond lake Informatica Big Data Management Data Integration Data Quality/Governance Data Security Enterprise Information Catalog Intelligent Data Lake Other… Other OnPrem Cloud Apps Master Data Mgmt.
  18. 18. #IWT16 Informatica Marketing Technology Stack CRM Predictive Marketing Web Content Management SEO and ABM Enterprise Data Warehouse Marketing Automation Marketing Intelligent Data Lake Informatica Marketing-Lake Example Customers and Prospects informatica.com Marketing and Sales Actionable Insights Analytics Social Leads Web Clean, Consistent & Integrated Data Connect Clean Master Validate Enrich Relate Share Informatica Platform
  19. 19. Amit Kara
  20. 20. Building a Data Lake Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next Best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization
  21. 21. Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization Big Data Infrastructure Building a Data Lake
  22. 22. Big Data Processing Big Data Storage Big Data Infrastructure Building a Data Lake – Big Data Infrastructure Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization
  23. 23. On-premise Cloud Hadoop NoSQL Databases Data Warehouse Appliances Real-Time Near Real-Time Batch Database Pushdown Building a Data Lake – Big Data Infrastructure Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization
  24. 24. On-premise Cloud Hadoop NoSQL Databases Data Warehouse Appliances Real-Time Near Real-Time Batch Database Pushdown Data Lake Management Building a Data Lake Management Solution Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization Big Data Analytics
  25. 25. Foundation of a Data Lake Management Solution On-premise Cloud Hadoop NoSQL Databases Data Warehouse Appliances Real-Time Near Real-Time Batch Database Pushdown Metadata Intelligence Data Lake Management Data Visualization Advanced Analytics Predictive Analytics Machine Learning Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization
  26. 26. Foundation of a Data Lake Management Solution On-premise Cloud Hadoop NoSQL Databases Data Warehouse Appliances Real-Time Near Real-Time Batch Database Pushdown Metadata Intelligence Big Data Management Data Lake Management Data Visualization Advanced Analytics Predictive Analytics Machine Learning Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization
  27. 27. Foundation of a Data Lake Management Solution On-premise Cloud Hadoop NoSQL Databases Data Warehouse Appliances Real-Time Near Real-Time Batch Database Pushdown Metadata Intelligence Big Data Management Intelligent Data Applications Data Lake Management Data Visualization Advanced Analytics Predictive Analytics Machine Learning Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization
  28. 28. Key capabilities of Data Lake Management Solution Data Lake Management Big Data Integration Big Data Governance and Quality Big Data Security Self Service Data Preparation Enterprise Data Catalog Data Security Intelligence Metadata Management Data Index Data Discovery Metadata Intelligence Foundation Data Blending Data Pipeline Abstraction Data Integration Transformations Data Parsing Publish and Subscribe Stream Processing & Analytics Data Ingestion Master Data Management Data Matching & Relationships Data Quality Data Profiling Data Retention & Lifecycle Management Data Masking Data Encryption Authorization & Authentication Big Data Storage Big Data Processing Big Data Infrastructure Data Visualization Advanced Analytics Predictive Analytics Machine Learning Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization Big Data Integration Big Data Governance and Quality Big Data Security Metadata Intelligence Big Data Management Intelligent Data Applications
  29. 29. Key capabilities of Data Lake Management Solution Data Lake Management Big Data Integration Big Data Governance and Quality Big Data Security Self Service Data Preparation Enterprise Data Catalog Data Security Intelligence Metadata Management Data Index Data Discovery Metadata Intelligence Foundation Data Blending Data Pipeline Abstraction Data Integration Transformations Data Parsing Publish and Subscribe Stream Processing & Analytics Data Ingestion Master Data Management Data Matching & Relationships Data Quality Data Profiling Data Retention & Lifecycle Management Data Masking Data Encryption Authorization & Authentication Big Data Storage Big Data Processing Big Data Infrastructure Data Visualization Advanced Analytics Predictive Analytics Machine Learning Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization Big Data Integration Big Data Governance and Quality Big Data Security Big Data Management Intelligent Data Applications
  30. 30. Key capabilities of Data Lake Management Solution Data Lake Management Big Data Integration Big Data Governance and Quality Big Data Security Self Service Data Preparation Enterprise Data Catalog Data Security Intelligence Metadata Management Data Index Data Discovery Metadata Intelligence Foundation Data Blending Data Pipeline Abstraction Data Integration Transformations Data Parsing Publish and Subscribe Stream Processing & Analytics Data Ingestion Master Data Management Data Matching & Relationships Data Quality Data Profiling Data Retention & Lifecycle Management Data Masking Data Encryption Authorization & Authentication Big Data Storage Big Data Processing Big Data Infrastructure Data Visualization Advanced Analytics Predictive Analytics Machine Learning Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization Big Data Integration Big Data Governance and Quality Big Data Security Intelligent Data Applications
  31. 31. Key capabilities of Data Lake Management Solution Data Lake Management Big Data Integration Big Data Governance and Quality Big Data Security Self Service Data Preparation Enterprise Data Catalog Data Security Intelligence Metadata Management Data Index Data Discovery Metadata Intelligence Foundation Data Blending Data Pipeline Abstraction Data Integration Transformations Data Parsing Publish and Subscribe Stream Processing & Analytics Data Ingestion Master Data Management Data Matching & Relationships Data Quality Data Profiling Data Retention & Lifecycle Management Data Masking Data Encryption Authorization & Authentication Big Data Storage Big Data Processing Big Data Infrastructure Data Visualization Advanced Analytics Predictive Analytics Machine Learning Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization Intelligent Data Applications
  32. 32. Key capabilities of Data Lake Management Solution Data Lake Management Big Data Integration Big Data Governance and Quality Big Data Security Self Service Data Preparation Enterprise Data Catalog Data Security Intelligence Metadata Management Data Index Data Discovery Metadata Intelligence Foundation Data Blending Data Pipeline Abstraction Data Integration Transformations Data Parsing Publish and Subscribe Stream Processing & Analytics Data Ingestion Master Data Management Data Matching & Relationships Data Quality Data Profiling Data Retention & Lifecycle Management Data Masking Data Encryption Authorization & Authentication Big Data Storage Big Data Processing Big Data Infrastructure Data Visualization Advanced Analytics Predictive Analytics Machine Learning Raw Data Assets Applications & Databases Internet of Things Social & Web Logs 3rd Party Data Data Products e-Commerce Next best Recommendation High Net-Worth Customer Retention Remediation Campaign Management Optimization Marketing Operations Optimization
  33. 33. Informatica’s Comprehensive Solution for Data Lakes INGEST GOVERN PREPARE SECURE ACCESS CATALOG ACQUIRE CONSUME COMPREHENSIVE SUPPORT FOR DATA PROCESSING Spark Blaze Tez MapReduce Catalog Search Lineage Recommendations METADATA INTELLIGENCE Spark Streaming COMPREHENSIVE SUPPORT FOR DATA INFRASTRUCTURE Data Preparation Business Glossary Record Linkage Sensitivity Visualization Publish / Subscribe Batch Processing Stream Processing Data Profiling Data Protection Data Mastering Data Lineage Data Parsing Enterprise Data Catalog Big Data Relationships Data Security Intelligence Broadest Connectivity Reusable Workflows Data Quality Informatica Data Lake Management Relational Social Files Device data Weblogs Applications Data Mining Dashboards Files
  34. 34. User Informatica Big Data Management & Amazon EMR Deployment Script Amazon RDS Amazon EC2 Informatica Domain Deploying Big Data Management on AWS One Click Deploy on AWS
  35. 35. Informatica BDM Process Flow using EMR Salesforce, Adobe Analytics Marketo Discover & Profile Parse & Prepare Load to Amazon Redshift / S3 Amazon S3 Input bucket Amazon EMR Amazon S3 Output bucket Amazon Redshift 1 2 3 4 5 6 Corporate Data Center (on-prem) Databases Application Server
  36. 36. TD Ameritrade's Journey from Data Warehouses to Data Lakes January 31, 2017
  37. 37. TD Ameritrade (TDA). Services offered include common and preferred stocks, futures, ETFs, options trades, mutual funds, fixed income, margin lending, and cash management services. Work Culture: Agile; Foster Innovation; People Matter, Client Centric, Integrity First, Work Together & Strive To Win
  38. 38. Data Landscape at TDA without Hadoop. Diagram labels: Operational Master Data Analytical Master Data MDM Accts Leads Email Web Orders Quotes VEO Integrated Zone Data Marts Exploration Warehouse BI & Analytics External Data (Market, Vendor) Staging Zone Virtual ODS SFDC Others Documents A B C E Archival Zone Other Risk User DB Marketing User DB Finance User DB Common Staging Area Interactive Zone Enterprise Data Warehouse SDB Mart HR Mart Client Relationship DM BI / Analytics Ad-Hoc & Standard Reports Data Visualization Textual Analytics Executive Dashboards Exploration & Mining Self-Service User Auth Phone SFDC HR Legacy Etc… Analytics Applications Departmental Databases WFM D. Footer: This document contains confidential information for use by TD AMERITRADE Holding Corporation and its subsidiaries.
  39. 39. Business Drivers for Data Lake Investment. What we can do today vis-à-vis what we want to do going forward: a) We know what happened yesterday; i. and we want to know what's happening today and now. How can I model risk analytics in real time to minimize our firm’s exposure? b) We report on a narrow variety of (structured) data; i. and we want to tie our data sets to semi-structured and unstructured datasets (text, emails, chats, logs, social, etc.) as the “data” world is changing. Who is saying what @ TDA on social media? Who is browsing which products on the TDA website, and how much time are they spending on our web page? c) With what we have, we can do good reporting and derive some intelligence; i. and we want to derive actionable insights along with predictive modeling, sentiment analysis, machine learning, etc. What does “hot” mean when we get a tweet “I feel hot today”? How would my revenues be impacted in the event of a future hurricane like “Katrina” or “Sandy”?
  40. 40. Data Marshalling Yard @ Hadoop at TD Ameritrade. A) Landing Zone: landing area for all files; raw dump; data quality checks; profiling; masking of sensitive data; non-integrated; any app can consume for further processing; one-stop shop for all raw files (structured, semi-structured & unstructured). E) Enterprise Data Archival: enterprise archival for all data types; 24 x 7 x 365 access; vast & inexpensive storage; data can be persisted for 10-15-20 yrs. B) Exploratory Analytics & Reporting: on all data sets (structured, semi-structured & unstructured); ad hoc analytics, exploration; visualization, dashboarding, scorecarding; reporting (Tableau, BOBJ, etc.). C) Advanced Analytics: text mining; sentiment analysis; predictive analytics and modeling; etc. D) Application Access: operational reporting; client-facing applications & engines connecting to DMY; application tier and workloads; various other uses depending on platform maturity
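
The Landing Zone duties on slide 40 (data quality checks, profiling, masking of sensitive data) can be sketched in plain Python. This is only an illustration under stated assumptions: the `SENSITIVE_FIELDS` set, the record layout, and the function names are hypothetical and do not describe TD Ameritrade's or Informatica's actual implementation.

```python
import hashlib
from collections import Counter

# Hypothetical list of sensitive columns; in practice this would come from governance policy.
SENSITIVE_FIELDS = {"ssn", "email"}


def mask_value(value):
    """Replace a sensitive value with a short one-way SHA-256 token."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return "masked_" + digest[:12]


def mask_record(record):
    """Return a copy of the record with non-empty sensitive fields masked."""
    masked = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS and value not in (None, ""):
            masked[key] = mask_value(value)
        else:
            masked[key] = value
    return masked


def profile(records):
    """Basic profile: row count, empty-value count per field, distinct count per field."""
    nulls = Counter()
    distinct = {}
    for record in records:
        for key, value in record.items():
            if value is None or value == "":
                nulls[key] += 1
            else:
                distinct.setdefault(key, set()).add(value)
    return {
        "rows": len(records),
        "nulls": dict(nulls),
        "distinct": {key: len(values) for key, values in distinct.items()},
    }


# Made-up sample rows for illustration only.
raw = [
    {"account": "A1", "email": "a@example.com", "ssn": "111-22-3333"},
    {"account": "A2", "email": "", "ssn": "444-55-6666"},
    {"account": "A1", "email": "a@example.com", "ssn": "111-22-3333"},
]
landed = [mask_record(row) for row in raw]
report = profile(landed)
```

Because the same input always maps to the same token, masked columns can still be joined and counted downstream. An unsalted, truncated hash is used here only to keep the sketch short; it is not a recommendation for protecting real identifiers.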
  41. 41. Data Landscape at TDA with Hadoop (Phase: Crawl). Diagram labels: Operational Master Data Analytical Master Data MDM Accts Leads Emails Web Orders Logs Chat Integrated Zone Data Marts Exploration Warehouse BI & Analytics External Data (Market, Vendor) Staging Zone Virtual ODS SFDC Others Documents A B C E Archival Zone Other Risk DB Marketing User DB Finance User DB Common Staging Area Interactive Zone Enterprise Data Warehouse SDB Mart HR Mart Client Relationship DM BI / Analytics Ad-Hoc & Standard Reports Data Visualization Textual Analytics Executive Dashboards Exploration & Mining Self-Service User Auth Phone SFDC Social Text Etc… Analytics Applications Departmental Databases WFM D Data Marshalling Yard (Data Lake) @ Hadoop X X
  42. 42. Data Landscape at TDA with Hadoop (Phase: Walk). Diagram labels: Operational Master Data Analytical Master Data MDM Integrated Zone Data Marts Exploration Warehouse BI & Analytics External Data (Market, Vendor) Staging Zone Virtual ODS SFDC Others A B C E Archival Zone Other Risk DB Marketing User DB Finance User DB Common Staging Area Interactive Zone Enterprise Data Warehouse SDB Mart HR Mart Client Relationship DM BI / Analytics Ad-Hoc & Standard Reports Data Visualization Textual Analytics Executive Dashboards Exploration & Mining Self-Service Text Analytics Applications Departmental Databases WFM D Data Marshalling Yard (Data Lake) @ Hadoop X X X X Accts Leads Emails Web Orders Logs Chat Documents User Auth Phone SFDC Social Etc…
  43. 43. Data Landscape at TDA with Hadoop (Phase: Run). Diagram labels: Operational Master Data Analytical Master Data MDM Integrated Zone Data Marts Exploration Warehouse BI & Analytics External Data (Market, Vendor) Staging Zone Virtual ODS SFDC Others A B C E Archival Zone Other Risk DB Marketing User DB Finance User DB Common Staging Area Interactive Zone Enterprise Data Warehouse SDB Mart HR Mart Client Relationship DM BI / Analytics Ad-Hoc & Standard Reports Data Visualization Textual Analytics Executive Dashboards Exploration & Mining Self-Service Analytics Applications Departmental Databases WFM D Data Marshalling Yard (Data Lake) @ Hadoop X X X X X The “T” of ETL Accts Leads Emails Web Orders Logs Chat Documents User Auth Phone SFDC Social Etc… Text
  44. 44. Data Landscape at TDA with Hadoop (Phase: Glide). Diagram labels: Operational Master Data Analytical Master Data MDM Integrated Zone Data Marts Exploration Warehouse BI & Analytics External Data (Market, Vendor) Staging Zone Virtual ODH A B C E Archival Zone Other Risk DB Marketing User DB Finance User DB Common Staging Area Interactive Zone Enterprise Data Warehouse SDB Mart HR Mart Client Relationship DM BI / Analytics Ad-Hoc & Standard Reports Data Visualization Textual Analytics Executive Dashboards Exploration & Mining Self-Service Analytics Applications Departmental Databases D Data Marshalling Yard (Data Lake) @ Hadoop X X X X X X No SQL The “T” of ETL Accts Leads Emails Web Orders Logs Chat Documents User Auth Phone SFDC Social Etc… Text
  45. 45. Data Landscape at TDA with Hadoop (Phase: Fly). Diagram labels: Operational Master Data Analytical Master Data MDM Integrated Zone Data Marts Exploration Warehouse BI & Analytics External Data (Market, Vendor) Staging Zone Virtual ODH A B C E Archival Zone Other Risk DB Marketing User DB Finance User DB Common Staging Area Interactive Zone Enterprise Data Warehouse SDB Mart HR Mart Client Relationship DM BI / Analytics Ad-Hoc & Standard Reports Data Visualization Textual Analytics Executive Dashboards Exploration & Mining Self-Service Analytics Applications Departmental Databases D Data Marshalling Yard (Data Lake) @ Hadoop X X X X X No SQL The “T” of ETL Application Access Operational reporting Client facing applications & engines connecting to DMY Application tier and workloads Various other uses depending on platform maturity Accts Leads Emails Web Orders Logs Chat Documents User Auth Phone SFDC Social Etc… Text
  46. 46. Hadoop at TD Ameritrade: Lessons Learning. If you are not making mistakes then you are not learning. Evolutionary approach over revolutionary. Data can be useful even before it is perfected. A goal without a plan is only a wish.
  47. 47. Hadoop at TD Ameritrade: Tips & Tricks. 1. Network bandwidth & firewalls 2. Organize your datasets: a) Velocity (Batch, NRT, RT) b) Variety (logs, email, text, chats, social, structured, etc.) 3. Data profiling 4. Data ingestion frameworks 5. Begin with non-SII/PII datasets 6. Light governance (to begin with)
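
Tips 2 and 5 on slide 47 (organize datasets by velocity and variety, and begin with non-sensitive datasets) lend themselves to a small inventory sketch. The `Dataset` class, the `sensitive` flag, and the sample inventory below are hypothetical names and data introduced for illustration; the slide itself does not prescribe any structure.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Velocity(Enum):
    BATCH = "batch"
    NRT = "near-real-time"
    RT = "real-time"


@dataclass(frozen=True)
class Dataset:
    name: str
    velocity: Velocity
    variety: str      # e.g. "logs", "email", "chats", "social", "structured"
    sensitive: bool   # hypothetical flag for datasets holding sensitive or personal data


def organize(datasets):
    """Group dataset names by their (velocity, variety) pair."""
    groups = defaultdict(list)
    for ds in datasets:
        groups[(ds.velocity, ds.variety)].append(ds.name)
    return dict(groups)


def first_wave(datasets):
    """Names of datasets that carry no sensitive data, as candidates to onboard first."""
    return [ds.name for ds in datasets if not ds.sensitive]


# Made-up inventory for illustration only.
inventory = [
    Dataset("web_clickstream", Velocity.NRT, "logs", False),
    Dataset("client_emails", Velocity.BATCH, "email", True),
    Dataset("market_quotes", Velocity.RT, "structured", False),
    Dataset("app_server_logs", Velocity.NRT, "logs", False),
]
groups = organize(inventory)
candidates = first_wave(inventory)
```

Keeping velocity as an enum and variety as free text is a design choice for the sketch: the slide lists a closed set of velocities (Batch, NRT, RT) but an open-ended list of varieties.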
  48. 48. Best Practices: What our customers tell us. Plan for cloud and on-premise (hybrid). DO look for a data management platform that supports all use cases. DO connect your Data Lake with a business initiative: start small, show value quickly. DO leverage your current investment: current data management; data warehouse / analytics; data governance. DON’T create new silos of data / technology. DO leverage new kinds of data and new technology if they can accelerate business value delivery.
  49. 49. Best Practices for Architects. DO design your architectures to specifically enable these benefits: cloud for time-to-value and flexibility; data lakes for flexibility and innovation. DO plan bi-directional data flows from Data Warehouse to Data Lake. DO leverage cloud, big data, NoSQL, columnar… as business needs require. DO standardize on a single data management platform: high productivity & flexibility; pre-integrated (easy to maintain, upgrade); connects to any data source or target; supports big data, on-premise, cloud; handles all of your integration use cases; enables re-usable people, skills, code. 42% prefer an integrated DI suite (#1 response), TDWI
  50. 50. Resources. Upcoming Webinars: February 2nd, 2017, Informatica Marketing Data Lake Demo, bitly.com/infalake; March 8th, 2017, Genesis Housing: Modern Hub Architecture to Power Digital Transformation (watch for posting on BrightTalk.com). Reference Architecture: The Complete Marketing Data Lake Management Reference Architecture, https://www.informatica.com/datalake-ref-bdm-on-aws
  51. 51. Questions?
  52. 52. Thank You for Attending!
