
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Hadoop Appliance

  1. DMT 3260: Citizens Bank Data Lake Implementation: Selecting BigInsights ViON Spark/Hadoop Appliance • Dana Rafiee, Destiny Corporation • John DiFranco, Citizens Bank
  2. Order of Presentation • Destiny Background • The Data Scientist • Client Infrastructure Challenges • Tools Used at Clients • Client Architecture • Case Studies – Citizens Bank – Financial Processing Organization
  3. Abstract • Citizens Bank, formerly part of the Royal Bank of Scotland, is implementing a BigInsights Hadoop Data Lake with PureData System for Analytics (Netezza) to support all of its internal data initiatives. The goal is to provide an improved experience for customers and to grow market share. Along the bank's ETL journey, we've used Netezza SQL, Hadoop and finally IBM BigIntegrate and BigInsights. Testing BigIntegrate on BigInsights yielded the productivity, maintainability and performance that Citizens was looking for, and all of this came prepackaged in the ViON Hadoop Appliance that was rolled into its data centers, greatly simplifying entry into the Hadoop world.
  4. Destiny Background • Business and Technology Consulting Firm • Advising Fortune 500 Corporations for 30 years • Build Data Lakes, Warehouses, Reporting and Analytics environments for large corporations and government • Business Consultants • Data Warehouse/Modeling Specialists • Advanced Analytic Practitioners • SAS and IBM Business Partner • Objective Opinions • Copyright © 2016 Destiny Corporation – Business and Technology Consulting - www.destinycorp.com
  5. Who is the Data Scientist? • Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured. • Statistics • Machine learning • Data mining • Predictive analytics • “Data Scientist is the new title for the Analyst” (Paul Kent, VP of Big Data at SAS Institute)
  6. Requirements of the Data Scientist Community • Immediate access to data no matter where it exists • Simple access to systems • Legacy and Open Community Tools • Ample resources to do their work • Ability to store analytical results • Fast Execution • Access to In-House Data and External Data • Nimble IT shop or I will find another option (Cloud)
  7. Why is the Playing Field Different Today? • Legacy Data and Systems • OLTP Systems of Record • Mainframes • Data Warehouses and Marts • Dark Data (Archived) • New Data Sources • Social Media • Internet of Things • Streaming Data • Data Brokers – Search Yourself?
  8. Some Big Data Use Cases • Macy’s Inc. – Real-time pricing on 73 million items based on demand and inventory. • Tipp24 AG – Betting on European lotteries with predictive analytics, building models in less than 10% of the time. • Walmart – Text analytics, machine learning and synonym mining to produce relevant web site search results, increasing conversions by 10-15%. • Fast Food and Digital Menus – When drive-through lines are long, menus display quickly delivered items; when lines are short, they display higher-margin items that take longer to prepare. • Morton’s Steak House – For a publicity stunt, analyzed tweets about Morton’s, matched the data to a frequent Morton’s diner and then delivered him dinner as he landed at the airport. • PredPol Inc. – Los Angeles and Santa Cruz police use data about earthquakes and crime to predict where crimes will happen after an earthquake. There is up to a 33% reduction in crimes. • Tesco PLC – Tracks 70 million refrigerator data points to be more proactive with maintenance and cut down energy costs. • American Express – Predicting and reducing customer churn through analysis of historical buying patterns. • Express Scripts Holding Co. – Through analysis, determined people were forgetting to take their medications. Invented beeping medicine capsules and implemented automated phone calls. • Infinity Property and Casualty Corp. – Re-analyzing dark data on claims now allows the company to recover $12M in subrogation claims.
  9. IT’s Challenges in Supporting the Data Scientists • Building Proper Infrastructure to Support the Business – Timely Access to data and systems – Simple to use – Open to new technologies and capabilities – Accurate data – Current data to support business needs – Powerful enough to crunch all the data – Fast or Cheap – Robust and Reliable in an Open Environment – On-Premise or Cloud or Hybrid – Support Mandated Regulations
  10. The Traditional IT Architecture • [Diagram with the labels: Data Input, Mainframe, Data Warehouse, Analyst, Information]
  11. Why is it Not Enough? • Inflexible • Cannot capture new forms of data • Cannot easily analyze new forms of data • Cannot economically handle large data volumes • Cannot easily integrate with the Open Community • Long Lead Times for IT Projects
  12. Designing the New Infrastructure • New Non-Standard Data Sources • Structured • Unstructured • Streaming • NoSQL forms • External Sources • Ability to Land All Data Economically • Let the business decide what data is required • New Analytics Requirements
  13. Some IT Infrastructure Considerations • Limited Budgets and Resources • Master Data Management • Hadoop – Bronze, Silver, Gold – Single copy of the Data – Spectrum Scale/GPFS – Other Options • Storage Mechanisms – Elastic Storage Server – DS8800, XIV – Flash • Types of Queries • Historical Information • Speed of Processing – Fast, Expensive – Slow, Cheap • Location – On-Premise – Cloud • Mobile Device Requirements • Virtual Desktop • Keeping Data In-Sync – Production and DR – Update Strategies – Replication Strategies – Database – SAN Store Utilities • Data In-Flight • Data Lineage • Appliances – PDA/Netezza – SAP HANA on Power – DB2 BLU – On Premises – DataAdapt Spark Hadoop Appliance (BigInsights) • Grid Processing • Regulatory Compliance • Data Governance • In-house maintenance or Managed Service • IEEE 802.3ba 40GbE, Direct Attached SAN, NAS • Politics
  14. Data Classifications • [Chart with the labels: Bronze, Silver, Gold; Volume; Data Scientist, Power User, BI End User]
  15. Discovery and Transformation of Data • Tools to Analyze and Transform Data – DataStage – Podium – Trillium – DataFlux – Informatica – Talend • User Tools to Gain Insight into the Data – Watson Explorer – Attivio • In-Database • In Memory and Machine Learning – Apache Spark (micro-batches) – Apache Flink (streaming dataflow engine and memory management) • Other
  16. Building Analytics Processes and the Challenges • Three Categories – Ad Hoc – Standard Analysis and Reporting – Statistical Models • Challenges for IT – Skill Sets of the Data Scientist and Power Users – Playing Nicely Together – Structure of the Data – Data Modeling vs. SQL Tools – Location and Movement of the Data
  17. Case Studies • Citizens Bank BigInsights Deployment • Global Financial Advisors Deployment • Financial Processing Organization Design
  18. Citizens Bank Original Environment • Teradata Data Warehouse • Raw Data and History (staging from record systems) • Conformed Data to a Data Model (mapped to industry standard model) • Data Marts (fit for purpose, business specific)
  19. Challenges with the Teradata Environment • Processing on Teradata was slow due to: – Traditional Teradata Data Warehouse Framework – Reference Model • Slow Time to Market • Extremely Expensive in Labor Costs • Extremely Expensive to add Additional Computing Capacity • System and SAS costs increasing
  20. Looking for Alternatives • Execution of an information Proof of Concept • IBM • Oracle • Cloudera • Hortonworks
  21. Conclusions and Choices Made • The IBM BigInsights Appliance was the most cost-effective option • Minimal engagement from internal infrastructure organization • Delivered fully assembled with hardware and software • Appliance Model value proposition similar to a Netezza Appliance
  22. Standard Tools at Citizens • IBM Big SQL • Assurance that standard tools would work well with it (DB2 LUW V10.5) • All products support this platform • Oracle OBI-EE – Operational Reporting • SAS for Statistical Modeling • Tableau for Visual Reporting • DataStage for ETL – centralized application development model • Spectrum Scale (GPFS) vs. Hadoop for better management of the data and less raw storage • Fluid Query for connections to BigInsights
  23. POC on BigInsights Appliance • DataStage processing running on Teradata was moved to BigInsights • Client connectivity, queries, testing and validation • Proved that the platform could be used as the server and storage to run enterprise DataStage processing
  24. Results • Moved Analytics processing from Teradata to Netezza (cost/performance) • Increase in SAS performance by running in Netezza database • Repurposed some SAS costs • Reduced data warehouse admin support costs (Teradata DBAs reallocated) • Implemented BigInsights Hadoop for a data lake (staging and conformity) • Avoided large capital outlays for additional Teradata capacity • Reduction in Labor Effort to use the new platforms
  25. Future Plans • Evaluating and Planning Implementation of dashDB (Bridge to Cloud) to move some items to Cloud • Instead of paying for another year of S&S, using the funds for Bridge to Cloud • Attractive price point • Adding new applications (Risk) to Netezza and the Data Lake
  26. Complimentary Consultation • Contact us at: info@destinycorp.com • Discovery Session • Analysis of Architecture • Business Process • Governance • High Level Recommendations
  27. Questions and Answers
  28. Contact Information • Dana Rafiee, Managing Director, Destiny Corporation, 860-721-1684 x2007, drafiee@destinycorp.com, www.destinycorp.com • John DiFranco, SVP - Director of Enterprise Data Management, Citizens Bank, John.difranco@citizensbank.com, www.citizensbank.com, 781-655-4489 • Thank you for your time
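
The Citizens Bank slides describe data moving through three layers: raw data staged from record systems, data conformed to a data model, and fit-for-purpose data marts. The following is a minimal Python sketch of that layering, not the bank's implementation. The sample records, the field names, and the `conform` and `build_mart` functions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical raw records, as they might land in staging from a system of record.
RAW_STAGING = [
    {"acct": "A1", "bal": "100.50", "seg": "retail "},
    {"acct": "A2", "bal": "250.00", "seg": "RETAIL"},
    {"acct": "A3", "bal": "75.25", "seg": "Commercial"},
]


def conform(record):
    """Map one raw record onto an assumed common model: typed and normalized."""
    return {
        "account_id": record["acct"],
        "balance": float(record["bal"]),
        "segment": record["seg"].strip().lower(),
    }


def build_mart(conformed_rows):
    """Aggregate conformed rows into a purpose-specific view: balance per segment."""
    totals = defaultdict(float)
    for row in conformed_rows:
        totals[row["segment"]] += row["balance"]
    return dict(totals)


# Raw staging -> conformed layer -> data mart.
conformed_rows = [conform(r) for r in RAW_STAGING]
segment_mart = build_mart(conformed_rows)
print(segment_mart)
```

In this sketch the raw layer is kept untouched and each later layer is derived from the one before it, so the conformed layer or the mart can be rebuilt from staging if the mapping rules change.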
