
2015-02-12 Talend and Hortonworks webinar: Challenges to Hadoop Adoption

  1. ©2015 Talend Inc. Challenges to Hadoop Adoption: If You Can Dream It, You Can Build It. February 12, 2015
  2. Welcome. A few logistical points: • All participants are muted • You may ask questions using the Q&A panel located at the bottom of the GoToWebinar applet • Answers will be provided after the presentation • If time is too short to address all questions, answers will be provided via email • To receive a replay of our webinar today, please send us an email to • If you are experiencing connection problems, please use the Q&A panel to communicate
  4. Your Speakers Today: Jim Walker, Director, Product Marketing; Shawn James, Director, Alliances & Business Development; Mark Balkenende, Sr. Sales Solution Architect
  5. © Hortonworks Inc. 2011 – 2015. All Rights Reserved. Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP (Winter 2015, Version 1.0). Hortonworks. We do Hadoop.
  6. Traditional systems under pressure. Challenges: • Constrains data to app • Can't manage new data • Costly to scale. New data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs) arrives alongside traditional ERP, CRM, and SCM data, and total data volume grows from 2.8 zettabytes in 2012 to 40 zettabytes in 2020. [Figure: business value of new vs. traditional data, from laggards to industry leaders]
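The two volume figures on this slide imply a steep compound growth rate. As a rough check, assuming smooth compound growth over the eight years from 2012 to 2020:

```latex
r = \left(\frac{40\ \text{ZB}}{2.8\ \text{ZB}}\right)^{1/8} - 1 \approx 14.29^{1/8} - 1 \approx 0.394
```

That is roughly 39% growth per year, or close to a doubling of data volume every two years.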
  7. Hadoop emerged as the foundation of a new data architecture. Apache Hadoop is an open source data platform for managing large volumes of data with high velocity and variety. • Built by Yahoo! to be the heartbeat of its ad & search business • Donated to the Apache Software Foundation in 2005, with rapid adoption by large web properties & early adopter enterprises • Incredibly disruptive to current platform economics. Traditional Hadoop advantages: ✓ Manages new data paradigm ✓ Handles data at scale ✓ Cost effective ✓ Open source. Traditional Hadoop had limitations: ✗ Batch-only architecture ✗ Single-purpose clusters, specific data sets ✗ Difficult to integrate with existing investments ✗ Not enterprise-grade. [Figure: application layer on HDFS storage with MapReduce batch processing]
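The MapReduce batch model named on this slide can be illustrated with a small in-memory simulation. This is a hedged sketch in plain Python, not Hadoop's Java API: `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical names, and counting tokens in log lines is an assumed example job. It shows the three stages a MapReduce job passes through: map each record to key/value pairs, group the values by key, then reduce each group to one result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (token, 1) pair for every whitespace-separated token in one record.
    return [(token.lower(), 1) for token in line.split()]


def shuffle(pairs):
    # Shuffle: group every emitted value by its key, as the framework does
    # between the map and reduce stages (here entirely in memory).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def run_job(lines):
    # Hypothetical driver: chain the three stages over a list of input records.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    logs = ["GET /index GET /cart", "GET /index"]
    print(run_job(logs))  # {'get': 3, '/index': 2, '/cart': 1}
```

On a real cluster the map and reduce functions run in parallel on HDFS blocks and the shuffle moves data across the network; the single-process version only mirrors the data flow.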
  8. Modern Data Architecture emerges to unify data & processing. Modern Data Architecture: • Enables applications to access all your enterprise data through an efficient centralized platform • Supported by a centralized approach to governance, security and operations • Versatile enough to handle any application and dataset, no matter the size or type. [Figure: sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing ERP, CRM, and SCM systems) feed HDFS (Hadoop Distributed File System) with YARN as the data operating system, running batch, interactive, real-time, and partner ISV workloads alongside MPP and EDW systems, and serving data marts, business analytics, visualization & dashboards, and applications]
  9. Hadoop adoption follows a predictable journey: cost optimization, new analytic apps, and ultimately a "data lake"
  10. Hadoop Driver: Cost optimization. • Archive data off EDW: move rarely used data to Hadoop as an active archive and store more data longer • Offload costly ETL process: free your EDW to perform high-value functions like analytics & operations, not ETL • Enrich the value of your EDW: use Hadoop to refine new data sources, such as web and machine data, for new analytical context. HDP helps you reduce costs and optimize the value associated with your EDW. [Figure: sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing ERP, CRM, and SCM systems) feed HDP 2.2, which handles ELT, cold data, deeper archive, and new sources, alongside the enterprise data warehouse (hot, MPP, in-memory), serving data marts, business analytics, and visualization & dashboards]
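The "archive data off EDW" driver comes down to a tiering rule that separates rarely used data from hot data. A minimal sketch, assuming a hypothetical `Partition` record that carries a last-queried date and an assumed one-year idle threshold; real archiving policies and the actual copy of cold partitions into HDFS are outside this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Partition:
    # Hypothetical metadata for one EDW table partition.
    name: str
    last_queried: date


def split_hot_cold(partitions, today, max_idle_days=365):
    """Return (hot, cold); cold partitions are candidates for the Hadoop active archive.

    max_idle_days is an assumed policy value, not a recommendation.
    """
    cutoff = today - timedelta(days=max_idle_days)
    hot = [p for p in partitions if p.last_queried >= cutoff]
    cold = [p for p in partitions if p.last_queried < cutoff]
    return hot, cold


if __name__ == "__main__":
    parts = [Partition("sales_2013", date(2013, 6, 1)), Partition("sales_2015", date(2015, 2, 1))]
    hot, cold = split_hot_cold(parts, today=date(2015, 2, 12))
    print([p.name for p in cold])  # ['sales_2013']
```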
  11. Hadoop Driver: Advanced analytic applications. Use cases fall into three categories: Single View (improve acquisition and retention), Predictive Analytics (identify your next best action), and Data Discovery (uncover new findings).
     • Financial Services: New Account Risk Screens; Trading Risk; Insurance Underwriting; Improved Customer Service; Aggregate Banking Data as a Service; Cross-sell & Upsell of Financial Products; Risk Analysis for Usage-Based Car Insurance; Identify Claims Errors for Reimbursement
     • Telecom: Unified Household View of the Customer; Searchable Data for NPTB (next-product-to-buy) Recommendations; Protect Customer Data from Employee Misuse; Analyze Call Center Contact Records; Network Infrastructure Capacity Planning; Call Detail Records (CDR) Analysis; Inferred Demographics for Improved Targeting; Proactive Maintenance on Transmission Equipment; Tiered Service for High-Value Customers
     • Retail: 360° View of the Customer; Supply Chain Optimization; Website Optimization for Path to Purchase; Localized, Personalized Promotions; A/B Testing for Online Advertisements; Data-Driven Pricing and Improved Loyalty Programs; Customer Segmentation; Personalized, Real-time Offers; In-Store Shopper Behavior
     • Manufacturing: Supply Chain and Logistics; Optimize Warehouse Inventory Levels; Product Insight from Electronic Usage Data; Assembly Line Quality Assurance; Proactive Equipment Maintenance; Crowdsource Quality Assurance; Single View of a Product Throughout Lifecycle; Connected Car Data for Ongoing Innovation; Improve Manufacturing Yields
     • Healthcare: Electronic Medical Records; Monitor Patient Vitals in Real-Time; Use Genomic Data in Medical Trials; Improving Lifelong Care for Epilepsy; Rapid Stroke Detection and Intervention; Monitor Medical Supply Chain to Reduce Waste; Reduce Patient Re-Admittance Rates; Video Analysis for Surgical Decision Support; Healthcare Analytics as a Service
     • Oil & Gas: Unify Exploration & Production Data; Monitor Rig Safety in Real-Time; Geographic Exploration; DCA to Slow Well Decline Curves; Proactive Maintenance for Oil Field Equipment; Define Operational Set Points for Wells
     • Government: Single View of Entity; CBM & Autonomic Logistic Analysis; Sentiment Analysis on Program Effectiveness; Prevent Fraud, Waste and Abuse; Proactive Maintenance for Public Infrastructure; Meet Deadlines for Government Reporting
  12. Hadoop Driver: Enabling the data lake. Data lake definition: • Centralized architecture: multiple applications on a shared data set with consistent levels of service • Any app, any data: multiple applications accessing all data, affording new insights and opportunities • Unlocks "Systems of Insight": advanced algorithms and applications used to derive new value and optimize existing value. Journey to the data lake with Hadoop [figure axes: scale, scope]: the drivers are 1. cost optimization and 2. advanced analytic apps; the goal is a centralized architecture and a data-driven business built on systems of insight.
  13. Challenges to Hadoop Adoption: • Where do I start? Why is this of value to me and my organization? • Hadoop is complex; what do I use for what? • It is too complex. I don't have any trained Hadoop resources. Many have been down this path…
  14. Connecting the Data-Driven Enterprise
  15. Main Challenges in the Data Integration Market: • BIG DATA: more data, less structure • PRODUCTIVITY: can't keep up with demand • COST: expensive solutions • SKILLS: hard to find talent
  16. The Big Data Demand: 4.4 million jobs in big data by 2015, but only one third of those jobs will be filled (Source: Gartner)
  17. The Hadoop Ecosystem is Complex (Source: "Hadoop Ecosystem Overview", Forrester 2014)
  18. Talend Brings Unmatched Productivity. HAND-CODING: • Unproductive • Needs specialized skills • Hard to maintain • Limited support. TALEND ENTERPRISE: • 800+ components • Generates optimized code • Collaboration & management • Gold support (SLAs)
  19. Future-Proof Architecture With Native Code Generation: • ETL: day-to-day integration • ELT: DW appliance • ESB: messaging, routing, transformation • Hadoop: highly scalable • Spark
  20. Talend Big Data. [Figure: legacy systems, ERP, Internet of Things, DBMS/EDW, NoSQL, and web logs connect through Talend Big Data to standard reports, ad-hoc query tools, data mining, MDD/OLAP, analytical applications, and NoSQL. Roles: develop and test (Studio), operations team. Capabilities: ingestion, map, profile, parse, match, cleanse, standardize, change data capture, machine learning, share, schedule, native access.] Benefits: future-proof architecture, lowest TCO, increased productivity. (Select icons made by Freepik and Situ Herrera.)
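"Profile" is the first data quality step in the capability list above. A minimal sketch of what a column profile reports, assuming values arrive as a plain Python list and that `None` and empty strings both count as nulls; `profile_column` is a hypothetical helper, not a Talend component.

```python
from collections import Counter


def profile_column(values):
    """Basic profile of one column: row count, null count, distinct values, most common value."""
    # Assumption: both None and "" are treated as missing.
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top": counts.most_common(1)[0] if counts else None,
    }


if __name__ == "__main__":
    print(profile_column(["CA", "NY", None, "CA", ""]))
```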
  21. Talend Big Data: Easiest and Most Powerful Integration Solution for Big Data
  22. Main Challenges in the Data Market: SCALABLE • AGILE • LOWEST TCO • EASY
  24. Live Demo
  25. Key Takeaways: • See how Talend's Big Data Platform addresses the skills gap • See how Talend will increase your big data productivity • Agree that Talend and Hortonworks have the technology and skills to satisfy your business requirements. BIG DATA: more data, less structure. PRODUCTIVITY: can't keep up with demand. SKILLS: hard to find talent.
  26. Demonstration Use Case. The objective of the use case was to identify data quality issues prior to loading data to the EDW without increasing the actual load window. • Load 500 TB of compressed files to HDFS: third-party sales/prescribing files delivered by a vendor • Compute monthly totals: prior to loading to the EDW, compare the prior month's totals to the current month's totals within the new data files • Display comparison results in an analytical tool: show the total sales comparison for each product to quickly surface data quality issues before loading to EDW staging
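The core check in this use case (monthly totals per product, then a prior-month vs. current-month comparison) can be sketched in a few lines. This is a hedged, single-machine illustration, not the job built in the demo: the `(product, "YYYY-MM", amount)` row layout, the function names, and the 25% change threshold are all assumptions. At 500 TB the same aggregation would run as a distributed job on the cluster.

```python
from collections import defaultdict


def monthly_totals(rows):
    """Sum sales per (product, month). Assumed row layout: (product, 'YYYY-MM', amount)."""
    totals = defaultdict(float)
    for product, month, amount in rows:
        totals[(product, month)] += amount
    return totals


def flag_changes(rows, prior_month, current_month, threshold=0.25):
    """Flag products whose total moved more than `threshold` (a fraction) month over month.

    The 25% default is a hypothetical tolerance, not a value from the demo.
    Returns {product: relative_change}; a product with no prior-month sales
    but current sales is reported as an infinite change.
    """
    totals = monthly_totals(rows)
    flagged = {}
    for product in sorted({p for p, _ in totals}):
        prior = totals.get((product, prior_month), 0.0)
        current = totals.get((product, current_month), 0.0)
        if prior == 0.0:
            change = float("inf") if current else 0.0
        else:
            change = (current - prior) / prior
        if abs(change) > threshold:
            flagged[product] = change
    return flagged


if __name__ == "__main__":
    rows = [("A", "2015-01", 100.0), ("A", "2015-02", 105.0),
            ("B", "2015-01", 200.0), ("B", "2015-02", 50.0),
            ("C", "2015-02", 10.0)]
    print(flag_changes(rows, "2015-01", "2015-02"))  # {'B': -0.75, 'C': inf}
```

Products that pass the check load as usual; flagged products are the ones to review in the analytical tool before the EDW staging load.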
  27. Typical 3rd Party Data Load: data preparation, then warehouse processing, then final reports / quality check. Bad big data quality results in lost time, resources & revenue.
  28. Data Warehouse Optimization. [Figure: data preparation, warehouse processing, and final reports / quality check, with a Hadoop cluster added to the flow] ✓ Upfront quality checks ✓ Identify master records earlier ✓ Load uncompressed data directly to DWH staging. Optimized loading.
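"Identify master records earlier" means grouping duplicate records before they reach the warehouse. A minimal sketch, assuming a hypothetical match key (normalized name plus postcode) and a "most recently updated wins" survivorship rule; production matching usually adds fuzzy comparison, which is not shown here.

```python
import re


def match_key(name, postcode):
    """Hypothetical match key: lowercase name stripped of non-alphanumerics, plus trimmed postcode."""
    return re.sub(r"[^a-z0-9]", "", name.lower()), postcode.strip()


def pick_masters(records):
    """Group records by match key and keep the most recently updated one as the master.

    Assumes `updated` is an ISO 'YYYY-MM-DD' string, so string comparison orders dates.
    """
    masters = {}
    for rec in records:
        key = match_key(rec["name"], rec["postcode"])
        if key not in masters or rec["updated"] > masters[key]["updated"]:
            masters[key] = rec
    return list(masters.values())


if __name__ == "__main__":
    recs = [
        {"name": "Acme Corp.", "postcode": "94105", "updated": "2015-01-10"},
        {"name": "ACME corp", "postcode": " 94105", "updated": "2015-02-01"},
        {"name": "Beta LLC", "postcode": "10001", "updated": "2014-12-01"},
    ]
    for m in pick_masters(recs):
        print(m["name"], m["updated"])
```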
  29. Live Demo
  30. What stood out most? Recap of the demonstration: • Hortonworks and Talend can help you reduce costs • Offload costly ETL process • Enrich the value of your EDW • Graphical drag-and-drop visual environment showcasing Talend and Hortonworks
  31. Hortonworks/Talend Sandbox: • Graphical drag-and-drop visual environment showcasing Hortonworks: visually see the results of the integration process • Accelerates data loading and transformation with Hadoop: build and deploy MapReduce and Pig jobs on YARN • Pre-built use cases: data warehouse optimization, clickstream data, Twitter sentiment, Apache weblogs • Demonstrations of several NoSQL databases
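The Apache weblogs use case starts by parsing raw access-log lines into fields. A hedged sketch of that first step for Apache's Common Log Format (host, ident, user, timestamp, request, status, bytes); the regular expression and `parse_line` are illustrative assumptions, not the sandbox's actual job, and combined-format logs with referrer and user-agent fields would need a longer pattern.

```python
import re

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one access-log line into a dict, or return None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None  # malformed line; a real job would route it to a reject output
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" means no body sent
    return rec


if __name__ == "__main__":
    line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_line(line))
```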
  32. From Zero to Big Data in 10 Minutes. Download free-sandbox • Get up and running in minutes, not weeks, with a big data sandbox and demos • Includes: sentiment analysis, ETL offload, log file analysis • Start working with Talend & Hortonworks today!
  33. Backup slides
  34. Reference Architecture. Foundation: HDFS2 (redundant, reliable storage) and YARN (cluster resource management). Processing: BATCH (MapReduce) • INTERACTIVE (Tez) • STREAMING (Storm, Spark) • GRAPH (Giraph) • NoSQL (MongoDB) • EVENTS (Falcon) • ONLINE (HBase) • OTHER (Search). TAP (ingestion): Sqoop, Flume, HDFS API, HBase API, Hive, 800+. TRANSFORM (data refinement): profile, parse, map, CDC, cleanse, standardize, machine learning, match. DELIVER (as an API): ActiveMQ, Karaf, Camel, CXF, Kafka, Storm. Also: Meta, Security, MDM, iPaaS, Govern, HA.
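CDC appears in the TRANSFORM layer above. One simple form of change data capture compares two full snapshots keyed by primary key; the sketch below shows that snapshot-diff approach only, under the assumption that each snapshot fits in memory as a dict. Log-based CDC, which reads database transaction logs instead, is a different technique and is not shown.

```python
def snapshot_diff(old, new):
    """Compare two snapshots ({primary_key: row}) and return (inserts, updates, deletes)."""
    inserts = {k: v for k, v in new.items() if k not in old}
    deletes = {k: v for k, v in old.items() if k not in new}
    # A row counts as updated when its key exists in both snapshots but its content changed.
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    return inserts, updates, deletes


if __name__ == "__main__":
    yesterday = {1: "alice", 2: "bob", 3: "carol"}
    today = {1: "alice", 2: "bobby", 4: "dave"}
    print(snapshot_diff(yesterday, today))
```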