More from DataWorks Summit


Big Data Simplified - Is all about Ab'strakSHeN

  1. Big Data Simplified: "Is all about abˈstrakSH(ə)n". Hemal Gandhi, Director of Data Engineering
  2. Introduction
  3. Background
  4. Agenda:
     • Analyze Current State: challenges, facts
     • New Platform Design: define goals, feature list, implementation approach
     • Compare: feature list, trade-offs, cost structure
     • Decision: fix vs. build?
  5. Analyze Current State
  6. Platform is very complex
  7. Struggling to keep up with business needs
  8. Huge backlog
  9. Code base is increasing rapidly
  10. We are slow to respond to market needs
  11. Outdated technology stack
  12. Missing best practices
  13. High cost of data: storage, finding insights, integration, maintenance
  14. Strategic value of data (data identity, time value, dependencies): lack of understanding of the business impact of data
  15. Agile as a mini waterfall
  16. Process and organization: high investment costs, adoption issues, complex framework
  17. Lots of challenges
  18. NOT a scalable platform: this can impact revenue negatively!
  19. New Platform Design
  20. Keep it simple
  21. Keep up with business needs
  22. Move fast
  23. Keep technology stack current over time
  24. Low cost of data: storage, finding insights, integration, maintenance
  25. Strategic value of data (data identity, time value, dependencies): understand the business impact of data
  26. Measure data
  27. Be agile: do less
  28. Improve data ROI
  29. Compare
  30. Investment needs: current platform high vs. new platform high
  31. Scalability: current platform not scalable vs. new platform initially scalable
  32. Maintenance cost: current platform high vs. new platform initially low, growing over time
  33. Technology: current platform outdated vs. new platform, where big data tools provide technology, not solutions to design problems
  34. Technology choices
  35. Decision Fix vs. Build?
  36. Next Steps
  37. Goal: build a feature-based, scalable big data platform in 6 months with limited resources while supporting the legacy system.
  38. Design Patterns
  39. Take a platform approach: project requirements → data platform features → reusable components
  40. Technology abstraction: business logic is expressed as declarative configuration, and the execution engine picks the technology at runtime
  41. Data access & ingestion abstraction: data producers write through a data ingestion framework into data storage; data consumers read through a data access API
  42. Data integration: jobs stream data to the storage layer (data integration jobs → stream → data storage)
  43. Hot/cold data management: hot and cold data are each managed through configuration
  44. abˈstrakSH(ə)n
  45. High Level Architecture
  46. Data Platform Architecture, with Dredge as the data access abstraction:
     • Collection: Apache Flume, Sqoop
     • Flow: Kafka, Spark
     • Processing: Pig, Spark, MapReduce
     • Storage: Hive, HBase, Vertica
     • Delivery: Looker, Tableau, visualization (d3.js), email/FTP
     • Cross-cutting services: Data Quality Service (data lineage & profiling), security, scheduling & cluster monitoring
     • Applications & visualization tools
  47. What is Dredge? A declarative abstraction layer for integrating big data tools, enabling a loosely coupled big data platform.
  48. Dredge logical view: source end points → source readers → streams/direct → target writers → target end points, running as tasks on the Hadoop cluster. Supporting components: events management, log streaming, configuration abstraction, and the Dredge repository (HBase).
  49. Dredge Architecture:
     • Dredge UI: declarative configuration, logical flows, data lineage, runtime logs, admin
     • Dredge Repository: HBase
     • Dredge Runtime: temp store (HDFS), temp cache (HDFS), event management, logger stream
     • Dredge Data Services: aggregator, UDFs, combiners, routers.., plugins (Java/Shell, Pig, SQL), rank, sorter, set operations, filters/patterns analysis
     • Abstraction builder: Kafka, Flume, Pig, custom
     • Source readers (direct/stream): logs, RDBMS, unstructured data, custom
     • Target writers (direct/stream): Hive, HBase, RDBMS, custom
     • Lambda architecture: HDFS, Hive, HBase, Pig, Flume, Kafka, Oozie
  50. Dredge benefits:
     • From 1000+ scripts to 50-100 scripts
     • From 1000+ configuration files to fewer than 5 files
     • Logical view of the workflow that abstracts the physical implementation
     • Quick integration of new tools through declarative configuration of big data tools
     • Improved SLAs, faster time to market, better cluster utilization, higher performance
     • Simplified integration
     • Minimal migration costs
     • Low maintenance, configurable archiving of data
  51. Summarizing
  52. It is all about abˈstrakSH(ə)n:
     • Abstraction layer: technology, data access, data ingestion, dependencies…
     • Reusable data components
     • Event-driven dependencies
     • Plug & play integration, loosely coupled (cluster resources, data)
  53. Big data requires a different mindset: Innovate, iterate often and keep it simple.
  54. Thank you. engineering.onekingslane.com
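Slide 40 (technology abstraction) separates business logic, written as declarative configuration, from the execution engine that picks the technology at runtime. The Python sketch below illustrates that idea under our own assumptions: the config keys (`job`, `engine`, `steps`), the engine names, and the `run_job` helper are hypothetical and are not Dredge's actual format or API. Both engines are toy in-memory stand-ins for real tools such as Pig or Spark.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Engine = Callable[[List[Record], List[dict]], List[Record]]

# Hypothetical declarative job: business logic only, no engine-specific code.
JOB_CONFIG = {
    "job": "daily_orders",
    "engine": "batch",  # chosen at runtime; can be overridden per run
    "steps": [
        {"op": "filter", "field": "status", "equals": "shipped"},
        {"op": "project", "fields": ["order_id", "amount"]},
    ],
}


def apply_step(records: List[Record], step: dict) -> List[Record]:
    """Interpret one declarative step against a list of records."""
    if step["op"] == "filter":
        return [r for r in records if r.get(step["field"]) == step["equals"]]
    if step["op"] == "project":
        return [{f: r[f] for f in step["fields"]} for r in records]
    raise ValueError(f"unknown op: {step['op']}")


def batch_engine(records: List[Record], steps: List[dict]) -> List[Record]:
    # Toy batch engine: apply each step to the whole dataset in turn.
    for step in steps:
        records = apply_step(records, step)
    return records


def stream_engine(records: List[Record], steps: List[dict]) -> List[Record]:
    # Toy streaming engine: push each record through all steps individually.
    out: List[Record] = []
    for record in records:
        batch = [record]
        for step in steps:
            batch = apply_step(batch, step)
        out.extend(batch)
    return out


# Hypothetical engine registry; swapping technology means adding an entry here.
ENGINES: Dict[str, Engine] = {"batch": batch_engine, "stream": stream_engine}


def run_job(config: dict, records: List[Record], engine: Optional[str] = None) -> List[Record]:
    """Resolve the execution engine at runtime, then run the same logic."""
    chosen = ENGINES[engine or config["engine"]]
    return chosen(records, config["steps"])


orders = [
    {"order_id": 1, "status": "shipped", "amount": 30},
    {"order_id": 2, "status": "pending", "amount": 12},
]
print(run_job(JOB_CONFIG, orders))            # [{'order_id': 1, 'amount': 30}]
print(run_job(JOB_CONFIG, orders, "stream"))  # same result, different engine
```

The business logic in `JOB_CONFIG` never changes when the engine does, which is the property the slide is after: replacing an outdated tool means adding a registry entry rather than rewriting jobs.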
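Slides 41 and 48 describe data flowing from source end points through source readers to target writers, with readers and writers plugged in by configuration. A minimal Python sketch of that plug-and-play wiring follows, assuming a simple decorator-based registry. The type names (`csv_text`, `memory`), the `run_flow` helper, and the reading of "direct" as "materialize before writing" versus "stream" as "write record by record" are all our assumptions, not Dredge's documented behavior.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]

# Hypothetical plugin registries keyed by the "type" field of an end point.
READERS: Dict[str, Callable[[dict], Iterator[Record]]] = {}
WRITERS: Dict[str, Callable[[dict, Iterable[Record]], int]] = {}


def reader(kind: str):
    """Register a source reader under a type name."""
    def register(fn):
        READERS[kind] = fn
        return fn
    return register


def writer(kind: str):
    """Register a target writer under a type name."""
    def register(fn):
        WRITERS[kind] = fn
        return fn
    return register


@reader("csv_text")
def read_csv_text(endpoint: dict) -> Iterator[Record]:
    # Toy source end point: CSV content embedded in the config itself.
    yield from csv.DictReader(io.StringIO(endpoint["content"]))


@writer("memory")
def write_memory(endpoint: dict, records: Iterable[Record]) -> int:
    # Toy target end point: append records to a list supplied in the config.
    sink: List[Record] = endpoint["sink"]
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count


def run_flow(flow: dict) -> int:
    """Wire a source reader to a target writer using configuration only."""
    source, target = flow["source"], flow["target"]
    records: Iterable[Record] = READERS[source["type"]](source)
    if flow.get("mode", "stream") == "direct":
        # Assumed meaning of "direct": read everything first, then write.
        records = list(records)
    return WRITERS[target["type"]](target, records)


sink: List[Record] = []
flow = {
    "source": {"type": "csv_text", "content": "id,name\n1,lamp\n2,sofa\n"},
    "target": {"type": "memory", "sink": sink},
    "mode": "stream",
}
print(run_flow(flow))     # 2
print(sink[0]["name"])    # lamp
```

Adding a new tool in this sketch means registering one more reader or writer; existing flows keep working because they only refer to type names.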
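Slide 43 (hot/cold data management driven by configuration) and the "configurable archiving of data" benefit on slide 50 can be illustrated with an age-based tiering plan. This is a hedged sketch: the `TIERING_CONFIG` keys, the thresholds, and the three tiers are hypothetical, and the code only plans where each date partition belongs; it does not move any data.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical per-dataset policy; names and thresholds are illustrative only.
TIERING_CONFIG = {
    "orders": {"hot_days": 30, "archive_days": 365},
    "clickstream": {"hot_days": 7, "archive_days": 90},
}


def plan_tiers(dataset: str, partitions: List[date], today: date) -> Dict[str, List[date]]:
    """Assign each date partition to hot, cold, or archive by its age in days."""
    policy = TIERING_CONFIG[dataset]
    plan: Dict[str, List[date]] = {"hot": [], "cold": [], "archive": []}
    for day in sorted(partitions):
        age = (today - day).days
        if age < policy["hot_days"]:
            plan["hot"].append(day)
        elif age < policy["archive_days"]:
            plan["cold"].append(day)
        else:
            plan["archive"].append(day)
    return plan


today = date(2015, 6, 3)
parts = [today - timedelta(days=d) for d in (1, 10, 100)]
plan = plan_tiers("clickstream", parts, today)
print({tier: len(days) for tier, days in plan.items()})
# {'hot': 1, 'cold': 1, 'archive': 1}
```

Changing retention for a dataset is then a one-line configuration edit rather than a new script.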
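Slide 52 lists event-driven dependencies among the abstractions. The slides do not describe how Dredge implements them, so the following is only an illustrative Python sketch of the general technique: each job declares its input datasets, and a job runs as soon as a "dataset ready" event has been published for every input, instead of on a fixed schedule. The `EventScheduler` class and the dataset and job names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set


class EventScheduler:
    """Run a job once every dataset it depends on has been published."""

    def __init__(self) -> None:
        self.inputs: Dict[str, Set[str]] = {}
        self.actions: Dict[str, Callable[[], None]] = {}
        self.waiting_on: DefaultDict[str, Set[str]] = defaultdict(set)  # dataset -> jobs
        self.ready: Set[str] = set()
        self.ran: List[str] = []

    def register(self, job: str, inputs: Set[str], action: Callable[[], None]) -> None:
        self.inputs[job] = set(inputs)
        self.actions[job] = action
        for dataset in inputs:
            self.waiting_on[dataset].add(job)

    def publish(self, dataset: str) -> None:
        """Handle a 'dataset ready' event and trigger any jobs it unblocks."""
        self.ready.add(dataset)
        for job in sorted(self.waiting_on.pop(dataset, set())):
            if job not in self.ran and self.inputs[job] <= self.ready:
                self.ran.append(job)
                self.actions[job]()  # an action may publish its own output


sched = EventScheduler()
# load_orders produces "orders" when it finishes, chaining the next job.
sched.register("load_orders", {"raw_orders"}, lambda: sched.publish("orders"))
sched.register("daily_report", {"orders", "customers"}, lambda: None)

sched.publish("raw_orders")
print(sched.ran)  # ['load_orders']
sched.publish("customers")
print(sched.ran)  # ['load_orders', 'daily_report']
```

Because dependencies are expressed on data rather than on clock times, a late upstream load delays only the jobs that actually need it.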

Editor's notes

  1. Meeting notes (6/3/15 10:27): agile speed vs. agile vigor; clean up code, retire code.