Creating a Next-Generation Big Data Architecture

If you’ve spent time investigating Big Data, you quickly realize that the issues surrounding it are often complex to analyze and solve. The sheer volume, velocity and variety change the way we think about data, including how enterprises approach data architecture.

Significant reduction in costs for processing, managing, and storing data, combined with the need for business agility and analytics, requires CIOs and enterprise architects to rethink their enterprise data architecture and develop a next-generation approach to solve the complexities of Big Data.

Creating the data architecture while integrating Big Data into the heart of the enterprise data architecture is a challenge. This webinar covered:

- Why Big Data capabilities must be strategically integrated into an enterprise’s data architecture
- How a next-generation architecture can be conceptualized
- The key components of a robust next-generation architecture
- How to incrementally transition to a next-generation data architecture

  1. Big Data Architectural Series: Creating a Next-Generation Big Data Architecture. facebook.com/perficient, twitter.com/Perficient, linkedin.com/company/perficient
  2. About Perficient: Perficient is a leading information technology consulting firm serving clients throughout North America. We help clients implement business-driven technology solutions that integrate business processes, improve worker productivity, increase customer loyalty and create a more agile enterprise to better respond to new business opportunities.
  3. Perficient Profile • Founded in 1997 • Public, NASDAQ: PRFT • 2013 revenue $373 million • Major market locations: Allentown, Atlanta, Boston, Charlotte, Chicago, Cincinnati, Columbus, Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis, Lafayette, Minneapolis, New York City, Northern California, Oxford (UK), Philadelphia, Southern California, St. Louis, Toronto, Washington, D.C. • Global delivery centers in China and India • >2,200 colleagues • Dedicated solution practices • ~90% repeat business rate • Alliance partnerships with major technology vendors • Multiple vendor/industry technology and growth awards
  4. Our Solutions Expertise. BUSINESS SOLUTIONS: Business Intelligence, Business Process Management, Customer Experience and CRM, Enterprise Performance Management, Enterprise Resource Planning, Experience Design (XD), Management Consulting. TECHNOLOGY SOLUTIONS: Business Integration/SOA, Cloud Services, Commerce, Content Management, Custom Application Development, Education, Information Management, Mobile Platforms, Platform Integration, Portal & Social.
  5. Our Speaker: Bill Busch, Sr. Solutions Architect, Enterprise Information Solutions, Perficient • Leads Perficient's enterprise data practice • Specializes in business-enabling BI solutions that enable the agile enterprise • Responsible for executive data strategy, roadmap development, and the delivery of high-impact solutions that enable organizations to leverage enterprise data • Bill has over 15 years of experience in executive leadership, business intelligence, data warehousing, data governance, master data management, information/data architecture and analytics
  6. Perficient’s Big Data Architectural Series • Business Case • Next Generation Architecture (Today’s Webinar) • Future Topics: Data Integration, Stream Processing, NoSQL, SQL on Hadoop, Data Quality, Governance, Use Cases & Case Studies
  7. Today’s Objectives • 5 Architectural Roles For Hadoop • Hadoop Ecosystem Potential vs. Reality • Realizing A Hadoop Centric Architecture
  9. Three Views of Big Data • “Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” • Convergence of structured, unstructured, and dark data • Big Data is the evolution of data, creating data management issues similar to those IT has struggled to address for the last 20+ years.
  11. Common Big Data Business Use Cases • Improve Strategic Decision Making • Customer Experience Analysis • Operational Optimization • Risk and Fraud Reduction • Data Monetization • Security Event Detection and Analysis • IT Cost Management
  12. Expanding Data Ecosystem • Customer Intelligence • Operations • Risk & Fraud • Data Monetization • Strategic Development • Security Intelligence • IT Optimization. Data Ecosystem: Structured Data (5-20% of Total), Point-of-Sale, Text Messages, Contracts & Regulatory, Preferences & Emotions, Security Access, Weather, Machine Data, Automobile, Mobile Communications, Geospatial, Social
  13. Next Generation Enterprise Data Architecture
  14. The Promise: Data Architecture Simplification • Data Integration • Data Hub • Analytics • Stream Processing • Data Warehouse • Operational Data • Hadoop Cluster
  15. The Reality: Maturity Limits the Use Cases • Realize the potential of Hadoop • Multi-tenancy is in its infancy • Hadoop 2.0 and YARN • Most third-party applications are just moving to YARN • Hive (and other SQL on Hadoop solutions) maturing • Robust enterprise functionality is evolving • Security • High Availability
  16. Different Types of “Open Source Hadoop” • Apache Projects Only • Proprietary Value Add & Re-Development • Apache Projects + Proprietary Add-ons • Packaged and Online Solutions: IBM BigInsights, Oracle Big Data Appliance, HDInsight, and many others. Choosing A Hadoop Distribution: Company Philosophy, Current Relationships, Acceptable Risk, Specialized Functionality
  17. Quick Primer on YARN. What is YARN? • Yet Another Resource Negotiator • Sometimes referred to as MapReduce 2.0 • Data operating system • Fault tolerance. Why is this important? • Enables multi-tenancy on Hadoop • Moves processing to the data. *Image provided by Hortonworks
  18. Today’s Objectives • 5 Architectural Roles For Hadoop • Hadoop Ecosystem Potential vs. Reality • Realizing A Hadoop Centric Architecture
  19. Five Common Architectural Roles • Hadoop: Analytics, Data Warehouse, Stream Processing, Data Factory, Transactional Data Store • Hadoop Big Data Use Cases
  20. Next Generation Enterprise Data Architecture
  21. Five Common Architectural Roles • Hadoop: Analytics, Data Warehouse, Stream Processing, Data Factory, Transactional Data Store • Hadoop Big Data Use Cases
  22. Analytical Processing. Analytical Process: 1. Source, 2. Wrangle Data, 3. Model & Tune, 4. Operationalize. Architectural Capabilities: • Data Ingestion • Metadata Management • Data Access • Data Preparation Tools • Data Discovery & Visualization • Data Wrangling Tools • Business Glossary & Search • Data Access • Data Discovery & Visualization • Analytical Tools • Analytical Sandbox • Business Created Reporting • Model Execution & Management • Knowledge Management (Portal)
  24. Data Access • There are many methods for accessing Big Data: Direct HDFS, NoSQL / Connector, Hive / SQL on Hadoop • Align tools to access methods and file types: Data Preparation, Analytics. Diagram labels: Source Files/Data, Tidy Data, Data Preparation Tool, Analytics Tool, Analytical Result, Hadoop Cluster. Key: Read Access, Write Access
  25. Five Common Architectural Roles • Hadoop: Analytics, Data Warehouse, Stream Processing, Data Factory, Transactional Data Store • Hadoop Big Data Use Cases
  26. Data Warehouse Roles • Two models for splitting processing: Hot – Cold, Data Warehouse Layer • Push high user loads to traditional data warehouses • Fully investigate DW-Hadoop connector functionality • Leverage opportunity to use in-memory database solutions. Diagrams: Data Warehouse Layer Approach (Hadoop Cluster, Traditional DW/DM); Hot – Cold Data Warehouse (Cold Data: Hadoop Cluster; Hot Data: Traditional DW/DM)
  27. Data Warehouse: Organize Your Data • Types of data stored on cluster • Analytical sandboxes: Team, Individual, Quotas • Potential to replace information lifecycle management solutions • No right answer; clearly define usage. Diagram labels (Hadoop Cluster): Consolidated Data; Streaming Queues; Deltas (Incremental); Common Data (Dimensions, Master Data); Improved / Modeled Data; Published, Analytical and Aggregates; Sandbox Zone; Raw Data; Processed Data; Archived Data
  28. Five Common Architectural Roles • Hadoop: Analytics, Data Warehouse, Stream Processing, Data Factory, Transactional Data Store • Hadoop Big Data Use Cases
  29. Stream and Event Processing • Dedicated vs. Shared Model • Persistence of messages, logs, etc.: Long-term storage, Queuing • Pre-load (HDFS) vs. Post-load processing • Micro-Batch vs. One-at-a-Time • Programming language support • Processing guarantee: At most once, At least once, Exactly once. Let business requirements drive the need for streaming solutions. It is acceptable to use more than one solution as long as the roles / purposes of each are clearly defined.
  30. Five Common Architectural Roles • Hadoop: Analytics, Data Warehouse, Stream Processing, Data Factory, Transactional Data Store • Hadoop Big Data Use Cases
  31. The Data Integration Challenge. The Challenge: • Volume, variety, and velocity create unique challenges for data integration • 10,000+ unique entities (or file groups) may have to be managed • Batch windows are still the same or shrinking. Key Point: Hadoop and Hadoop-related technologies can address these challenges. However, they must be architected and governed properly.
  32. Data Factory & Integration: Approaches to Big Data Integration. Hadoop Distributed Tools: • Leverages tools included in the Hadoop distribution and programming languages • Sqoop, Flume, Spark, Java, MapReduce are examples • Tools can be implemented in many different modes: Hand-coded/scripted, Runtime Configured, Generated. Data Integration Packages: • Leverage commercial data integration packages to move and transform data • IBM InfoSphere BigInsights, Informatica are examples • Key questions: where is processing taking place, and does the tool use the YARN resource manager? Hybrid (Both Hadoop and Data Integration Package): • Based on use case, leverages both Hadoop and COTS tools to move and transform data
  33. Define Pipelines and Stages (diagram). Sources: Cloud Sources, RDBMS, File Hub, Object DBMS, Log Data, Stream/Message Bus. Stages: Extract; HDFS Load & Formatting; Scraping & Normalization; Cleansing, Aggregation, Transformation; Data Distribution; Data Access & Distribution. Tools shown: Sqoop, FTP, Packaged Tool, ETL Tool, Kafka, Storm, MCF, Message Bus, Custom. Targets: RDBMS/DW/IMDB, Hive, HBase, File Extracts, NoSQL, Stream Output
  34. Big Data Integration Framework. Key Guidance: • In lieu of using an ETL product, consider building a Big Data Integration framework • Apache Falcon provides pipeline management • Focus is on making all components run-time configurable with metadata • Can offer significant cost savings over the long run. Typical Services: Load Utility, Metadata Collection, Metadata Pipeline Config Files, Metadata Config Files, Pipeline Utilities, Parser (Delimiter), Data Standardization, Hive Publishing, MF Coding Converters, File Joiner & Transport, Logging, Checksum, Retention, Replication, Late Arriving Data, Exception Handling, Pipeline Master (ex. Falcon), DB Copy, Archival, Audit, Sqoop, Flume, HDFS, Shell
  35. Five Common Architectural Roles • Hadoop: Analytics, Data Warehouse, Stream Processing, Data Factory, Transactional Data Store • Hadoop Big Data Use Cases
  36. SQL on Hadoop • SQL on Hadoop is changing • Historically focused on read functionality for analytics • New breed of SQL on Hadoop: BI and operational reporting, Transaction Processing. *Image provided by Splice Machine
  37. Transactions In Hive
  38. Today’s Objectives • 5 Architectural Roles For Hadoop • Hadoop Ecosystem Potential vs. Reality • Realizing A Hadoop Centric Architecture
  39. Common Big Data Business Use Cases • Improve Strategic Decision Making • Customer Experience Analysis • Operational Optimization • Risk and Fraud Reduction • Data Monetization • Security Event Detection and Analysis • IT Cost Management
  40. Architectural Scenarios. Architecture roles (columns): Analytics, Data Warehouse, Stream Processing, Data Factory, Transactional Data Store*. Business use cases (rows), with marks listed in column order and blank cells omitted: • Strategic Decision Making: P, s • Customer Experience: P, s, P, s • Operational Optimization: P, s, s, s • Risk and Fraud Reduction: P, s, P • Data Monetization: s, s, P • Security Event Detection and Analysis: P, s, s, s • IT Cost Management: P, s, P, P. P = Primary Use Case, s = Secondary Use Case. * Capability is just emerging within the Hadoop ecosystem. Consider this use case for isolated business cases and early adopters.
  41. Integrating Hadoop into the Enterprise • Determine Business Use Cases • Understand Current Tools & Architecture • Align Business Use Case Priorities • Build Roadmap • Specify Solution Architecture • Update & Maintain Roadmap • Implement Roadmap
  42. Final Thoughts. Do: • Match the business use case to the big data role • Clearly define a roadmap • Establish clear architectural standards to drive consistency and re-use of resources • Do your homework when defining a solution architecture. Don’t: • Select an initial use case that relies on immature Hadoop functionality • Leverage tools that move data off the cluster for processing and then store the data back on the cluster • Assume all Hadoop technologies integrate well together
  43. As a reminder, please submit your questions in the chat box. We will get to as many as possible.
  44. Daily unique content about content management, user experience, portals and other enterprise information technology solutions across a variety of industries. Perficient.com/SocialMedia, Facebook.com/Perficient, Twitter.com/Perficient
  45. Thank you for your participation today. Please fill out the survey at the close of this session.
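
The stream processing slide (29) lists three processing guarantees: at most once, at least once, and exactly once. The following is a minimal Python sketch of how they differ, not taken from the webinar. `FlakySink`, its two failure modes, and the function names are hypothetical, and "exactly once" is approximated here by at-least-once delivery plus de-duplication on a message id, which is one common way to get that effect.

```python
class FlakySink:
    """Hypothetical destination that fails once for selected message ids."""

    def __init__(self, drop_once=(), lose_ack_once=()):
        self.drop_once = set(drop_once)          # first write is lost entirely
        self.lose_ack_once = set(lose_ack_once)  # first write lands, ack is lost
        self.received = []

    def write(self, msg_id):
        """Return True only when the sender receives an acknowledgement."""
        if msg_id in self.drop_once:
            self.drop_once.discard(msg_id)
            return False
        self.received.append(msg_id)
        if msg_id in self.lose_ack_once:
            self.lose_ack_once.discard(msg_id)
            return False
        return True


def at_most_once(messages, sink):
    """Send each message a single time and never retry (messages can be lost)."""
    for msg_id in messages:
        sink.write(msg_id)


def at_least_once(messages, sink, max_attempts=3):
    """Retry until acknowledged (messages can be duplicated)."""
    for msg_id in messages:
        for _ in range(max_attempts):
            if sink.write(msg_id):
                break


def deduplicate(received):
    """Drop repeated ids while keeping first-seen order."""
    seen, unique = set(), []
    for msg_id in received:
        if msg_id not in seen:
            seen.add(msg_id)
            unique.append(msg_id)
    return unique


if __name__ == "__main__":
    sink = FlakySink(drop_once={2}, lose_ack_once={3})
    at_most_once([1, 2, 3], sink)
    print(sink.received)               # [1, 3]: message 2 was lost

    sink = FlakySink(drop_once={2}, lose_ack_once={3})
    at_least_once([1, 2, 3], sink)
    print(sink.received)               # [1, 2, 3, 3]: message 3 was duplicated
    print(deduplicate(sink.received))  # [1, 2, 3]
```

The trade-off is where the cost lands: at-most-once needs no state, at-least-once needs retries on the sender, and the exactly-once effect needs the consumer to remember which ids it has already processed.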
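
The integration framework slide (34) recommends making all components run-time configurable with metadata. A minimal Python sketch of that idea follows, assuming a delimited text feed. The feed name, the config keys, and the `run_pipeline` function are hypothetical illustrations and do not represent Apache Falcon or any other product's API. Only three of the listed services are shown: a delimiter parser, data standardization, and a checksum.

```python
import csv
import hashlib
import io

# Hypothetical metadata: one entry per feed, no feed-specific code.
PIPELINE_CONFIG = {
    "customer_feed": {
        "delimiter": "|",
        "columns": ["id", "name", "state"],
        "standardize": {"name": "strip", "state": "upper"},
    },
}

# Standardization rules that the metadata can refer to by name.
STANDARDIZERS = {
    "strip": lambda value: value.strip(),
    "upper": lambda value: value.strip().upper(),
}


def run_pipeline(feed_name, raw_text, config=PIPELINE_CONFIG):
    """Parse and standardize a feed using only its metadata entry."""
    meta = config[feed_name]
    reader = csv.reader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    rows = []
    for values in reader:
        if not values:
            continue  # skip blank lines
        if len(values) != len(meta["columns"]):
            raise ValueError(f"{feed_name}: unexpected column count in {values!r}")
        row = dict(zip(meta["columns"], values))
        for column, rule in meta["standardize"].items():
            row[column] = STANDARDIZERS[rule](row[column])
        rows.append(row)
    # Checksum of the raw input, for audit and transfer verification.
    checksum = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
    return rows, checksum


if __name__ == "__main__":
    rows, checksum = run_pipeline("customer_feed", "1| Ada |il\n2|Bob| ny \n")
    print(rows)
    print(checksum)
```

Onboarding a new feed in this sketch means adding a config entry rather than writing a new job, which is the property that matters when thousands of entities have to be managed.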
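
The data warehouse slide (26) describes a hot – cold split, with hot data on the traditional DW/DM and cold data on the Hadoop cluster. The routing rule can be sketched in Python as below. The 90-day window, the store names, and the record layout are assumptions made for illustration; the webinar does not specify how hot and cold are defined.

```python
from datetime import date

HOT_WINDOW_DAYS = 90  # assumed threshold, tune to actual query patterns


def target_store(record_date, today, hot_window_days=HOT_WINDOW_DAYS):
    """Return which store should hold a record, based on its age in days."""
    age_days = (today - record_date).days
    return "traditional_dw" if age_days <= hot_window_days else "hadoop_cluster"


def split_hot_cold(records, today, hot_window_days=HOT_WINDOW_DAYS):
    """Partition records (dicts with a 'date' key) into hot and cold lists."""
    placement = {"traditional_dw": [], "hadoop_cluster": []}
    for record in records:
        store = target_store(record["date"], today, hot_window_days)
        placement[store].append(record)
    return placement


if __name__ == "__main__":
    today = date(2015, 1, 1)
    records = [
        {"id": 1, "date": date(2014, 12, 1)},  # 31 days old
        {"id": 2, "date": date(2014, 1, 1)},   # 365 days old
    ]
    placement = split_hot_cold(records, today)
    print([r["id"] for r in placement["traditional_dw"]])  # [1]
    print([r["id"] for r in placement["hadoop_cluster"]])  # [2]
```

An age-based rule is only one option; access frequency or business criticality could drive the same split.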
