Big Data Simplified - Is all about Ab'strakSHeN

Big Data Simplified
"Is all about abˈstrakSH(ə)n"
Hemal Gandhi
Director of Data Engineering

Analyze Current State
• Challenges
• Facts
New Platform Design
• Define Goals
• Feature List
• Implementation Approach
Compare
• Feature List
• Trade Offs
• Cost Structure
Decision
Fix vs. Build?

Struggling to keep up with business needs

Code base is increasing rapidly

We are slow to respond to market needs

High cost of data
• Storage
• Finding insights
• Integration
• Maintenance

Lack of understanding of the business impact of data
• Strategic value
• Data identity
• Time value
• Dependencies

Process and Organization
• High investments
• Costs
• Adoption issues
• Complex framework

NOT a scalable platform
This can hurt revenue!

Keep technology stack current over time

Low cost of data
• Storage
• Finding insights
• Integration
• Maintenance

Understand the business impact of data
• Strategic value
• Data identity
• Time value
• Dependencies

Current Platform vs. New Platform
• Investment needs: High vs. High
• Scalability: Not scalable vs. Initially scalable
• Maintenance cost: High vs. Initially low, grows over time
• Technology: Outdated vs. Big data tools provide technology, not solutions to design problems

Goal
Build a feature-based, scalable big data platform in 6 months with limited resources while supporting the legacy system.

Take a Platform Approach
• Project requirements
• Data platform features
• Reusable components

Technology Abstraction
• Business logic
• Declarative configuration
• Pick technology at runtime
• Execution engine
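The technology abstraction can be illustrated with a minimal sketch, assuming a hypothetical dict-based configuration format and engine registry (`ENGINES`, `engine`, `execute` and `default_engine` are invented for this example, not taken from Dredge). The business logic is declared as data, and the execution engine for each step is picked at runtime from that configuration instead of being hard-coded.

```python
from typing import Callable, Dict, List

# Hypothetical engine registry: maps an engine name used in configuration
# to a function that runs one step. Real engines (Pig, Spark, Hive) would
# be invoked here; these stubs only report what would have run.
ENGINES: Dict[str, Callable[[dict], str]] = {}


def engine(name: str):
    """Register an execution engine under a name usable in configuration."""
    def decorator(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        ENGINES[name] = fn
        return fn
    return decorator


@engine("pig")
def run_with_pig(step: dict) -> str:
    return f"pig:{step['name']}"


@engine("spark")
def run_with_spark(step: dict) -> str:
    return f"spark:{step['name']}"


def execute(pipeline: dict) -> List[str]:
    """Run each declared step on the engine named in the configuration.

    A step may override the pipeline's default engine, so the technology
    is chosen at runtime from configuration, not inside the business logic.
    """
    default = pipeline["default_engine"]
    results = []
    for step in pipeline["steps"]:
        engine_name = step.get("engine", default)
        if engine_name not in ENGINES:
            raise ValueError(f"unknown engine: {engine_name}")
        results.append(ENGINES[engine_name](step))
    return results


# Declarative configuration (hypothetical format): what to do, not how.
PIPELINE = {
    "default_engine": "pig",
    "steps": [
        {"name": "clean_orders"},
        {"name": "aggregate_sales", "engine": "spark"},
    ],
}

print(execute(PIPELINE))  # ['pig:clean_orders', 'spark:aggregate_sales']
```

Changing `default_engine` from Pig to Spark switches the technology without touching any step definition, which is the kind of decoupling this slide argues for.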

Data Access & Ingestion Abstraction
• Data producers → Data ingestion framework → Data storage
• Data storage → Data access API → Data consumers
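A minimal sketch of this separation, assuming hypothetical class names (`StorageBackend`, `IngestionFramework`, `DataAccessAPI`) and an in-memory backend standing in for real storage such as Hive or HBase. Producers and consumers depend only on the two interfaces, so the storage technology can change underneath them.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List


class StorageBackend(ABC):
    """Hypothetical storage interface; Hive, HBase, etc. would each implement it."""

    @abstractmethod
    def append(self, dataset: str, records: List[dict]) -> None: ...

    @abstractmethod
    def scan(self, dataset: str) -> List[dict]: ...


class InMemoryBackend(StorageBackend):
    """Stand-in backend so the sketch runs without a cluster."""

    def __init__(self) -> None:
        self._data: Dict[str, List[dict]] = defaultdict(list)

    def append(self, dataset: str, records: List[dict]) -> None:
        self._data[dataset].extend(records)

    def scan(self, dataset: str) -> List[dict]:
        return list(self._data[dataset])


class IngestionFramework:
    """Producers call this; they never see which backend stores the data."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def ingest(self, dataset: str, records: List[dict]) -> int:
        self._backend.append(dataset, records)
        return len(records)


class DataAccessAPI:
    """Consumers call this; filtering lives here, not in every consumer."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def read(self, dataset: str, **equals) -> List[dict]:
        # Keep records whose fields match every keyword filter given.
        return [r for r in self._backend.scan(dataset)
                if all(r.get(k) == v for k, v in equals.items())]


backend = InMemoryBackend()
IngestionFramework(backend).ingest(
    "orders", [{"id": 1, "state": "NY"}, {"id": 2, "state": "CA"}])
print(DataAccessAPI(backend).read("orders", state="CA"))  # [{'id': 2, 'state': 'CA'}]
```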

Data Integration Jobs: Stream Data to the Storage Layer
Data integration jobs → Stream → Data storage

Hot/Cold Data Management
• Hot data: managed through configuration
• Cold data: managed through configuration
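Configuration-driven hot/cold management can be sketched as an age-based policy. The `TIERING_CONFIG` format, the `hot_days` parameter and the daily-partition layout are assumptions for illustration; the slide only states that hot and cold data are managed through configuration.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical per-dataset configuration: how many days a partition stays
# in hot storage before it should move to cold storage.
TIERING_CONFIG: Dict[str, Dict[str, int]] = {
    "clickstream": {"hot_days": 7},
    "orders": {"hot_days": 90},
}


def plan_tiering(dataset: str, partitions: List[date], today: date
                 ) -> Tuple[List[date], List[date]]:
    """Split daily partitions into (hot, cold) lists based on configuration.

    A partition is hot if it is at most `hot_days` old; otherwise it
    belongs in cold storage.
    """
    hot_days = TIERING_CONFIG[dataset]["hot_days"]
    cutoff = today - timedelta(days=hot_days)
    hot = [p for p in partitions if p >= cutoff]
    cold = [p for p in partitions if p < cutoff]
    return hot, cold


today = date(2016, 6, 30)
parts = [today - timedelta(days=d) for d in (0, 5, 10, 30)]
hot, cold = plan_tiering("clickstream", parts, today)
print(len(hot), len(cold))  # 2 2
```

Because the retention window lives in configuration, making a dataset "hotter" or "colder" is a one-line change rather than a new archiving script.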

Data Platform Architecture
• Collection: Apache Flume, Sqoop
• Flow: Kafka, Spark
• Processing: Pig, Spark, MapReduce
• Storage: Hive, HBase, Vertica
• Delivery: Looker, Tableau, visualization (d3.js), email/FTP
• Data access abstraction: Dredge
• Data Quality Service (data lineage & profiling)
• Security
• Scheduling & cluster monitoring
• Applications & visualization tools

What Is Dredge
A declarative abstraction layer for integrating big data tools, enabling a loosely coupled big data platform.

Dredge Logical View
• Source end points → source readers → tasks (Hadoop cluster) → target writers → target end points
• Data passes from readers to writers either as streams or directly
• Events management and log streaming
• Dredge repository: HBase
• Configuration abstraction
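The logical view can be sketched as source readers feeding tasks that feed target writers, with data handed over either directly (all at once) or as a stream (in batches). `run_flow`, the `mode` values and `batch_size` are hypothetical names chosen for this sketch, not Dredge's actual interfaces; real readers and writers would talk to end points such as an RDBMS, Hive or HBase.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
Reader = Callable[[], Iterable[Record]]
Task = Callable[[Iterable[Record]], Iterable[Record]]
Writer = Callable[[List[Record]], None]


def run_flow(reader: Reader, tasks: List[Task], writer: Writer,
             mode: str = "direct", batch_size: int = 2) -> None:
    """Move records from a source end point to a target end point.

    mode="direct": materialize all records, then write once.
    mode="stream": write in batches of `batch_size` as records flow through.
    """
    records: Iterable[Record] = reader()
    for task in tasks:
        records = task(records)
    if mode == "direct":
        writer(list(records))
    elif mode == "stream":
        batch: List[Record] = []
        for r in records:
            batch.append(r)
            if len(batch) == batch_size:
                writer(batch)
                batch = []
        if batch:  # flush the final partial batch
            writer(batch)
    else:
        raise ValueError(f"unknown mode: {mode}")


def log_reader() -> Iterator[Record]:
    # Hypothetical source reader: already-parsed log lines.
    yield from ({"level": lvl} for lvl in ["INFO", "ERROR", "ERROR", "WARN", "ERROR"])


def errors_only(records: Iterable[Record]) -> Iterator[Record]:
    # Hypothetical task: keep only error entries.
    return (r for r in records if r["level"] == "ERROR")


writes: List[List[Record]] = []
run_flow(log_reader, [errors_only], writes.append, mode="stream")
print([len(b) for b in writes])  # [2, 1]
```

Switching `mode` to `"direct"` leaves the reader, task and writer unchanged and produces a single write of all three error records.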

Dredge Architecture
• Dredge UI: declarative configuration, logical flows, data lineage, runtime logs, admin
• Dredge Runtime
• Source readers (logs, RDBMS, unstructured data, custom), direct or stream
• Target writers (Hive, HBase, RDBMS, custom), direct or stream
• Abstraction builder (Kafka, Flume, Pig, custom)
• Dredge Data Services: aggregator, UDFs, combiners, routers, plugins (Java/Shell, Pig, SQL), rank, sorter, set operations, filters/pattern analysis
• Event management
• Temp store: HDFS; temp cache: HDFS; logger stream
• Dredge repository: HBase
• Lambda architecture: HDFS, Hive, HBase, Pig, Flume, Kafka, Oozie
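The data services (filters, aggregator, sorter and so on) lend themselves to being composed from configuration. A minimal sketch, assuming a hypothetical `OPERATORS` registry and spec format invented for this example; the slide lists plugins in Java/Shell, Pig and SQL, whereas this version uses plain Python lists to stay self-contained.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = dict


def op_filter(records: List[Record], field: str, equals) -> List[Record]:
    """Keep records whose `field` equals the given value."""
    return [r for r in records if r.get(field) == equals]


def op_aggregate(records: List[Record], key: str) -> List[Record]:
    """Count records per distinct value of `key`."""
    counts = Counter(r[key] for r in records)
    return [{key: k, "count": n} for k, n in counts.items()]


def op_sort(records: List[Record], field: str, descending: bool = False) -> List[Record]:
    """Order records by `field` (a simple stand-in for rank/sorter)."""
    return sorted(records, key=lambda r: r[field], reverse=descending)


# Hypothetical operator registry keyed by the names used in configuration.
OPERATORS: Dict[str, Callable[..., List[Record]]] = {
    "filter": op_filter,
    "aggregate": op_aggregate,
    "sort": op_sort,
}


def build_flow(spec: List[dict]) -> Callable[[List[Record]], List[Record]]:
    """Turn a declarative list of operator specs into one callable."""
    steps = [(OPERATORS[s["op"]], s.get("args", {})) for s in spec]

    def flow(records: List[Record]) -> List[Record]:
        for fn, args in steps:
            records = fn(records, **args)
        return records

    return flow


spec = [
    {"op": "filter", "args": {"field": "status", "equals": "shipped"}},
    {"op": "aggregate", "args": {"key": "state"}},
    {"op": "sort", "args": {"field": "count", "descending": True}},
]
orders = [
    {"state": "NY", "status": "shipped"},
    {"state": "CA", "status": "shipped"},
    {"state": "NY", "status": "shipped"},
    {"state": "CA", "status": "returned"},
]
print(build_flow(spec)(orders))  # [{'state': 'NY', 'count': 2}, {'state': 'CA', 'count': 1}]
```

Adding a new operator means registering one function; existing flows keep working because they only reference operators by name.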

DREDGE BENEFITS
• From 1000+ scripts to 50-100 scripts
• From 1000+ configuration files to <5 files
• Logical view of workflows that abstracts the physical implementation
• Quick integration of new big data tools through declarative configuration
• Improved SLAs and time to market, better cluster utilization, higher performance
• Simplified integration
• Minimal migration costs
• Low maintenance, configurable archiving of data

It is all about abˈstrakSH(ə)n
• Abstraction layers: technology, data access, data ingestion, dependencies
• Reusable data components
• Event-driven dependencies
• Plug & play integration, loosely coupled (cluster resources, data)
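Event-driven dependencies replace fixed schedules with "run when the inputs exist". A minimal sketch, assuming a hypothetical `JOBS` declaration and an in-process event queue; in a real platform the events could travel over a message bus such as Kafka (listed in the architecture above) and the jobs would be launched on the cluster.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical job declarations: each job lists the datasets it needs
# and the dataset it produces.
JOBS: Dict[str, Dict] = {
    "load_orders": {"inputs": {"raw_orders"}, "output": "orders"},
    "load_customers": {"inputs": {"raw_customers"}, "output": "customers"},
    "sales_report": {"inputs": {"orders", "customers"}, "output": "report"},
}


def run_on_events(initial_events: List[str]) -> List[str]:
    """Trigger each job once all of its input datasets have been published.

    Every completed job publishes its output as a new event, so downstream
    jobs start as soon as their data is ready instead of at a fixed time.
    Returns the jobs in the order they ran.
    """
    ready: Set[str] = set()
    ran: List[str] = []
    events = deque(initial_events)
    while events:
        ready.add(events.popleft())
        for name, job in JOBS.items():
            if name not in ran and job["inputs"] <= ready:
                ran.append(name)              # a real system would launch the job here
                events.append(job["output"])  # completion publishes a new event
    return ran


print(run_on_events(["raw_orders", "raw_customers"]))
# ['load_orders', 'load_customers', 'sales_report']
```

If only `raw_orders` arrives, only `load_orders` runs; `sales_report` waits until both of its inputs have been published.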

Big data requires a different mindset: innovate, iterate often, and keep it simple.

Thank you.
engineering.onekingslane.com
