Lightning talk from Hadoop Summit 2013 in Amsterdam covering how Syncsort is helping make Hadoop Ready for Prime Time. It covers the pluggable sort contribution and its impact on sort, join, aggregation, merge, and filter in Hadoop, as well as Syncsort's ability to move mainframe data to Hadoop: Big Iron to Big Data.
Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightning Talk
1. Making Hadoop Ready for Prime Time
Hadoop Summit Amsterdam March 2013
Steve Totman
Director Of Strategy
Syncsort
March 20th 2013
Photo Credit Aaron Sikkink http://www.flickr.com/people/housequakecom/
4. The Big Data Continuum
Stages (data value from Min to Max):
•Awakening: Hand-coding (SQL, JCL); basic ETL tools
•Advancing: Traditional BI; standardization & heavy platforms
•Plateauing: Hitting arch limits + exponential costs; demand for MF data; growing MIPS
•Dynamic: Early Hadoop adoption; prototyping & experimentation
•Evolved: Big Data is the new standard for both MF & open systems data
Challenges:
•Hand-coding nightmare
•Long development cycles
•Unsustainable costs
•Hadoop connectivity & sort gaps
•Efficiency, ETL & skills gaps
Integrating Big Data… Smarter:
•SQL Migration
•High-performance ETL
•ETL & Rehosting Optimization
•Hadoop Sort & Connectivity
•Hadoop ETL
DMExpress, MFX
Syncsort Confidential and Proprietary - do not copy or distribute
5. Mandatory sort steps in MapReduce processing
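As a rough illustration of where these sort steps sit, here is a toy, single-process word count in Python. It is a sketch only, not Hadoop code: the function names (`map_task`, `reduce_task`, `word_count`) and the byte-sum partitioner are assumptions made for the example. It shows the two places a MapReduce job always sorts: map output is sorted by key within each partition, and each reducer merges the sorted runs it fetches before grouping by key.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def map_task(lines, num_reducers):
    """Toy map task: emit (word, 1) pairs, then sort each partition by key."""
    partitions = [[] for _ in range(num_reducers)]
    for line in lines:
        for word in line.split():
            # Deterministic toy partitioner (an assumption, not Hadoop's own).
            partitions[sum(word.encode()) % num_reducers].append((word, 1))
    # Sort step 1: map output is sorted by key within each partition.
    return [sorted(p, key=itemgetter(0)) for p in partitions]


def reduce_task(sorted_runs):
    """Toy reduce task: merge the sorted runs, then sum the values per key."""
    # Sort step 2: merge the already-sorted runs fetched from every map task.
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [(key, sum(count for _, count in group))
            for key, group in groupby(merged, key=itemgetter(0))]


def word_count(splits, num_reducers=2):
    """Run every map task, 'shuffle' partitions to reducers, and reduce."""
    map_outputs = [map_task(split, num_reducers) for split in splits]
    results = []
    for r in range(num_reducers):
        runs = [output[r] for output in map_outputs]  # fetch partition r
        results.extend(reduce_task(runs))
    return sorted(results)
```

Calling `word_count([["b a", "a c"], ["c b a"]])` runs two map tasks and two reducers. Neither sort can be skipped, because grouping by key in the reducer relies on the merged input being ordered.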
8. Smart Contributions to Improve Hadoop
Native Sort:
✗ Not modular
✗ Limited capabilities
✗ Difficult to fine-tune & configure (requires coding & compilation)
Hadoop Contribution:
•Modular
•Extensible
•Configurable through use of external sorters on MapReduce nodes
JIRA   Description
4807   Allow MapOutputBuffer to be pluggable
4808   Allow Reduce-side merge to be pluggable
4809   Make classes required for 2454 public
4812   Create reduce input merger plug-in
4842   Shuffle race can hang reducer
2461   HDFS file name globbing in libhdfs
4482   Backport of 2454 to MapReduce 1 & 1.2
Diagram labels: Hadoop Node, Native Sort
First included in a Hadoop distribution, CDH4.2, on February 26th
…and more!!
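The idea of a pluggable sort can be sketched in a few lines of Python. This is a conceptual toy, not the Hadoop plug-in API: `SORTERS`, `collect_map_output`, the `"sorter"` config key, and both sorter functions are hypothetical names invented for the example. The point is only that the framework picks its sorter from configuration, so an external sorter can be swapped in without recoding or recompiling the framework.

```python
import heapq
from operator import itemgetter
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]
Sorter = Callable[[List[Record]], List[Record]]


def builtin_sort(records: List[Record]) -> List[Record]:
    """Stand-in for the framework's fixed, built-in sort."""
    return sorted(records, key=itemgetter(0))


def external_sort(records: List[Record]) -> List[Record]:
    """Stand-in for an external sorter: sort small chunks, then merge them."""
    chunks = [sorted(records[i:i + 2], key=itemgetter(0))
              for i in range(0, len(records), 2)]
    return list(heapq.merge(*chunks, key=itemgetter(0)))


# Hypothetical plug-in registry: sorters are looked up by name at run time.
SORTERS: Dict[str, Sorter] = {
    "builtin": builtin_sort,
    "external": external_sort,
}


def collect_map_output(records: List[Record], config: Dict[str, str]) -> List[Record]:
    """Sort map output with whichever sorter the job configuration names."""
    sorter = SORTERS[config.get("sorter", "builtin")]
    return sorter(records)
```

With this shape, `collect_map_output(records, {"sorter": "external"})` changes the sort implementation through configuration alone, while the calling code stays the same.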
9. Benefits to the Community
MATCH, COMPRESSION, MERGE, RANK, LOOKUP, JOIN, AGGREGATION, CDC
TeraSort Benchmark chart: Elapsed Time (min), 0 to 250, against File Size (GB), 0 to 5000
11. Syncsort. A Bridge to Scalable, Cost-effective Big Data
Connect
•HDFS Connectivity
•Mainframe
•Teradata
•Files
•RDBMS, Appliances
Pre-process
•Sort, Join
•Aggregate
•Compress
•Partition
Facilitate
•Graphical UI
•No Manual Coding
•No Tuning
Optimize
•Up to 6x Faster Load
•Up to 2x Faster Sort
•Faster MapReduce Jobs
•Less Storage
Over 40 Years Solving Big Data Challenges with Fast. Efficient. Simple. Cost Effective DI Technology