Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightning Talk



Lightning talk from Hadoop Summit 2013 in Amsterdam on how Syncsort is helping make Hadoop ready for prime time. It covers the pluggable sort contribution and its impact on sort, join, aggregation, merge and filter in Hadoop, as well as Syncsort's ability to move mainframe data to Hadoop (Big Iron to Big Data).


Making Hadoop Ready for Prime Time
Hadoop Summit Amsterdam March 2013
Steve Totman
Director of Strategy
Syncsort

March 20th 2013

Photo Credit Aaron Sikkink http://www.flickr.com/people/housequakecom/

Syncsort Confidential and Proprietary - do not copy or distribute


The Big Data Continuum
Integrating Big Data… Smarter

[Diagram: value, from min to max, plotted across the stages Data Awakening, Advancing, Plateauing, Dynamic and Evolved.]

Challenges:
•Handcoding nightmare: hand-coding (SQL, JCL) and basic ETL tools
•Traditional BI standardization
•Heavy platforms and long development cycles
•Hitting architectural limits and exponential costs
•Demand for mainframe (MF) data and growing MIPS
•Early Hadoop adoption, prototyping & experimentation
•Unsustainable costs
•Hadoop connectivity & sort gaps
•Efficiency, ETL & skills gaps

Solutions:
•SQL Migration
•High-performance ETL
•ETL & Rehosting Optimization
•Hadoop Sort & Connectivity
•Hadoop ETL
•Products: DMExpress, MFX

Evolved: Big Data is the new standard for both MF & open systems data.

Mandatory sort steps in MapReduce processing
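As the slide title says, sorting is not optional in MapReduce. Each map task partitions its output by reducer and sorts it by key, spilling sorted runs and merging them. Each reduce task then merges the sorted segments it fetches from every map before grouping values by key. The following Python sketch models those steps in a single process to make them concrete. It is not Hadoop code: the function names are hypothetical, the tiny `spill_size` stands in for the map-side sort buffer, and `zlib.crc32` is an assumed deterministic stand-in for the Java `hashCode` used by Hadoop's default hash partitioner.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    # Hash-mod partitioning in the style of Hadoop's default hash partitioner;
    # crc32 is a deterministic stand-in for Java's hashCode (assumption).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side(records, num_reducers, spill_size=2):
    """Partition and sort map output by (partition, key), spilling and merging runs."""
    spills, buffer = [], []
    for key, value in records:
        buffer.append((partition(key, num_reducers), key, value))
        if len(buffer) >= spill_size:  # buffer full: sort it and spill a run
            spills.append(sorted(buffer, key=itemgetter(0, 1)))
            buffer = []
    if buffer:  # final partial spill
        spills.append(sorted(buffer, key=itemgetter(0, 1)))
    # Merge all sorted runs into one output, split into per-reducer segments.
    segments = {p: [] for p in range(num_reducers)}
    for p, key, value in heapq.merge(*spills, key=itemgetter(0, 1)):
        segments[p].append((key, value))
    return segments


def reduce_side(map_outputs, reducer_id):
    """Fetch this reducer's segment from every map, merge them, then group by key."""
    fetched = [segments[reducer_id] for segments in map_outputs]  # the "shuffle"
    merged = heapq.merge(*fetched, key=itemgetter(0))
    return [(key, [v for _, v in group])
            for key, group in groupby(merged, key=itemgetter(0))]


# Usage: a two-map, two-reducer word count.
maps = [
    map_side([("hadoop", 1), ("sort", 1), ("hadoop", 1)], num_reducers=2),
    map_side([("sort", 1), ("amsterdam", 1)], num_reducers=2),
]
for r in range(2):
    for key, values in reduce_side(maps, r):
        print(r, key, sum(values))
```

Because every record passes through at least one map-side sort and one reduce-side merge, the speed of the sort implementation largely bounds job time, which is why replacing it is attractive.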


Smart Contributions to Improve Hadoop

Native sort:
✗ Not modular
✗ Limited capabilities
✗ Difficult to fine-tune & configure (requires coding & compilation)

Hadoop contribution:
✓ Modular
✓ Extensible
✓ Configurable through use of external sorters on MapReduce nodes

[Diagram: Hadoop nodes, each running Native Sort.]

JIRA   Description
4807   Allow MapOutputBuffer to be pluggable
4808   Allow Reduce-side merge to be pluggable
4809   Make classes required for 2454 public
4812   Create reduce input merger plug-in
4842   Shuffle race can hang reducer
2461   HDFS file name globbing in libhdfs
4482   Backport of 2454 to MapReduce 1 & 1.2
…and more!!

First included in a Hadoop distribution, CDH4.2, on February 26th.
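The contributions in the table make the map-side collector and the reduce-side merge replaceable, so an external sorter can be plugged in without patching Hadoop. In Apache Hadoop 2.x the map-side hook is exposed as the `mapreduce.job.map.output.collector.class` property, which accepts a comma-separated list of collector classes; the map task tries each in turn and uses the first one that initializes. The Python sketch below models only that selection-with-fallback behavior. The sorter classes, the `available` flag and the registry are hypothetical, and none of it is Hadoop's Java API.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]


class NativeSorter:
    """Hypothetical stand-in for the built-in map output buffer and sort."""
    name = "native"

    def sort(self, records: Iterable[Record]) -> List[Record]:
        return sorted(records, key=lambda kv: kv[0])


class ExternalSorter:
    """Hypothetical stand-in for an external sorter plugin."""
    name = "external"
    available = False  # e.g. the external sort engine is not installed on this node

    def __init__(self) -> None:
        if not ExternalSorter.available:
            raise RuntimeError("external sorter not installed on this node")

    def sort(self, records: Iterable[Record]) -> List[Record]:
        # A real plugin would delegate to its own sort engine; here we just sort.
        return sorted(records, key=lambda kv: kv[0])


# Maps the names a job may configure to implementations (hypothetical registry).
REGISTRY = {"native": NativeSorter, "external": ExternalSorter}


def load_collector(setting: str):
    """Try each comma-separated name in order; the first one to initialize wins."""
    errors = []
    for name in (part.strip() for part in setting.split(",")):
        try:
            return REGISTRY[name]()
        except (KeyError, RuntimeError) as exc:
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("no collector could be initialized: " + "; ".join(errors))


# Prefer the external sorter, fall back to the native one.
sorter = load_collector("external,native")
print(sorter.name)  # prints "native" because ExternalSorter.available is False
```

Listing the external sorter first with the native one as a fallback means a node without the plugin can still run the job, which is one way the "configurable through use of external sorters" goal can be met without breaking existing clusters.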

Benefits to the Community
MATCH • MERGE • JOIN • AGGREGATION • RANK • LOOKUP • COMPRESSION • CDC

[Chart: TeraSort Benchmark. Elapsed time (min, 0 to 250) vs. file size (GB, 0 to 5,000).]
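Several of the operations listed on this slide become cheap once data is sorted by key: a join turns into a single merge pass over two sorted inputs, and an aggregation turns into one pass over runs of equal keys. The Python sketch below illustrates that general sort-merge technique. It is not Syncsort's or Hadoop's implementation, and the sample tables are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner join of two lists of (key, value) pairs, both sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def sorted_aggregate(rows):
    """Sum values per key in one pass over (key, value) pairs sorted by key."""
    return [(k, sum(v for _, v in grp)) for k, grp in groupby(rows, key=itemgetter(0))]


visits = sorted([("u2", 3), ("u1", 5), ("u1", 2)])
users = sorted([("u1", "NL"), ("u2", "US"), ("u3", "UK")])
print(merge_join(users, visits))  # [('u1', 'NL', 2), ('u1', 'NL', 5), ('u2', 'US', 3)]
print(sorted_aggregate(visits))   # [('u1', 7), ('u2', 3)]
```

Neither function needs to hold a hash table of one input in memory; both only read forward through sorted data, which is why a faster sort tends to speed up these downstream operations as well.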

Data Access: Mainframes Today

50%

Run

Syncsort: A Bridge to Scalable, Cost-effective Big Data

Connect
•HDFS Connectivity
•Mainframe
•Teradata
•Files
•RDBMS, Appliances

Pre-process
•Sort, Join
•Aggregate
•Compress
•Partition

Facilitate
•Graphical UI
•No Manual Coding
•No Tuning

Optimize
•Up to 6x Faster Load
•Up to 2x Faster Sort
•Faster MapReduce Jobs
•Less Storage

Over 40 years solving Big Data challenges with fast, efficient, simple, cost-effective data integration (DI) technology.

Hourly Load into comScore’s Hadoop Cluster
Syncsort’s DMExpress saves comScore over 4 TB of data per day! That’s 1,460 TB a year (1.42 petabytes).

[Chart: input data and output data in bytes (0 to 500,000,000,000) for each hour, 1 to 24.]

© comScore, Inc. Proprietary.
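The annual figure on this slide follows from simple arithmetic, and working it through shows which unit convention the 1.42 PB number uses. The sketch assumes exactly 4 TB per day and a 365-day year; since the slide says "over 4 TB", the real total is at least this.

```python
# Check the slide's annual-savings arithmetic.
# Assumptions: exactly 4 TB saved per day, 365 days per year.
TB_PER_DAY = 4
DAYS_PER_YEAR = 365

tb_per_year = TB_PER_DAY * DAYS_PER_YEAR  # 1460 TB, matching the slide
pb_decimal = tb_per_year / 1000           # decimal units: 1 PB = 1000 TB
pb_binary = tb_per_year / 1024            # binary units: 1 PiB = 1024 TiB

print(tb_per_year, pb_decimal, round(pb_binary, 4))  # prints: 1460 1.46 1.4258
```

In decimal units the total is 1.46 PB. The slide's 1.42 PB matches dividing by 1,024 (about 1.426) and truncating, so it appears to use binary prefixes.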

comScore’s Daily Trend of Event Volume

[Chart: Beacon Records and Panel Records per day; axes: # of census records (0 to 60,000,000,000) and # of panel records (0 to 6,000,000,000).]

Please attend Mike Brown’s session, "Analyzing 1.4 Trillion Events with Hadoop", tomorrow.

(No elephants were harmed during the creation of this talk, but some are now a lot faster & meaner.)
Please visit our booth to register for a free evaluation.


Speaker notes

Organizations typically struggle with data processing at all stages of the Big Data Continuum
