© 2013 Mellanox Technologies 1
Big Data Benchmarking with RDMA solutions
Oracle Open World 2013

Leading Supplier of End-to-End Interconnect Solutions
Comprehensive End-to-End InfiniBand and Ethernet Portfolio: ICs, Adapter Cards, Switches/Gateways, Host/Fabric Software, Cables
[Figure: Virtual Protocol Interconnect (VPI) linking Server/Compute, Switch/Gateway and Storage Front/Back-End over 56G InfiniBand & FCoIB, 10/40/56GbE & FCoE, and Fibre Channel]

Hadoop Framework
- A scalable, fault-tolerant distributed system for data storage and processing
- Hadoop has two main systems:
  • Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage
  • MapReduce: distributed, fault-tolerant resource management and scheduling, coupled with a scalable data-programming abstraction
- Key values:
  • Flexibility – store any data, run any analysis
  • Scalability – start at 1TB/3 nodes, grow to petabytes/1000s of nodes
  • Economics – cost per TB at a fraction of traditional options
[Figure: Hadoop stack – Hive and Pig over MapReduce; MapReduce and HBase over HDFS™ (Hadoop Distributed File System); HDFS over direct-attached disks]
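
The MapReduce programming abstraction described above can be illustrated with a toy, single-process word count. This is a minimal sketch in plain Python, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and the in-memory grouping stands in for the distributed shuffle that Hadoop performs across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated token.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases (here, in memory).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # Reducers see keys in sorted order, mirroring Hadoop's sorted shuffle.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big results", "data moves"]))
```

In a real cluster the map and reduce calls run on many nodes and the shuffle moves data over the network, which is the step the RDMA-based acceleration later in this deck targets.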

Three Areas for Acceleration
- Data Analytics
  • Explore inefficiencies in existing analytics frameworks and systems
  • Accelerate data processing to deliver faster results
- Storage
  • Explore ways to refine the dominant file system
  • Take advantage of direct-attached disks to accelerate data access
- Distributed Storage
  • Leverage popular distributed storage systems with Big Data applications
  • Adapt existing storage systems for use with Big Data frameworks

I/O Offload Frees Up CPU for Application Processing
[Chart: CPU time split between user space and system space]
- Without RDMA: ~53% CPU utilization, ~47% CPU overhead/idle
- With RDMA and offload: ~88% CPU utilization, ~12% CPU overhead/idle
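
The chart's figures can be turned into the CPU headroom the slide refers to. A small sketch, assuming the ~53%/~47% split is the case without RDMA and ~88%/~12% is the case with RDMA and offload; the function names are illustrative only.

```python
from typing import Tuple


def cpu_gain(util_without: float, util_with: float) -> Tuple[float, float]:
    """Return (percentage-point gain, ratio) of CPU available to the application."""
    points = util_with - util_without
    ratio = util_with / util_without
    return points, ratio


def overhead_reduction(overhead_without: float, overhead_with: float) -> float:
    """Factor by which overhead/idle CPU time shrinks."""
    return overhead_without / overhead_with


if __name__ == "__main__":
    points, ratio = cpu_gain(53.0, 88.0)
    print(f"+{points:.0f} points of CPU for the application ({ratio:.2f}x)")
    print(f"overhead/idle shrinks {overhead_reduction(47.0, 12.0):.1f}x")
```

With the slide's approximate numbers this works out to about 35 percentage points more CPU for the application (roughly 1.66x) and about 3.9x less overhead/idle time; the inputs are rounded, so treat the outputs as rough.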

Unstructured Data Accelerator (UDA)
- Plug-in architecture
  • Open source; latest GA version 3.1 (6/10/2013)
  • Google Code repository: https://code.google.com/p/uda-plugin/
- Accelerates MapReduce jobs
  • Accelerated merge sort
- Efficient shuffle provider
  • Data transfer over RDMA
  • Supports InfiniBand and Ethernet
- Supported Hadoop distributions
  • Apache 3.0 – in the main trunk!
  • Apache 2.0.3 – in the main trunk!
  • Apache Hadoop 1.0.x; 1.1.x
  • Cloudera CDH 3 & 4
  • Hortonworks HDP 1.x
  • GPHD 1.2
- Supported hardware
  • ConnectX®-3 VPI
  • SwitchX-2 based systems
[Figure: Hadoop stack – Hive and Pig over MapReduce; MapReduce and HBase over HDFS™ (Hadoop Distributed File System); HDFS over direct-attached disks]
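
The reduce-side merge that UDA accelerates is, at its core, a k-way merge of map-output segments that are each already sorted by key. The sketch below shows that generic technique with Python's `heapq.merge`; it is not UDA's implementation and says nothing about how UDA moves data over RDMA. The names `merge_segments` and `reduce_input` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int]


def merge_segments(segments: List[Iterable[Record]]) -> Iterator[Record]:
    """Lazily k-way merge map-output segments that are each sorted by key."""
    # heapq.merge keeps only the current head of each segment in a heap,
    # so memory grows with the number of segments, not their total size.
    return heapq.merge(*segments, key=itemgetter(0))


def reduce_input(segments: List[Iterable[Record]]) -> Iterator[Tuple[str, List[int]]]:
    """Group the merged stream into (key, [values]) pairs for a reducer."""
    for key, group in groupby(merge_segments(segments), key=itemgetter(0)):
        yield key, [value for _, value in group]


if __name__ == "__main__":
    segments = [
        [("apple", 1), ("cherry", 1)],   # map task 0 output (sorted)
        [("banana", 2), ("cherry", 3)],  # map task 1 output (sorted)
        [("apple", 5)],                  # map task 2 output (sorted)
    ]
    for key, values in reduce_input(segments):
        print(key, sum(values))
```

Because the merge is streaming, a reducer can start consuming keys before every segment has been read in full, which is why the shuffle transport and the merge are natural targets for acceleration.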

Double MapReduce Performance with UDA
[Chart: TeraSort* results, Vanilla vs. UDA**; callouts: ~50% disk access, 2.5X CPU efficiency, 54%]
2X Faster Job Completion! Increase the Value of Data!
*TeraSort is a popular benchmark used to measure the performance of a Hadoop cluster.
**1TB data set, 20x dual X5670 (Westmere) machines, 10x HDD. Base: Vanilla GPHD 1.2; UDA: GPHD 1.2 + UDA
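
A TeraSort run is normally three steps from the Hadoop examples jar: TeraGen writes the input, TeraSort sorts it, and TeraValidate checks the output. TeraGen rows are 100 bytes each, so a 1TB data set (taken here as 10^12 bytes) is 10^10 rows. The sketch below only builds the command lines; the jar file name and HDFS paths are placeholders (the real jar name depends on the distribution and version), and `teragen_rows`/`terasort_commands` are hypothetical helpers.

```python
from typing import List

TERAGEN_ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(dataset_bytes: int) -> int:
    """Number of TeraGen rows needed for a data set of the given size."""
    if dataset_bytes % TERAGEN_ROW_BYTES:
        raise ValueError("data set size must be a multiple of 100 bytes")
    return dataset_bytes // TERAGEN_ROW_BYTES


def terasort_commands(examples_jar: str, dataset_bytes: int,
                      base: str = "/benchmarks/terasort") -> List[List[str]]:
    """Build argv lists for TeraGen -> TeraSort -> TeraValidate (paths are placeholders)."""
    rows = teragen_rows(dataset_bytes)
    return [
        ["hadoop", "jar", examples_jar, "teragen", str(rows), f"{base}/input"],
        ["hadoop", "jar", examples_jar, "terasort", f"{base}/input", f"{base}/output"],
        ["hadoop", "jar", examples_jar, "teravalidate", f"{base}/output", f"{base}/validate"],
    ]


if __name__ == "__main__":
    # "hadoop-examples.jar" is a placeholder; use the examples jar shipped with your distribution.
    for argv in terasort_commands("hadoop-examples.jar", 10**12):
        print(" ".join(argv))
```

Comparing base and UDA runs means timing the TeraSort step on the same generated input with and without the plug-in enabled.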

HDFS Acceleration: Joint Project with Ohio State University
- HDFS is the Hadoop file system
  • The underlying file system for HBase and other NoSQL databases
- More drives demand higher throughput
- SSD solutions need higher-throughput networks
  • Currently bounded by 1GbE and 10GbE
[Figure: Hadoop stack – Hive and Pig over MapReduce; MapReduce and HBase over HDFS™ (Hadoop Distributed File System); HDFS over direct-attached disks]

Unlocking the Power of SSDs in Hadoop Environments
- SSDs are becoming the de facto standard in HDFS deployments
  • Read capability is a critical factor for application performance
- E-DFSIO, part of Intel's HiBench test suite, profiles aggregate throughput on the cluster
  • A 1GbE network prevents any performance benefit from SSD deployment
[Chart: E-DFSIO, Showing the Power of SSD @ HDFS]
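
The network bound on SSD-based HDFS can be checked with simple arithmetic: when reads are served to remote nodes, a node can deliver data no faster than the slower of its drives and its network link. A rough sketch, assuming raw line rates (1GbE is about 125 MB/s, 10GbE about 1,250 MB/s, ignoring encoding and protocol overhead) and a hypothetical per-drive read rate; `Node` and `link_mb_per_s` are illustrative names, and real throughput will be lower.

```python
from dataclasses import dataclass


def link_mb_per_s(gbps: float) -> float:
    """Raw line rate in MB/s (10^6 bytes); ignores encoding and protocol overhead."""
    return gbps * 1000 / 8


@dataclass
class Node:
    drives: int
    mb_per_s_per_drive: float  # hypothetical sequential read rate per drive
    link_gbps: float

    def storage_mb_per_s(self) -> float:
        return self.drives * self.mb_per_s_per_drive

    def deliverable_mb_per_s(self) -> float:
        # Remote reads are capped by whichever is slower: drives or network.
        return min(self.storage_mb_per_s(), link_mb_per_s(self.link_gbps))

    def bottleneck(self) -> str:
        if link_mb_per_s(self.link_gbps) < self.storage_mb_per_s():
            return "network"
        return "storage"


if __name__ == "__main__":
    # 2 SSDs per node as in the test configuration; 1000 MB/s per SSD is an assumption.
    for gbps in (1, 10, 56):
        node = Node(drives=2, mb_per_s_per_drive=1000.0, link_gbps=gbps)
        print(gbps, "Gb/s:", node.deliverable_mb_per_s(), "MB/s, bound by", node.bottleneck())
```

Under these assumptions two SSDs could supply about 2,000 MB/s, so a 1GbE link (125 MB/s) and even 10GbE (1,250 MB/s) are the bottleneck, while a 56Gb/s link leaves the drives as the limit.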

OrangeFS as a Hadoop Storage Solution

Cloudera Certified – CDH3 and CDH4
- Mellanox VPI card
  • MCX354A-FCBT
- Mellanox edge switches
  • MSX10xx; MSX60xx

Simple Building Block for Big Data Solutions
- E5-26x0 (Sandy Bridge) machines
  • Dual socket
  • 4+ cores per socket
  • 32GB+ of DRAM
- Disk drives
  • At least 5 x 1TB, SAS, 10K RPM
- Hadoop configuration
  • At least one NameNode + JobTracker
  • At least 4 DataNodes
- Installation
  • Your choice of Hadoop distribution or other Big Data solution (such as Cassandra)
- Networking
  • ConnectX-3 VPI card: FDR, 40GbE and 10GbE
  • SwitchX based systems: MSX6036F, MSX1036B and MSX1016
  • Mellanox FDR, 40GbE and 10GbE cable solutions
- http://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Hadoop.pdf
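
The minimum building block above can be expressed as a checklist. A minimal sketch; the `NodeSpec`/`ClusterSpec` types and `check_building_block` are hypothetical, assume homogeneous data nodes, and cover only the server, disk and node-count minimums listed on the slide (networking is not checked).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSpec:
    sockets: int
    cores_per_socket: int
    dram_gb: int
    drives: int
    drive_tb: float
    drive_rpm: int
    drive_interface: str


@dataclass
class ClusterSpec:
    node: NodeSpec  # hardware of each data node (assumed homogeneous)
    name_nodes: int
    job_trackers: int
    data_nodes: int


def check_building_block(spec: ClusterSpec) -> List[str]:
    """Return violations of the slide's minimums; an empty list means compliant."""
    n = spec.node
    problems: List[str] = []
    if n.sockets < 2:
        problems.append("servers should be dual-socket")
    if n.cores_per_socket < 4:
        problems.append("need 4+ cores per socket")
    if n.dram_gb < 32:
        problems.append("need 32GB+ of DRAM")
    if n.drives < 5 or n.drive_tb < 1.0:
        problems.append("need at least 5 x 1TB drives")
    if n.drive_interface.upper() != "SAS" or n.drive_rpm < 10_000:
        problems.append("drives should be SAS, 10K RPM")
    if spec.name_nodes < 1 or spec.job_trackers < 1:
        problems.append("need at least one NameNode + JobTracker")
    if spec.data_nodes < 4:
        problems.append("need at least 4 DataNodes")
    return problems


if __name__ == "__main__":
    # Modeled loosely on the test configuration in the editor's notes
    # (E5-2670, 64GB, 5x 1TB 10K); dual socket and SAS are assumptions.
    node = NodeSpec(sockets=2, cores_per_socket=8, dram_gb=64, drives=5,
                    drive_tb=1.0, drive_rpm=10_000, drive_interface="SAS")
    print(check_building_block(ClusterSpec(node, name_nodes=1, job_trackers=1, data_nodes=4)))  # prints []
```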

Test Drive Your Big Data
- EMC 1000-node Analytic Platform
- Accelerates the industry's Hadoop development
- 24 petabytes of physical storage
  • Half of every written word since the inception of mankind
- Mellanox VPI solutions
Hadoop Acceleration: 2X Faster Hadoop Job Run-Time
High Throughput, Low Latency, RDMA Critical for ROI

Thank You


Editor's Notes

  1. HDFS is the underlying file system for Hadoop. We have an ongoing project with OSU – stay tuned for the availability schedule.
  2. Test configuration: 5 nodes, Apache Hadoop 1.1.2, E5-2670, 64GB DRAM. 1 NameNode, 4 DataNodes. HDDs: 5x 1TB, 10K RPM per node. SSDs: 2x 960MB, PCIe Gen II x4. HiBench 2.2 test suite.
  3. Hadoop filesystem-agnostic API
  4. A recipe for building a big data solution is available on the Mellanox web site. Everything is there: components, scripts, tradeoffs. Use it; it works.
  5. Ask your customers to log in to the AWB. It is there for them to use and try; it is deployed over a Mellanox end-to-end FDR network utilizing UDA and UFM.