MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Marcelo Veiga Neves¹, Cesar A. F. De Rose¹, Kostas Katrinis²
marcelo.neves@pucrs.br

¹ PUCRS, Porto Alegre, Brazil
² IBM Research, Dublin, Ireland

Oct, 2015
  
Context

•  Big Data & MapReduce analytics frameworks
   – Scale out to hundreds or even thousands of commodity servers
   – Increased network traffic volumes and multiplicity of traffic patterns
•  Data center networks for Big Data
   – Scale-out topologies (e.g., fat-tree, leaf-spine)
   – Network control software (e.g., SDN – IPDPS’15)
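
To make the "scale-out topology" bullet concrete, here is a minimal sketch of how the links of a two-tier leaf-spine fabric can be enumerated. It is plain Python and does not use the Mininet API; the function name, the node naming scheme (`spine*`, `leaf*`, `h*`) and the sizes are assumptions chosen for illustration, not taken from the slides.

```python
from itertools import product


def leaf_spine_links(num_spines, num_leaves, hosts_per_leaf):
    """Return the link list of a two-tier leaf-spine fabric.

    Node names (spine*, leaf*, h*) are hypothetical, made up for this sketch.
    """
    spines = [f"spine{i}" for i in range(num_spines)]
    leaves = [f"leaf{i}" for i in range(num_leaves)]

    # Every leaf switch connects to every spine switch (full bipartite mesh).
    links = [(leaf, spine) for leaf, spine in product(leaves, spines)]

    # Each host hangs off exactly one leaf switch.
    for leaf_idx, leaf in enumerate(leaves):
        for port in range(hosts_per_leaf):
            host = f"h{leaf_idx * hosts_per_leaf + port}"
            links.append((host, leaf))
    return links


links = leaf_spine_links(num_spines=2, num_leaves=4, hosts_per_leaf=3)
print(len(links))  # 4*2 fabric links + 4*3 host links = 20
```

A link list like this could then be handed to whatever builds the emulated network; adding a spine raises the number of equal-cost paths between leaves without touching the hosts, which is the sense in which the topology scales out.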
  	
  
Problem

•  Requiring a real hardware infrastructure
   – is often not a viable option
   – is limiting even when datacenter resources are available
      •  access is not continuous
      •  it is not practical to reconfigure them in order to evaluate different network topologies and characteristics (e.g., bandwidth and latency)
•  Alternatives:
   – Simulation & Emulation
  
Problem

•  Most research on data center networks does not use realistic Big Data traffic
   –  synthetic traffic patterns
•  Simplified shuffle-like traffic patterns
   –  e.g., all-to-all
   –  do not consider transfer scheduling decisions, number of parallel transfers, etc.
   –  ignore the overlap of communication with computation
•  How do the reported results translate to performance improvement for actual analytics runtimes?
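
The gap between a synthetic all-to-all shuffle and a scheduled one can be sketched in a few lines. This is an illustrative simplification, not MRemu code: the function names are hypothetical, the cap of 4 parallel copies is an arbitrary example value, and the sketch assumes one wave of fetches finishes before the next starts, which a real framework need not do.

```python
def all_to_all_flows(mappers, reducers):
    """Synthetic shuffle: every mapper sends to every reducer at once."""
    return [(m, r) for r in reducers for m in mappers]


def fetch_waves(mappers, reducer, parallel_copies):
    """Group one reducer's fetches into waves of at most `parallel_copies`.

    Assumes (for illustration only) that map outputs are fetched in list
    order and that a wave finishes before the next one starts.
    """
    waves = []
    for start in range(0, len(mappers), parallel_copies):
        batch = mappers[start:start + parallel_copies]
        waves.append([(m, reducer) for m in batch])
    return waves


mappers = [f"map{i}" for i in range(6)]
reducers = ["red0", "red1"]

flows = all_to_all_flows(mappers, reducers)
waves = fetch_waves(mappers, "red0", parallel_copies=4)
print(len(flows))               # 12 simultaneous flows in the synthetic model
print([len(w) for w in waves])  # [4, 2]
```

Both models move the same data, but the number of flows competing for a link at any instant differs, which is why results obtained with the first model may not carry over to real job runtimes.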
  
Network Traffic in Real Hadoop Applications

[Figure: traffic of real Hadoop applications, with the network transfers annotated]
  
Proposed solution: MRemu

•  Emulation-based framework for data center network experimentation
•  Highlights:
   –  Ability to run a complete data center in a single server
   –  Use of realistic network traffic
      •  replay or extrapolate from executions of real applications in production datacenters
   –  Mimics framework internals (e.g., transfer scheduling, phase overlaps)
   –  Unmodified code also runs on real hardware
  
MRemu Architecture

[Figure: MRemu architecture diagram. Component labels: Job Trace, Hadoop Job Tracing, Synthetic Job Generator, Topology Description, Mininet-HiFi*, Topology Builder, Application Launcher, Network Monitor, Data center emulator, TaskTracker, Job Trace Parser, Traffic Generator, Hadoop MapReduce emulator, Logger, JobTracker]

* Mininet can be replaced with real hardware (e.g., run in legacy clusters)
  
Evaluation

•  Mininet-HiFi has already been validated and is widely used to reproduce networking research experiments
•  Accuracy when reproducing MapReduce workloads
   –  Comparison with traces extracted from real job executions
   –  Two operation modes: replay mode and hadoop mode
•  Execution environment:
   –  Shamrock datacenter, IBM Research
   –  HiBench Benchmark Suite: Sort, Nutch, PageRank and Bayes

Handigol, N.; Heller, B.; Jeyakumar, V.; Lantz, B.; McKeown, N. “Reproducible Network Experiments
Accuracy Evaluation

[Figures: Job Completion Time Accuracy; Individual Flow Completion Time Accuracy]
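
The slides do not state how accuracy is computed for job and flow completion times. A minimal sketch, assuming accuracy is judged by the relative error of the emulated time against the time observed in the real execution, could look as follows. The function names are hypothetical and the numbers are placeholders, not results from this work.

```python
def relative_error(measured, reference):
    """Relative deviation of an emulated time from the real one."""
    if reference <= 0:
        raise ValueError("reference time must be positive")
    return abs(measured - reference) / reference


def mean_relative_error(pairs):
    """Average relative error over (emulated, real) time pairs."""
    if not pairs:
        raise ValueError("no samples to compare")
    return sum(relative_error(e, r) for e, r in pairs) / len(pairs)


# Placeholder numbers, NOT results from the slides or the paper.
job_times = [(105.0, 100.0)]                       # (emulated, real) seconds
flow_times = [(2.2, 2.0), (0.9, 1.0), (4.0, 4.0)]  # (emulated, real) seconds

print(f"job error:  {relative_error(*job_times[0]):.1%}")
print(f"flow error: {mean_relative_error(flow_times):.1%}")
```

Reporting the per-flow figure alongside the per-job figure matters because errors on individual flows can cancel out in the job total.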
  
Other experiments

[Figures: Nutch application with background traffic; Sort application with background traffic; Partition skew problem; Impact of the network topology]
  
Conclusion and Future Work

•  MRemu is an emulation-based framework that enables conducting datacenter network research
   –  without requiring expensive and continuous access to large-scale datacenter hardware resources
•  Available as open source:
   –  https://github.com/mvneves/mremu
•  Future work:
   –  Extend it to other frameworks and traffic patterns
   –  Integrate it with Mininet-HiFi cluster edition
   –  Support migration of “virtual machines”
  
References

•  NEVES, M. V. Application-aware Networking to Accelerate MapReduce Applications (Ph.D. Dissertation), 2015.
•  NEVES, M. V.; KATRINIS, M. K.; FRANKE, H.; DE ROSE, C. A. F. Pythia: Faster Big Data in Motion through Predictive Software-Defined Network Optimization at Runtime. In: IPDPS 2014, Phoenix, USA, 2014.
•  NEVES, M. V.; DE ROSE, C. A. F.; KATRINIS, K. MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic. In: MASCOTS 2015, Atlanta, USA, 2015.
  
  
