SlideShare una empresa de Scribd logo
1 de 12
Descargar para leer sin conexión
If the Data Cannot Come to
the Algorithm...
many cores with java
session four
data locality
copyright 2013 Robert Burrell Donkin robertburrelldonkin.name
this work is licensed under a Creative Commons Attribution 3.0 Unported License
Pre-emptive multi-tasking operating
systems use involuntary context switching
to provide the illusion of parallel processes
even when the hardware supports only a
single thread of execution.
Take Away from Session One
Even on a single core,
there's no escaping parallelism.
Take Away from Session Two
Take Away from Session Three
Code executing on different cores uses copies held
in registers and caches, so memory shared is likely
to be incoherent unless the program plays by the
rules of the software platform.
Gustafson's Law
S(p) = p - a (p-1)
● S(p) is the speedup for pprocessors
● a is the non-parallelizable fraction
"in practice, the problem size scales with the number of
processors" John L. Gustafson
● Think about Gustafson's Law...
● The quantity of data processed...
● ...scales linearly as processors added.
● Throwing processors at the problem
works...
● ...at least sometimes.
Scales and Scaling
Divide and Conquer
● Back to the future
● Partition the data...
○ ...apply the same algorithm to each part and then
○ ...collate the answers.
● Natural to parallelise
● No contended shared memory
Data Locality
● When the algorithm is small
○ it's more efficient
■ to bring the algorithm to the data
■ than the data to the algorithm
● Whether the data is in
○ caches on cores in a many core computer, or in
○ disc storage in a distributed data store
Map and Reduce
● Partition the data
● The map algorithm
○ works in parallel
○ on local data
○ independently
● The reduce algorithm
○ collates output from map algorithms
● More complex systems built from these blocks
Map-Reduce
As a Query Language
● NoSQL
● A popular alternative to SQL
○ for distributed data stores
● Why...?
○ Easy to
■ read and write
■ parallelize
○ Rich and full programming model
Map-Reduce
Crunching Big Data
● Commodity hardware
● Scales up to Terabyte and Petabyte
○ smoothly by adding new nodes
● Map-Reduce platforms typically provide
○ fault tolerance eg. retry
○ orchestration
○ redundant data storage
● Statistical resilience
Take Away
When you want to be able to process big data
tomorrow by adding cores or computers, adopt
an appropriate architecture today.

Más contenido relacionado

La actualidad más candente

Large scale graph processing
Large scale graph processingLarge scale graph processing
Large scale graph processing
Harisankar H
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph Processing
Riyad Parvez
 
Hadoop and cassandra
Hadoop and cassandraHadoop and cassandra
Hadoop and cassandra
Christina Yu
 

La actualidad más candente (20)

Introduction to Hadoop : A bird eye's view | Abhishek Mukherjee
Introduction to Hadoop : A bird eye's view | Abhishek MukherjeeIntroduction to Hadoop : A bird eye's view | Abhishek Mukherjee
Introduction to Hadoop : A bird eye's view | Abhishek Mukherjee
 
EC2, MapReduce, and Distributed Processing
EC2, MapReduce, and Distributed ProcessingEC2, MapReduce, and Distributed Processing
EC2, MapReduce, and Distributed Processing
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokes
 
Dataframes Showdown (miniConf 2022)
Dataframes Showdown (miniConf 2022)Dataframes Showdown (miniConf 2022)
Dataframes Showdown (miniConf 2022)
 
Pregel
PregelPregel
Pregel
 
Caffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noelCaffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noel
 
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduceBlock Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
 
MapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open IssuesMapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open Issues
 
Tech Talk - Underutilized Resources in Distributed System
Tech Talk - Underutilized Resources in Distributed SystemTech Talk - Underutilized Resources in Distributed System
Tech Talk - Underutilized Resources in Distributed System
 
Large scale graph processing
Large scale graph processingLarge scale graph processing
Large scale graph processing
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph Processing
 
Hadoop and cassandra
Hadoop and cassandraHadoop and cassandra
Hadoop and cassandra
 
Scheduling for Parallel and Multi-Core Systems
Scheduling for Parallel and Multi-Core SystemsScheduling for Parallel and Multi-Core Systems
Scheduling for Parallel and Multi-Core Systems
 
m2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reusem2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reuse
 
Multi core processing of xml twig patterns
Multi core processing of xml twig patternsMulti core processing of xml twig patterns
Multi core processing of xml twig patterns
 
Tensorflow Lite and ARM Compute Library
Tensorflow Lite and ARM Compute LibraryTensorflow Lite and ARM Compute Library
Tensorflow Lite and ARM Compute Library
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
Serving deep learning models in a serverless platform (IC2E 2018)
Serving deep learning models in a serverless platform (IC2E 2018)Serving deep learning models in a serverless platform (IC2E 2018)
Serving deep learning models in a serverless platform (IC2E 2018)
 

Destacado

Fifty Year Of Microprocessor
Fifty Year Of MicroprocessorFifty Year Of Microprocessor
Fifty Year Of Microprocessor
Ali Usman
 
Xilinx fpga cores
Xilinx fpga coresXilinx fpga cores
Xilinx fpga cores
sanaz nouri
 

Destacado (11)

Fifty Year Of Microprocessor
Fifty Year Of MicroprocessorFifty Year Of Microprocessor
Fifty Year Of Microprocessor
 
ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale
 
Apostila lpt
Apostila lptApostila lpt
Apostila lpt
 
processors
processorsprocessors
processors
 
The Evolution Of Computer
The Evolution Of ComputerThe Evolution Of Computer
The Evolution Of Computer
 
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...
 
Genesis & Progression of Processors in CPU
Genesis & Progression of Processors in CPUGenesis & Progression of Processors in CPU
Genesis & Progression of Processors in CPU
 
Multicore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiMulticore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash Prajapati
 
Xilinx fpga cores
Xilinx fpga coresXilinx fpga cores
Xilinx fpga cores
 
Introduction to microprocessor
Introduction to microprocessorIntroduction to microprocessor
Introduction to microprocessor
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
 

Similar a If the data cannot come to the algorithm...

Introduction to Memoria
Introduction to MemoriaIntroduction to Memoria
Introduction to Memoria
Victor Smirnov
 
Threads - Why Can't You Just Play Nicely With Your Memory_
Threads - Why Can't You Just Play Nicely With Your Memory_Threads - Why Can't You Just Play Nicely With Your Memory_
Threads - Why Can't You Just Play Nicely With Your Memory_
Robert Burrell Donkin
 

Similar a If the data cannot come to the algorithm... (20)

Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache Cassandra
 
Introduction to Memoria
Introduction to MemoriaIntroduction to Memoria
Introduction to Memoria
 
Software Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale AutomationSoftware Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale Automation
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
Scalable data systems at Traveloka
Scalable data systems at TravelokaScalable data systems at Traveloka
Scalable data systems at Traveloka
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
Caching in
Caching inCaching in
Caching in
 
Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018
 
Threads - Why Can't You Just Play Nicely With Your Memory_
Threads - Why Can't You Just Play Nicely With Your Memory_Threads - Why Can't You Just Play Nicely With Your Memory_
Threads - Why Can't You Just Play Nicely With Your Memory_
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Threads - Why Can't You Just Play Nicely With Your Memory?
Threads - Why Can't You Just Play Nicely With Your Memory?Threads - Why Can't You Just Play Nicely With Your Memory?
Threads - Why Can't You Just Play Nicely With Your Memory?
 

Más de Robert Burrell Donkin

If the Data Cannot Come To The Algorithm...
If the Data Cannot Come To The Algorithm...If the Data Cannot Come To The Algorithm...
If the Data Cannot Come To The Algorithm...
Robert Burrell Donkin
 

Más de Robert Burrell Donkin (11)

Threads and Threads
Threads and ThreadsThreads and Threads
Threads and Threads
 
If the Data Cannot Come To The Algorithm...
If the Data Cannot Come To The Algorithm...If the Data Cannot Come To The Algorithm...
If the Data Cannot Come To The Algorithm...
 
An End to Order
An End to OrderAn End to Order
An End to Order
 
An End to Order (many cores with java, session two)
An End to Order (many cores with java, session two)An End to Order (many cores with java, session two)
An End to Order (many cores with java, session two)
 
Many Cores Java - Session One: Threads and Threads
Many Cores Java - Session One: Threads and ThreadsMany Cores Java - Session One: Threads and Threads
Many Cores Java - Session One: Threads and Threads
 
Apache Maven In 10 Slides
Apache Maven In 10 SlidesApache Maven In 10 Slides
Apache Maven In 10 Slides
 
XP In 10 slides
XP In 10 slidesXP In 10 slides
XP In 10 slides
 
Public Sector: Agile and Open Source
Public Sector: Agile and Open SourcePublic Sector: Agile and Open Source
Public Sector: Agile and Open Source
 
An Agile Pick-N-Mix
An Agile Pick-N-MixAn Agile Pick-N-Mix
An Agile Pick-N-Mix
 
The Pomodoro Technique: Introduced Unofficially In 10 Slides
The Pomodoro Technique: Introduced Unofficially In 10 SlidesThe Pomodoro Technique: Introduced Unofficially In 10 Slides
The Pomodoro Technique: Introduced Unofficially In 10 Slides
 
Retrospectives In 10 Slides (With Notes)
Retrospectives In 10 Slides  (With Notes)Retrospectives In 10 Slides  (With Notes)
Retrospectives In 10 Slides (With Notes)
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

If the data cannot come to the algorithm...

  • 1. If the Data Cannot Come to the Algorithm... many cores with java session four data locality copyright 2013 Robert Burrell Donkin robertburrelldonkin.name this work is licensed under a Creative Commons Attribution 3.0 Unported License
  • 2. Pre-emptive multi-tasking operating systems use involuntary context switching to provide the illusion of parallel processes even when the hardware supports only a single thread of execution. Take Away from Session One
  • 3. Even on a single core, there's no escaping parallelism. Take Away from Session Two
  • 4. Take Away from Session Three Code executing on different cores uses copies held in registers and caches, so memory shared is likely to be incoherent unless the program plays by the rules of the software platform.
  • 5. Gustafson's Law S(p) = p - a (p-1) ● S(p) is the speedup for pprocessors ● a is the non-parallelizable fraction "in practice, the problem size scales with the number of processors" John L. Gustafson
  • 6. ● Think about Gustafson's Law... ● The quantity of data processed... ● ...scales linearly as processors added. ● Throwing processors at the problem works... ● ...at least sometimes. Scales and Scaling
  • 7. Divide and Conquer ● Back to the future ● Partition the data... ○ ...apply the same algorithm to each part and then ○ ...collate the answers. ● Natural to parallelise ● No contended shared memory
  • 8. Data Locality ● When the algorithm is small ○ it's more efficient ■ to bring the algorithm to the data ■ than the data to the algorithm ● Whether the data is in ○ caches on cores in a many core computer, or in ○ disc storage in a distributed data store
  • 9. Map and Reduce ● Partition the data ● The map algorithm ○ works in parallel ○ on local data ○ independently ● The reduce algorithm ○ collates output from map algorithms ● More complex systems built from these blocks
  • 10. Map-Reduce As a Query Language ● NoSQL ● A popular alternative to SQL ○ for distributed data stores ● Why...? ○ Easy to ■ read and write ■ parallelize ○ Rich and full programming model
  • 11. Map-Reduce Crunching Big Data ● Commodity hardware ● Scales up to Terabyte and Petabyte ○ smoothly by adding new nodes ● Map-Reduce platforms typically provide ○ fault tolerance eg. retry ○ orchestration ○ redundant data storage ● Statistical resilience
  • 12. Take Away When you want to be able to process big data tomorrow by adding cores or computers, adopt an appropriate architecture today.