Casual mass parallel data processing in Java
Alexey Ragozin
Mar 2014
Building a new bicycle …
Build vs. Buy
Build
• No dedicated team to support infrastructure
• Very specific tasks
• Exclusive use of infrastructure
• Reasonable scale
Buy
• Product can be bought as a service (internal or external)
• Large scale
• Multi-tenancy
• You are going to use advanced features (e.g. map/reduce)
“Casual” computing
• Small computation farms (< 100 servers)
• Team owns both application and grid
• Java platform
• Reasonably short batches (< 24 hours)
• Reasonably small data sets (< 10 TiB)
Simple master-slave topology
[Diagram: a master process holds a task queue and a scheduler; it advertises tasks to several slaves, which send reports back]
Simple master-slave topology
Control plane
  - RMI
Queue / scheduler
  - Simple in-memory queue
  - May be more complex than just a task queue
Data plane
…
Data plane
Never, ever try to send data over RMI!
File system
  - Avoid network mounts!
In-memory key-value store
  - Client-side sharding works best
Disk database (RDBMS or NoSQL)
  - Consider prefetching data
Direct socket streaming
…
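Client-side sharding means the client, not a server, decides which key-value node owns a key, so no extra routing hop is needed. A minimal sketch, assuming a fixed list of nodes known to every client; the class `ShardRouter` and node names such as `kv1` are hypothetical, and a real client would also keep a connection per node:

```java
import java.util.List;

// Hypothetical helper: picks the key-value node that owns a key.
final class ShardRouter {
    private final List<String> nodes; // hypothetical node addresses, e.g. "kv1"

    ShardRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node required");
        }
        this.nodes = List.copyOf(nodes);
    }

    // The same key always maps to the same node while the node list is unchanged.
    String nodeFor(String key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("kv1", "kv2", "kv3"));
        for (String key : List.of("trade-1", "trade-2", "trade-3")) {
            System.out.println(key + " -> " + router.nodeFor(key));
        }
    }
}
```

Plain modulo hashing remaps most keys when a node is added or removed; consistent hashing is a common alternative if the node set can change during a batch.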
Distributed objects revisited
Pitfalls of CORBA/RMI
• IDL – functional contract
• IDL – protocol
Separating concerns
• Functional contract – wrapper object
• Protocol – hidden remote interface
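This separation can be sketched with plain `java.rmi` types: callers program against an ordinary interface, while the `Remote` interface stays an implementation detail of a serializable wrapper. A sketch under stated assumptions; `Counter`, `CounterRemote`, `CounterClient` and `WrapperDemo` are hypothetical names, and a local object stands in for a real RMI stub:

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Functional contract: what callers see. No RMI types leak out.
interface Counter {
    long incrementAndGet();
}

// Protocol: the hidden remote interface.
interface CounterRemote extends Remote {
    long incrementAndGet() throws RemoteException;
}

// Wrapper object: serializable, so it can travel to other JVMs,
// and it turns remote failures into unchecked exceptions.
final class CounterClient implements Counter, Serializable {
    private static final long serialVersionUID = 1L;
    private final CounterRemote remote; // an RMI stub in a real deployment

    CounterClient(CounterRemote remote) {
        this.remote = remote;
    }

    @Override
    public long incrementAndGet() {
        try {
            return remote.incrementAndGet();
        } catch (RemoteException e) {
            throw new IllegalStateException("Remote counter unavailable", e);
        }
    }
}

final class WrapperDemo {
    public static void main(String[] args) {
        // Local stand-in for the remote side; in practice it would be exported via RMI.
        CounterRemote local = new CounterRemote() {
            private long value;

            @Override
            public long incrementAndGet() {
                return ++value;
            }
        };
        Counter counter = new CounterClient(local);
        counter.incrementAndGet();
        System.out.println(counter.incrementAndGet()); // prints 2
    }
}
```

Because the wrapper is an ordinary object, it can also hold caching or batching logic without changing the functional contract.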
Distributed objects revisited
Renewed distributed objects paradigm
Strengths
• Polymorphism
• Encapsulation
  - Network protocol, caching aspects, etc.
Weaknesses
• Homogeneous code base required
• Synchronous network communication
Deployment problem
Brute force
  - Build / package
  - Deploy / SCP
  - Restart slaves
  - Start batch
  - Change code, repeat
Computation grid software
  - Compile and run batch
Behind the scenes
  - Your classes are collected
  - Associated with the batch
  - Deployed on participating slaves
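Central scheduler topology
[Diagram: slaves pull tasks from a batch controller and send reports back to it]
Or, more elaborate:
[Diagram: the batch controller adds tasks to a task queue on a separate queue server and consumes reports from it; slaves pull tasks from the queue]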
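The pull model can be imitated inside a single JVM to see the moving parts: the batch controller enqueues tasks, each slave takes the next task as soon as it is free, and reports flow back through a second queue. A minimal sketch, assuming threads stand in for slave processes and squaring a number stands in for real work; `TaskQueueDemo` and `runBatch` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-process model: threads stand in for slave JVMs.
final class TaskQueueDemo {

    static List<Long> runBatch(List<Integer> inputs, int slaves) throws InterruptedException {
        // Optional.empty() is the "no more tasks" marker.
        BlockingQueue<Optional<Integer>> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<Long> reports = new LinkedBlockingQueue<>();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < slaves; i++) {
            Thread slave = new Thread(() -> {
                try {
                    // Pull tasks until the stop marker arrives.
                    for (Optional<Integer> task = tasks.take(); task.isPresent(); task = tasks.take()) {
                        long n = task.get();
                        reports.put(n * n); // "compute" and report the result
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            slave.start();
            workers.add(slave);
        }

        // Batch controller: enqueue all tasks, then one stop marker per slave.
        for (Integer input : inputs) {
            tasks.put(Optional.of(input));
        }
        for (int i = 0; i < slaves; i++) {
            tasks.put(Optional.empty());
        }

        // Consume exactly one report per task; arrival order is not guaranteed.
        List<Long> results = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            results.add(reports.take());
        }
        for (Thread slave : workers) {
            slave.join();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBatch(List.of(1, 2, 3, 4), 3));
    }
}
```

Pulling balances load without any scheduling logic in the controller: a slow slave simply takes fewer tasks.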
Flavors of parallel processing
Flow-organized tasks
• Input data is available before the task starts
• e.g. Map/Reduce
Collaborative tasks
• Tasks communicate intermediate results to each other
• e.g. physics simulations
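For flow-organized tasks the whole input exists before the job starts, so it can be split up front, mapped in parallel and then reduced. A minimal word-count sketch, assuming a thread pool stands in for the slaves and in-memory strings stand in for input files; `WordCountJob` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical word-count job: parallel map tasks, then a single reduce step.
final class WordCountJob {

    // Map: count the words in one input split.
    static Map<String, Integer> map(String split) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : split.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce: merge the partial counts produced by all map tasks.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    static Map<String, Integer> run(List<String> splits, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // All input exists up front, so every map task is submitted immediately.
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String split : splits) {
                futures.add(pool.submit(() -> map(split)));
            }
            List<Map<String, Integer>> partials = new ArrayList<>();
            for (Future<Map<String, Integer>> future : futures) {
                partials.add(future.get()); // waits for that map task to finish
            }
            return reduce(partials);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> splits = List.of("the quick brown fox", "the lazy dog", "The end");
        System.out.println(run(splits, 2).get("the")); // prints 3
    }
}
```

Collaborative tasks do not fit this shape: their tasks exchange intermediate results while running, so they cannot simply be submitted up front and merged at the end.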
Back to the data plane
Rules of thumb
• Insert / delete – never update
• Write locally (reducing risks)
• Read remotely (retry on error)
• Store input as-is
  - File system
  - Document- or column-oriented NoSQL
• Input and temporary data are different
  - Choose the right store for each
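Because data is only inserted or deleted, never updated in place, a failed remote read can simply be repeated. A minimal retry helper, assuming linear backoff is acceptable; `Retry.withRetries` is a hypothetical name, not part of any library mentioned here, and `"chunk-17"` is made-up demo data:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for "read remotely, retry on error".
final class Retry {

    // Calls `action` up to `maxAttempts` times, waiting backoffMillis * attempt between tries.
    static <T> T withRetries(Callable<T> action, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw e; // never retry after an interrupt
            } catch (Exception e) {
                last = e; // remember the failure, then try again
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // linear backoff
                }
            }
        }
        throw last; // all attempts failed: surface the most recent error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String chunk = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("transient read failure"); // simulated flaky remote read
            }
            return "chunk-17";
        }, 5, 10);
        System.out.println(chunk + " after " + calls[0] + " attempts"); // prints "chunk-17 after 3 attempts"
    }
}
```

Retrying is only safe because the data being read is immutable; with in-place updates, a retried read could observe a different version than the failed one.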
Exploiting the file system
Avoid network file systems
• The file system concept was not designed to be distributed
• A good network file system cannot exist
• Use simple remote file access protocols
  - SCP (unencrypted data transfer options added by the CERN guys)
  - HTTP (if you really do not want SCP)
A cheap SAN can be built from open-source software
Algorithmic optimization
Parallel computing
• An N-times speed-up will increase your OPEX and CAPEX costs by a factor of N*lg(N)
Algorithmic optimization
• Up-front costs only
• Orders-of-magnitude optimization opportunities
• Exciting coding
• An ecological way of computing
Streaming algorithms
Finding the N most frequent elements
• Count-Min sketch
Estimating the number of unique values
• HyperLogLog
Distribution histograms
https://github.com/addthis/stream-lib
https://github.com/rwl/ParallelColt
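A Count-Min sketch keeps approximate counts for an unbounded stream in a fixed-size table: each item increments one counter per row, and the estimate is the minimum over those counters, so it can overcount (on hash collisions) but never undercount. A minimal sketch, assuming simple seeded hash mixing rather than a rigorously independent hash family; stream-lib, linked above, includes ready-made Count-Min and HyperLogLog implementations. Finding the N most frequent elements would additionally track the best candidates in a small heap, which is omitted here:

```java
import java.util.Random;

// Minimal Count-Min sketch: approximate counts for a stream in depth x width counters.
final class CountMinSketch {
    private final int width;
    private final long[][] table;
    private final int[] seeds;

    CountMinSketch(int depth, int width, long randomSeed) {
        if (depth < 1 || width < 1) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        Random random = new Random(randomSeed);
        for (int row = 0; row < depth; row++) {
            seeds[row] = random.nextInt(); // one seed per row gives a different hash per row
        }
    }

    // Simple seeded mixing of hashCode(); an assumption, not a provably independent hash family.
    private int bucket(Object item, int row) {
        int h = item.hashCode() ^ seeds[row];
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return Math.floorMod(h, width);
    }

    // Each occurrence increments one counter in every row.
    void add(Object item, long count) {
        for (int row = 0; row < table.length; row++) {
            table[row][bucket(item, row)] += count;
        }
    }

    // Collisions only inflate counters, so the smallest one is the best estimate.
    long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < table.length; row++) {
            min = Math.min(min, table[row][bucket(item, row)]);
        }
        return min;
    }

    public static void main(String[] args) {
        CountMinSketch sketch = new CountMinSketch(4, 1024, 42L);
        for (String s : new String[] {"a", "b", "a", "c", "a", "b"}) {
            sketch.add(s, 1);
        }
        System.out.println("a ~ " + sketch.estimate("a")); // at least 3, never less
    }
}
```

Memory stays at depth × width counters regardless of how many distinct items pass through, which is what makes the structure usable on large batches.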
NanoCloud – drastically simplified coding for computing clusters
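import java.net.InetAddress;
import java.util.concurrent.Callable;
import org.gridkit.nanocloud.Cloud;
import org.gridkit.nanocloud.CloudFactory;
import org.junit.Test;

public class HelloRemoteWorldTest {
    @Test
    public void hello_remote_world() {
        Cloud cloud = CloudFactory.createSimpleSshCloud();
        // The callable is executed in a slave JVM started on myserver.acme.com over SSH
        cloud.node("myserver.acme.com").exec(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                String localhost = InetAddress.getLocalHost().toString();
                System.out.println("Hi! I'm running on " + localhost);
                return null;
            }
        });
    }
}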
As easy as …
All you need is …
NanoCloud requirements
  - An SSH daemon (sshd)
  - Java 1.6 or later installed
  - Works through NAT and firewalls
  - Works on Amazon EC2
  - Works everywhere SSH works
Master-slave communications
[Diagram: the master process talks to each slave host over a single SSH (TCP) connection; on the slave host an agent multiplexes the slave streams (stdin, stdout, stderr, diagnostics) and the RMI (TCP) traffic of the slave controllers]
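Carrying several logical streams over one connection usually means framing: every chunk is prefixed with a channel id and a length, and the receiver routes each chunk to its channel. A minimal sketch of that idea with `DataOutputStream`/`DataInputStream`; the frame layout, the channel numbers and the class name `StreamMux` are assumptions for illustration, not NanoCloud's actual wire format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical frame layout: [channel id: int][payload length: int][payload bytes]
final class StreamMux {
    static final int STDOUT = 1; // channel numbers are assumptions for illustration
    static final int STDERR = 2;

    // Sender side: prefix each chunk with its channel and length.
    static void writeFrame(DataOutputStream out, int channel, byte[] payload) throws IOException {
        out.writeInt(channel);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Receiver side: read frames until end of stream, appending each payload to its channel.
    static Map<Integer, ByteArrayOutputStream> demux(DataInputStream in) throws IOException {
        Map<Integer, ByteArrayOutputStream> channels = new LinkedHashMap<>();
        while (true) {
            int channel;
            try {
                channel = in.readInt();
            } catch (EOFException endOfStream) {
                return channels; // sender closed the connection between frames
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload); // throws EOFException on a truncated frame
            channels.computeIfAbsent(channel, c -> new ByteArrayOutputStream()).write(payload);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the TCP socket
        DataOutputStream out = new DataOutputStream(wire);
        writeFrame(out, STDOUT, "Hello ".getBytes(StandardCharsets.UTF_8));
        writeFrame(out, STDERR, "warning\n".getBytes(StandardCharsets.UTF_8));
        writeFrame(out, STDOUT, "world\n".getBytes(StandardCharsets.UTF_8));
        out.flush();

        Map<Integer, ByteArrayOutputStream> channels =
                demux(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
        byte[] stdout = channels.get(STDOUT).toByteArray();
        System.out.print(new String(stdout, StandardCharsets.UTF_8)); // prints "Hello world"
    }
}
```

A byte array stands in for the connection here; over a real socket the same code would run against `socket.getOutputStream()` and `socket.getInputStream()`.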
Links
NanoCloud
• https://code.google.com/p/gridkit/wiki/NanoCloudTutorial
• Maven Central: org.gridkit.lab:telecontrol-ssh:0.7.23
• http://blog.ragozin.info/2013/01/remote-code-execution-in-java-made.html
Apache Ant task
• https://github.com/gridkit/gridant
Thank you
Alexey Ragozin
alexey.ragozin@gmail.com
http://blog.ragozin.info
- my articles
http://code.google.com/p/gridkit
http://github.com/gridkit
- my open source code
http://aragozin.timepad.ru
- community events in Moscow

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Casual mass parallel computing

  • 1. Casual mass parallel data processing in Java Alexey Ragozin Mar 2014
  • 3. Build vs. Buy. Build when: • No dedicated team to support infrastructure • Very specific tasks • Exclusive use of infrastructure • Reasonable scale. Buy when: • Product can be bought as a service (internal or external) • Large scale • Multi-tenancy • You are going to use advanced features (e.g. map/reduce)
  • 4. “Casual” computing • Small computation farms (< 100 servers) • Team owns both application and grid • Java platform • Reasonably short batches (< 24 hours) • Reasonably small data sets (< 10 TiB)
  • 5. Simple master-slave topology [Diagram: a master process holding the task queue and scheduler, connected to three slaves; arrows labelled Advertise, Task and Report]
  • 6. Simple master-slave topology. Control plane: • RMI. Queue / scheduler: • Simple in-memory queue • May be more complex than just a task queue. Data plane: …
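The control plane described above can be sketched as a small RMI remote interface backed by an in-memory queue inside the master process. This is a minimal sketch under assumptions: `TaskDescriptor`, `TaskQueueService` and `InMemoryTaskQueue` are hypothetical names, and re-queueing failed tasks is just one example of a queue that is "more complex than just a task queue". In a real deployment the master would export the object (for example with `UnicastRemoteObject.exportObject(queue, 0)`) and slaves would call it through the resulting stub; here it is called in-process so the example runs without a network.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical task handle: carries only a reference into the data plane,
// never the data itself, and is Serializable so it can cross RMI.
final class TaskDescriptor implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;
    final String dataRef;

    TaskDescriptor(int id, String dataRef) {
        this.id = id;
        this.dataRef = dataRef;
    }
}

// Control-plane contract the master would export over RMI (hypothetical).
interface TaskQueueService extends Remote {
    /** Hands the next pending task to a slave, or returns null when nothing is left. */
    TaskDescriptor nextTask(String slaveId) throws RemoteException;

    /** A slave reports the outcome of a task it received earlier. */
    void report(String slaveId, int taskId, boolean success) throws RemoteException;
}

// Master-side implementation: a plain in-memory queue plus a table of tasks
// in flight, so that failed tasks can be put back on the queue.
final class InMemoryTaskQueue implements TaskQueueService {
    private final Deque<TaskDescriptor> pending = new ArrayDeque<>();
    private final Map<Integer, TaskDescriptor> inFlight = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    synchronized void submit(TaskDescriptor task) {
        pending.addLast(task);
    }

    @Override
    public synchronized TaskDescriptor nextTask(String slaveId) {
        TaskDescriptor task = pending.pollFirst();
        if (task != null) inFlight.put(task.id, task);
        return task;
    }

    @Override
    public synchronized void report(String slaveId, int taskId, boolean success) {
        TaskDescriptor task = inFlight.remove(taskId);
        if (task == null) return; // unknown or duplicate report: ignore
        log.add(slaveId + " " + taskId + (success ? " ok" : " failed"));
        if (!success) pending.addLast(task); // naive retry policy: re-queue at the end
    }

    synchronized List<String> log() {
        return new ArrayList<>(log);
    }
}
```

Only small task descriptors travel over the control plane; bulk data stays on the data plane.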
  • 7. Data plane. Never, ever try to send data over RMI. • File system – avoid network mounts! • In-memory key-value – client-side sharding works best • Disk database (RDBMS or NoSQL) – consider prefetching data • Direct socket streaming • …
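Client-side sharding, as recommended above for in-memory key-value stores, means each client computes which node owns a key, so no router sits on the data path. A minimal sketch, assuming a fixed list of nodes; `ShardedStore` is a hypothetical name and each `Map` stands in for a connection to one key-value server.

```java
import java.util.List;
import java.util.Map;

// Client-side sharding: every client computes the owning node itself,
// so no central router or directory service sits on the data path.
// Each Map is a stand-in for a connection to one key-value node (hypothetical).
final class ShardedStore {
    private final List<Map<String, byte[]>> nodes;

    ShardedStore(List<Map<String, byte[]>> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("need at least one node");
        this.nodes = nodes;
    }

    int nodeFor(String key) {
        // String.hashCode() is specified in its Javadoc, so every client JVM
        // agrees on the owner; floorMod keeps the index non-negative.
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, byte[] value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    byte[] get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }
}
```

Plain modulo placement moves most keys when the number of nodes changes; consistent or rendezvous hashing reduces that movement if the cluster is resized during a batch.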
  • 8. Distributed objects revisited. Pitfalls of CORBA/RMI: • IDL defines the functional contract • IDL also defines the protocol. Separating concerns: • Functional contract – wrapper object • Protocol – hidden remote interface
  • 9. Distributed objects revisited. Renewed distributed objects paradigm. Strengths: • Polymorphism • Encapsulation (network protocol, caching aspects, etc.). Weaknesses: • Homogeneous code base required • Synchronous network communication
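The separation of a functional contract (wrapper object) from the protocol (hidden remote interface) can be illustrated with a small sketch. All names here (`PriceCatalog`, `PriceCatalogRemote`, `CachingPriceCatalog`) are hypothetical; the point is that application code sees only the plain interface, while RMI exceptions and caching are encapsulated in the wrapper.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.concurrent.ConcurrentHashMap;

// Functional contract: what application code programs against. No RMI types here.
interface PriceCatalog {
    double priceOf(String sku);
}

// Protocol: the remote interface is an implementation detail hidden behind the wrapper.
interface PriceCatalogRemote extends Remote {
    double fetchPrice(String sku) throws RemoteException;
}

// Wrapper object: implements the contract on top of a remote stub and
// encapsulates protocol concerns, here exception translation and caching.
final class CachingPriceCatalog implements PriceCatalog {
    private final PriceCatalogRemote remote;
    private final ConcurrentHashMap<String, Double> cache = new ConcurrentHashMap<>();

    CachingPriceCatalog(PriceCatalogRemote remote) {
        this.remote = remote;
    }

    @Override
    public double priceOf(String sku) {
        // Only the first lookup per SKU goes over the network.
        return cache.computeIfAbsent(sku, this::fetch);
    }

    private double fetch(String sku) {
        try {
            return remote.fetchPrice(sku); // synchronous network call
        } catch (RemoteException e) {
            throw new IllegalStateException("price lookup failed for " + sku, e);
        }
    }
}
```

The weaknesses listed on the slide remain visible here: the caller blocks on `fetchPrice`, and both sides must share the same `PriceCatalogRemote` class.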
  • 10. Deployment problem. Brute force: • Build / package • Deploy / SCP • Restart slaves • Start batch • Change code, repeat. Computation grid software: • Compile and run batch. Behind the scenes: • Your classes are collected • Associated with the batch • Deployed on participating slaves
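The "behind the scenes" steps can be illustrated with plain JDK class loading: the master reads a class's bytecode from its own classpath, and the slave defines it in an isolated loader. This is a hedged sketch of the general technique, not how NanoCloud or any particular grid product actually does it; `ClassCollector`, `BatchClassLoader` and `HelloTask` are hypothetical, and the byte transfer between master and slave is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Master side: read a class's bytecode through the loader that defined it.
final class ClassCollector {
    static byte[] bytecodeOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        if (loader == null) throw new IllegalArgumentException("JDK class, no need to ship: " + type.getName());
        String resource = type.getName().replace('.', '/') + ".class";
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) throw new IllegalArgumentException("bytecode not found for " + type.getName());
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) out.write(buffer, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Slave side: define shipped classes in a loader isolated from the slave's own classpath.
final class BatchClassLoader extends ClassLoader {
    private final Map<String, byte[]> classes;

    BatchClassLoader(Map<String, byte[]> classes) {
        super(null); // parent = bootstrap loader: only JDK classes are shared
        this.classes = new HashMap<>(classes);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = classes.get(name);
        if (bytes == null) throw new ClassNotFoundException(name);
        return defineClass(name, bytes, 0, bytes.length);
    }

    /** Loads a shipped class and instantiates it through its no-arg constructor. */
    Object instantiate(String className) {
        try {
            Constructor<?> ctor = loadClass(className).getDeclaredConstructor();
            ctor.setAccessible(true); // the shipped copy lives in a different runtime package
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}

// Example task class to ship; it reports which loader defined it.
final class HelloTask implements Callable<String> {
    @Override
    public String call() {
        return "hello from " + getClass().getClassLoader().getClass().getSimpleName();
    }
}
```

A real implementation would also have to collect the task's dependencies (typically whole classpath entries rather than single classes) and cache them on slaves between batches.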
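  • 11. Central scheduler topology [Diagram: slaves pull tasks from the batch controller and send back reports; a more elaborate variant places the task queue on a separate queue server, to which the batch controller adds tasks and from which it consumes reports]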
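The pull-based central scheduler can be simulated in a single JVM: a pair of blocking queues plays the queue server, threads play slaves that take work only when idle, and the calling thread acts as batch controller. This is a local sketch under assumptions (`PullScheduler` and its report format are hypothetical); in a grid, the queues would live on the queue server and slaves would reach them remotely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.IntUnaryOperator;

final class PullScheduler {
    static List<String> runBatch(List<Integer> inputs, int slaves, IntUnaryOperator work)
            throws InterruptedException {
        // "Queue server": tasks flow to slaves, reports flow back to the controller.
        BlockingQueue<Optional<Integer>> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<String> reports = new LinkedBlockingQueue<>();

        List<Thread> workers = new ArrayList<>();
        for (int s = 0; s < slaves; s++) {
            String name = "slave-" + s;
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        // Pull: a slave asks for work only when it is idle.
                        Optional<Integer> next = tasks.take();
                        if (!next.isPresent()) return; // empty Optional = stop marker
                        int task = next.get();
                        String outcome;
                        try {
                            outcome = String.valueOf(work.applyAsInt(task));
                        } catch (RuntimeException e) {
                            outcome = "FAILED " + e; // still report, so the controller never waits forever
                        }
                        reports.put(name + ": " + task + " -> " + outcome);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, name);
            worker.start();
            workers.add(worker);
        }

        // Batch controller: add all tasks, then one stop marker per slave.
        for (int input : inputs) tasks.put(Optional.of(input));
        for (int s = 0; s < slaves; s++) tasks.put(Optional.empty());

        // Batch controller: consume exactly one report per task.
        List<String> collected = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) collected.add(reports.take());
        for (Thread worker : workers) worker.join();
        return collected;
    }
}
```

Because slaves pull, a fast slave simply takes more tasks than a slow one; the controller needs no knowledge of slave capacity.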
  • 13. Flavors of parallel processing. Flow-organized tasks: • Input data available before the task starts • e.g. Map/Reduce. Collaborative tasks: • Tasks communicate intermediate results to each other • e.g. physics simulations
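A flow-organized task has all its input before it starts, so each chunk can be processed independently and the partial results merged afterwards. The single-JVM word count below is a sketch of that map/reduce shape using parallel streams; on a grid, the chunks would go to slaves instead of fork/join threads. `WordCount` is a hypothetical name. Collaborative tasks, where workers exchange intermediate results, do not fit this shape.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Flow-organized processing: all input exists before the job starts, each
// line is processed independently ("map"), and partial counts are merged ("reduce").
final class WordCount {
    static Map<String, Long> count(List<String> lines) {
        return lines.parallelStream()
                // map: split every line into lower-case words, independently of other lines
                .flatMap(line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\W+")))
                .filter(word -> !word.isEmpty())
                // reduce: group equal words and count them; the concurrent collector
                // merges per-thread partial results into one map
                .collect(Collectors.groupingByConcurrent(word -> word, Collectors.counting()));
    }
}
```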
  • 14. Back to the data plane. Rules of thumb: • Insert / delete – never update • Write locally (reduces risk) • Read remotely (retry on error) • Store input as is – file system, or document / column-oriented NoSQL • Input and temporary data are different – choose the right store for each
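The "read remotely (retry on error)" rule pairs well with insert-only data: re-reading immutable data has no side effects, so a failed read can simply be repeated. A minimal sketch of a retry helper with doubling back-off; `Retry.withRetry` and the back-off values are assumptions for illustration.

```java
import java.util.concurrent.Callable;

final class Retry {
    /**
     * Calls {@code read} up to {@code maxAttempts} times, sleeping with a doubling
     * back-off between failed attempts. Intended for idempotent remote reads.
     */
    static <T> T withRetry(Callable<T> read, int maxAttempts, long initialBackoffMillis) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        long backoff = initialBackoffMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.call();
            } catch (InterruptedException e) {
                throw e; // do not retry when the thread is asked to stop
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        throw last; // non-null: the loop ran at least once and every attempt failed
    }
}
```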
  • 15. Exploiting the file system. Avoid network file systems: • The file system concept is not designed to be distributed • A good network file system cannot exist • Use simple remote file access protocols • SCP (unencrypted data transfer options added by CERN guys) • HTTP (if you really do not want SCP). A cheap SAN can be built from open source.
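HTTP as a simple remote file access protocol needs nothing beyond the JDK. The sketch below assumes Java 11+ and uses the built-in `com.sun.net.httpserver.HttpServer` to expose a data directory read-only, plus `java.net.http.HttpClient` to fetch files; `HttpFileShare` is a hypothetical name, and SCP is not shown.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Read-only HTTP access to a data directory, using only the JDK.
final class HttpFileShare {
    static HttpServer serve(Path root, int port) throws IOException {
        Path base = root.toAbsolutePath().normalize();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            Path file = base.resolve(exchange.getRequestURI().getPath().substring(1)).normalize();
            // Refuse anything outside the shared directory or not a regular file.
            if (!file.startsWith(base) || !Files.isRegularFile(file)) {
                exchange.sendResponseHeaders(404, -1); // -1: no response body
                exchange.close();
                return;
            }
            exchange.sendResponseHeaders(200, Files.size(file));
            try (OutputStream out = exchange.getResponseBody()) {
                Files.copy(file, out);
            }
        });
        server.start();
        return server;
    }

    static byte[] fetch(HttpClient client, String url) throws IOException, InterruptedException {
        HttpResponse<byte[]> response = client.send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) throw new IOException("HTTP " + response.statusCode() + " for " + url);
        return response.body();
    }
}
```

The server binds to loopback here; a slave-facing share would bind to a reachable interface and pair naturally with the retry-on-read rule.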
  • 16. Algorithmic optimization. Parallel computing: • An N-times speed-up will increase your OPEX and CAPEX costs by N*lg(N). Algorithmic optimization: • Up-front costs only • Orders-of-magnitude optimization opportunities • Exciting coding • Ecological way of computing
  • 17. Streaming algorithms. Finding N most frequent elements: • Count-Min sketch. Estimating the number of unique values: • HyperLogLog. Distribution histograms. https://github.com/addthis/stream-lib https://github.com/rwl/ParallelColt
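A Count-Min sketch keeps a fixed-size table of counters and answers "how often did X occur?" with an estimate that never falls below the true count. With width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, the overestimate is at most ε times the total count with probability at least 1 − δ. The class below is an independent sketch, not the API of stream-lib (which ships its own implementations); it uses the classic (a·x + b) mod p hash family applied to `hashCode()`, so items with equal hash codes always share counters. Finding the N most frequent elements additionally needs a small heap of candidates alongside the sketch.

```java
import java.util.Random;

// Count-Min sketch: a depth x width table of counters, one hash function per row.
// An item's estimate is the minimum of its counters, which never underestimates.
final class CountMinSketch {
    private static final long PRIME = (1L << 31) - 1; // Mersenne prime 2^31 - 1

    private final int width;
    private final long[][] counters;
    private final long[] a; // per-row hash: h(x) = ((a*x + b) mod PRIME) mod width
    private final long[] b;
    private long total;

    CountMinSketch(int width, int depth, long seed) {
        if (width < 1 || depth < 1) throw new IllegalArgumentException("width and depth must be positive");
        this.width = width;
        this.counters = new long[depth][width];
        this.a = new long[depth];
        this.b = new long[depth];
        Random random = new Random(seed);
        for (int row = 0; row < depth; row++) {
            a[row] = 1 + Math.floorMod(random.nextLong(), PRIME - 1); // in [1, PRIME)
            b[row] = Math.floorMod(random.nextLong(), PRIME);         // in [0, PRIME)
        }
    }

    private int bucket(int row, Object item) {
        long x = Math.floorMod((long) item.hashCode(), PRIME);
        // a and x are below 2^31, so a * x + b stays well below Long.MAX_VALUE
        return (int) (((a[row] * x + b[row]) % PRIME) % width);
    }

    void add(Object item, long count) {
        if (count < 0) throw new IllegalArgumentException("counts must be non-negative");
        for (int row = 0; row < counters.length; row++) counters[row][bucket(row, item)] += count;
        total += count;
    }

    long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < counters.length; row++) min = Math.min(min, counters[row][bucket(row, item)]);
        return min;
    }

    long total() {
        return total;
    }
}
```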
  • 18. NanoCloud – drastically simplified coding for computing clusters
  • 19. As easy as …
        @Test
        public void hello_remote_world() {
            Cloud cloud = CloudFactory.createSimpleSshCloud();
            cloud.node("myserver.acme.com").exec(new Callable<Void>() {
                @Override
                public Void call() throws Exception {
                    String localhost = InetAddress.getLocalHost().toString();
                    System.out.println("Hi! I'm running on " + localhost);
                    return null;
                }
            });
        }
  • 20. All you need is … NanoCloud requirements: • SSH daemon (sshd) • Java (1.6 and above) present • Works through NAT and firewalls • Works on Amazon EC2 • Works everywhere SSH works
  • 21. Master-slave communications [Diagram: the master process connects to a slave host over SSH (a single TCP connection); on the slave host, an agent and slave controllers serve several slaves; labels: RMI (TCP), std in / std out / std err, diag, multiplexed slave streams]
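Carrying several slave streams (stdout, stderr, diagnostics) over one SSH connection requires multiplexing. The slides do not describe NanoCloud's wire format, so the sketch below uses an assumed framing, `[channel id][length][payload]`, purely to show the idea; `StreamMux` and the channel constants are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Frame layout (an assumption for this sketch): [channel id: int][length: int][payload bytes].
final class StreamMux {
    static final int STDOUT = 1, STDERR = 2, DIAG = 3;

    private final DataOutputStream link;

    StreamMux(OutputStream link) {
        this.link = new DataOutputStream(link);
    }

    /** Writes one frame; synchronized so frames from different channels never interleave. */
    synchronized void send(int channel, byte[] data, int off, int len) throws IOException {
        link.writeInt(channel);
        link.writeInt(len);
        link.write(data, off, len);
        link.flush();
    }

    /** Returns an OutputStream that tags everything written to it with one channel id. */
    OutputStream channel(int id) {
        return new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                send(id, new byte[] {(byte) b}, 0, 1);
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                send(id, b, off, len);
            }
        };
    }

    /** Reads frames until end of stream and appends each payload to its channel's buffer. */
    static Map<Integer, ByteArrayOutputStream> demux(InputStream link) throws IOException {
        DataInputStream in = new DataInputStream(link);
        Map<Integer, ByteArrayOutputStream> channels = new HashMap<>();
        while (true) {
            int channel;
            try {
                channel = in.readInt();
            } catch (EOFException end) {
                return channels; // clean end of stream between frames
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            channels.computeIfAbsent(channel, k -> new ByteArrayOutputStream()).write(payload);
        }
    }
}
```

The same framing works in the opposite direction for stdin; here an in-memory byte array stands in for the SSH channel.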
  • 22. Links. NanoCloud: • https://code.google.com/p/gridkit/wiki/NanoCloudTutorial • Maven Central: org.gridkit.lab:telecontrol-ssh:0.7.23 • http://blog.ragozin.info/2013/01/remote-code-execution-in-java-made.html. Ant task: • https://github.com/gridkit/gridant
  • 23. Thank you Alexey Ragozin alexey.ragozin@gmail.com http://blog.ragozin.info - my articles http://code.google.com/p/gridkit http://github.com/gridkit - my open source code http://aragozin.timepad.ru - community events in Moscow