Strangeloop2012
1. Making Hadoop Real Time with Scala & GridGain
Nikita Ivanov, Founder & CEO GridGain Systems
September 2012 www.gridgain.com
1065 East Hillsdale Blvd, Suite 230
Foster City, CA 94404
#gridgain
2. Table Of Contents:
> 30%
  > Why Real Time Hadoop?
  > GridGain Overview
  > In-Memory Computing
  > Compute & Data Grids
> 70%
  > Live Coding
  > Real Time & Streaming Word Count
4. GridGain Overview:
> Scalable In-Memory Data Platform
  In-Memory Compute Grid + In-Memory Data Grid
  Real Time & Streaming MapReduce, CEP
> Three Editions:
  > “Compute Grid” Edition
    Targeted at HPC market
  > “Data Grid” Edition
    Targeted at Transactional Data Caching market
  > “Big Data” Edition
    Targeted at Real Time Big Data market
> Language support:
  Server: Java, Scala, Groovy
  Clients: .NET, PHP, REST, C++
> Mobile platforms support:
  iOS/ObjectiveC, Android clients
> Full ACID
  Fully distributed ACID transactions
> Simplicity and Productivity
  Dramatically reduces cost of application development
  Demonstrably faster time-to-market
> Example:
  Full source code in Scala of world’s shortest real time MapReduce app built with GridGain. Works on one or thousands of computers with no code changes required.
5. In-Memory Computing: Why Now?
“In-memory will have an industry impact
comparable to web and cloud. RAM is a new disk,
and disk is a new tape.”
Technology & Cost:
> 64-bit CPU can address 16 exabytes
  Entire active data set on the planet is addressable by just 1 CPU.
> Disk up to 10^5 times slower than DRAM
  SSD drives are up to 10^3 times slower
> Super effective in-memory parallelization
  Enabled by modern multicore CPUs
> DRAM prices drop 30% every 18 months
  1TB RAM & 48 cores cluster ~ $40K (< $20K in 3 years)
Performance & Scalability Matters:
> Citi: 100ms == $1M loss
  Forex trading
> Google: 500ms == 20% traffic drop
  Dropping 20% of revenue
> SAP sees +206% in profit in Q1 2012
  For in-memory SAP HANA products
> Software AG sees 3x revenue in 2012
  For in-memory Terracotta products
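The two headline numbers on this slide, 16 exabytes for a 64-bit address space and a $40K cluster falling below $20K in 3 years, can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are made up, and it assumes the 30% drop compounds once per 18-month period.

```java
// Hypothetical helper for checking the slide's figures; not part of any GridGain API.
class InMemoryCostMath {
    // 2^bits bytes expressed in exbibytes (2^60 bytes each). Valid for 60 <= bits <= 64.
    static long addressableExbibytes(int bits) {
        return 1L << (bits - 60);
    }

    // Cost after `months`, assuming a 30% price drop compounded every 18 months.
    static double projectedCost(double costNow, int months) {
        double periods = months / 18.0;
        return costNow * Math.pow(0.70, periods);
    }

    public static void main(String[] args) {
        // 64-bit addressing: 2^64 bytes = 2^4 EiB.
        System.out.println(addressableExbibytes(64) + " EiB");
        // $40K today, two 18-month periods later (value in thousands of dollars).
        System.out.println(projectedCost(40.0, 36));
    }
}
```

Under that compounding assumption, 36 months is two periods, so the cost is 40 × 0.7 × 0.7, or roughly $19.6K, which matches the "< $20K in 3 years" claim.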
6. GridGain: In-Memory Compute Grid
> Direct API for split and aggregation
> Pluggable failover, topology and collision resolution
> Distributed task session
> Distributed continuations & recursive split
> Support for Streaming MapReduce
> Support for Complex Event Processing (CEP)
> Node-local cache
> AOP-based, OOP/FP-based execution modes
> Direct closure distribution in Java, Scala and Groovy
> Cron-based scheduling
> Direct redundant mapping support
> Zero deployment with P2P class loading
> Partial asynchronous reduction
> Direct support for weighted and adaptive mapping
> State checkpoints for long running tasks
> Early and late load balancing
> Affinity routing with data grid
7. GridGain: In-Memory Data Grid
> Zero deployment for data
> Local, fully replicated and partitioned cache types
> Pluggable expiration policies (LRU, LIRS, random, time-based)
> Read-through and write-through logic with pluggable cache store
> Synchronous and asynchronous cache operations
> MVCC-based concurrency
> Pluggable data overflow storage via new swap space SPI
> PESSIMISTIC, OPTIMISTIC transactions
> Standard isolation levels, JTA/JCA integration
> Master/Master data replication/invalidation
> Write-behind cache store support
> Concurrent and transactional data preloading
> Delayed preloading support
> Affinity routing with compute grid
> Partitioned cache with active replicas
> Structured and unstructured data
> Datacenter replication
> JDBC driver for in-memory object data store
> Off-heap memory support
> Pluggable indexing via Indexing SPI
> Tiered storage with on-heap, off-heap, swap space, SQL, and Hadoop
> Distributed in-memory query capability
> SQL, H2, Lucene, predicate-based affinity co-located queries
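Several of the data grid features listed above (read-through and write-through with a pluggable cache store, plus an LRU expiration policy) follow a well-known pattern that is easy to show in miniature. The following is a minimal single-JVM sketch in plain Java, not the GridGain cache or cache-store API: `Store`, `MapStore`, and `LruReadThroughCache` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pluggable store: the cache delegates misses and writes to it.
interface Store<K, V> {
    V load(K key);          // called on a cache miss (read-through)
    void put(K key, V val); // called on every cache write (write-through)
}

// Trivial in-memory store standing in for a database or file system.
class MapStore<K, V> implements Store<K, V> {
    final Map<K, V> data = new HashMap<>();
    public V load(K key) { return data.get(key); }
    public void put(K key, V val) { data.put(key, val); }
}

class LruReadThroughCache<K, V> {
    private final Store<K, V> store;
    private final Map<K, V> entries;

    LruReadThroughCache(Store<K, V> store, final int maxEntries) {
        this.store = store;
        // accessOrder=true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once over capacity
            }
        };
    }

    V get(K key) {
        V val = entries.get(key);
        if (val == null) {
            val = store.load(key);              // read-through on a miss
            if (val != null) entries.put(key, val);
        }
        return val;
    }

    void put(K key, V val) {
        store.put(key, val);                    // write-through: store first
        entries.put(key, val);
    }

    boolean isCached(K key) { return entries.containsKey(key); }
}
```

With a capacity of 2, writing "a" and "b", reading "a", then writing "c" evicts "b" from memory, while a later `get("b")` still succeeds because it falls through to the store. A real data grid adds partitioning, replication, and transactions on top of this basic contract.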
8. Live Coding: GridGain + Scala
> 100% Live Coding:
> Nothing pre-built
> Every line & character
> Everything from the start
Comments:
----------------
Example:
Counts number of non-space characters in given argument by spreading the processing on multiple remote nodes.
This is 100% full source code. Can be written in Java or Groovy just as easily.
Compile and run it. Works transparently on one computer or on thousands.
Fully distributed and parallelized processing.
No deployment or provisioning required - 100% zero deployment.
Develop and start in the cloud in minutes.
It will automatically load balance and fail over, if necessary.
It will automatically scale up and down based on dynamically available resources.
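The example's source code did not survive in this transcript. The split, map, and reduce steps it describes can still be sketched in plain Java, with a local thread pool standing in for remote grid nodes. This is an assumption-laden illustration, not the GridGain code from the slide: `NonSpaceCount` and `count` are hypothetical names, and no GridGain API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: worker threads play the role of remote nodes.
class NonSpaceCount {
    // Split the input on whitespace, count each word on a worker, then sum the partial counts.
    static int count(String text, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Split: one job per whitespace-separated word.
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (String word : text.trim().split("\\s+")) {
                jobs.add(() -> word.length());
            }
            // Map: run all jobs in parallel. Reduce: add up the partial results.
            int total = 0;
            for (Future<Integer> partial : pool.invokeAll(jobs)) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("Making Hadoop Real Time", 4)); // prints 20
    }
}
```

In a grid setting, the framework rather than the caller would decide where each job runs and would handle load balancing and failover; the split and reduce logic keeps the same shape.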
Comments:
---------------
- In-Memory Computing is a high-growth industry:
- SAP has seen a 206% increase in profit in Q1 2012, in large part due to SAP HANA, its in-memory processing product
- IBM grew in-memory computing revenue 80% in 2011
- Software AG: 3x revenue in 2012 for in-memory computing products
- VMware is looking at a 30% increase in in-memory computing products

Gartner References:
- Top 10 Strategic Technology Trends: In-Memory Computing. Massimo Pezzini, Carl Claunch, Joseph Unsworth (G00230658)
- Insight: Invest in In-Memory Computing for Breakthrough Competitive Advantage. Massimo Pezzini (G00226070)
- What CIOs Need to Know About In-Memory Database Management Systems. Roxane Edjlali, Donald Feinberg (G00215735)
- Need for Speed Powers In-Memory Business Intelligence. James Richardson (G00212168)
- Predicts 2012: Cloud and In-Memory Drive Innovation in Application Platforms. Massimo Pezzini, Yefim Natis, Eric Knipp (G00226073)

Links:
- http://slidesha.re/O0htOV
- http://nyti.ms/Ngr4qF
- http://bit.ly/Ngr8qh
- http://bit.ly/NZu0Dw