5. History
2004
• Massive increase in data volumes
• Falling margins per transaction
• Increasing cost of IT maintenance
• Need for elasticity in systems
Customers:
• Financial services providers (every major Wall Street bank)
• Department of Defense
2008
• Real-time response needs
• Time-to-market constraints
• Need for flexible data models across the enterprise
• Distributed development
• Persistence + in-memory
Customers:
• Largest travel portal
• Airlines
• Trade clearing
• Online gambling
2014
• Global data visibility needs
• Fast ingest needs for data
• Need to allow devices to hook into enterprise data
• Always on
Customers:
• Largest telcos
• Large manufacturers
• Largest payroll processor
• Auto insurance giants
• Largest rail systems on earth

• 1000+ customers in production
• Cutting edge use cases
6. Interesting use cases
China Railway Corporation
5,700 train stations
4.5 million tickets per day
20 million daily users
1.4 billion page views per day
40,000 visits per second
*http://pivotal.io/big-data/pivotal-gemfire
Indian Railways
7,000 stations
72,000 miles of track
23 million passengers daily
120,000 concurrent users
10,000 transactions per minute
7. Interesting use cases
Indian Railways (population of India: 1,251,695,616)
China Railway Corporation (population of China: 1,401,586,609)
World: ~7,349,000,000
Together: ~36% of the world population
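The ~36% figure can be checked directly from the numbers on this slide. A minimal sketch, assuming the two population figures belong to India and China respectively (the slide does not label them explicitly):

```python
# Figures as quoted on the slide; the India/China assignment is an assumption.
india = 1_251_695_616
china = 1_401_586_609
world = 7_349_000_000

combined = india + china
share = combined / world

print(f"combined population: {combined:,}")
print(f"share of world population: {share:.1%}")
```

The combined population is 2,653,282,225, which is about 36.1% of the quoted world total.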
14. Geode Concepts and Usage
• Cache
• Region
• Member
• Client Cache
• Functions
• Listeners
15. Concepts
• Cache
• In-memory storage and management for your data
• Configurable through XML, Spring, Java API or CLI
• Collection of Regions
[Figure: a JVM hosting a Cache that contains three Regions]
16. Concepts
• Region
• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, with redundant copies on cache member(s)
[Figure: a Region inside a Cache inside a JVM, presented as a java.util.Map with entries K01 -> May and K02 -> Tim]
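The "observable map" idea can be sketched in a few lines. This is a toy model in Python, not Geode's API: the names ToyRegion, add_listener and the event labels are hypothetical, chosen only to show a key/value store that notifies listeners on every change.

```python
from typing import Any, Callable, Dict, List

# Hypothetical listener signature: (event, key, old_value, new_value).
Listener = Callable[[str, Any, Any, Any], None]


class ToyRegion:
    """A map-like key/value store that notifies listeners on every change."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[Any, Any] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _notify(self, event: str, key: Any, old: Any, new: Any) -> None:
        for listener in self._listeners:
            listener(event, key, old, new)

    def put(self, key: Any, value: Any) -> None:
        # Decide the event type before the entry is overwritten.
        event = "update" if key in self._data else "create"
        old = self._data.get(key)
        self._data[key] = value
        self._notify(event, key, old, value)

    def get(self, key: Any) -> Any:
        return self._data.get(key)

    def remove(self, key: Any) -> None:
        if key in self._data:
            old = self._data.pop(key)
            self._notify("destroy", key, old, None)


# Usage: record every event the region emits.
events = []
region = ToyRegion("people")
region.add_listener(lambda event, key, old, new: events.append((event, key, old, new)))
region.put("K01", "May")
region.put("K02", "Tim")
region.put("K01", "Mary")
region.remove("K02")
```

The caller uses plain put/get/remove, while the listener sees a stream of create, update and destroy events without polling.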
17. Concepts
• Region
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
[Figure: two JVMs, each with a Cache holding a Region (java.util.Map) with entries K01 -> May and K02 -> Tim]
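The difference between replicated and partitioned placement can be illustrated with a small sketch. Everything here is an assumption made for illustration: the member names, the bucket count, and the round-robin choice of redundant owners are not taken from Geode.

```python
import zlib
from typing import List

MEMBERS = ["server1", "server2", "server3"]  # hypothetical member names
NUM_BUCKETS = 8  # illustrative value only


def bucket_for(key: str) -> int:
    """Map a key to a bucket using a stable hash (crc32 is deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def owners_partitioned(key: str, redundant_copies: int = 0) -> List[str]:
    """Primary owner plus `redundant_copies` other members (round-robin)."""
    first = bucket_for(key) % len(MEMBERS)
    count = min(1 + redundant_copies, len(MEMBERS))
    return [MEMBERS[(first + i) % len(MEMBERS)] for i in range(count)]


def owners_replicated(key: str) -> List[str]:
    """A replicated region keeps every entry on every member."""
    return list(MEMBERS)


for key in ["K01", "K02"]:
    print(key, "partitioned:", owners_partitioned(key))
    print(key, "partitioned, 1 redundant copy:", owners_partitioned(key, 1))
    print(key, "replicated:", owners_replicated(key))
```

A partitioned entry lives on one member (plus any redundant copies), so capacity grows with the number of members; a replicated entry lives everywhere, so reads are always local.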
LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
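The LRU and overflow options in these region types can be sketched as follows. This is a simplified model, not Geode's implementation: the class name is hypothetical, a plain dict stands in for the disk store, and whole entries are moved on eviction for simplicity.

```python
from collections import OrderedDict
from typing import Any, Dict


class LruOverflowStore:
    """Keeps at most `max_in_memory` entries in memory; the least recently
    used entries are moved to an overflow store instead of being lost."""

    def __init__(self, max_in_memory: int) -> None:
        self.max_in_memory = max_in_memory
        self.memory: "OrderedDict[Any, Any]" = OrderedDict()
        self.overflow: Dict[Any, Any] = {}  # stands in for a disk store

    def put(self, key: Any, value: Any) -> None:
        self.overflow.pop(key, None)  # drop any stale overflowed copy
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key: Any) -> Any:
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.overflow:
            value = self.overflow.pop(key)  # fault the entry back in
            self.memory[key] = value
            self._evict()
            return value
        return None

    def _evict(self) -> None:
        while len(self.memory) > self.max_in_memory:
            old_key, old_value = self.memory.popitem(last=False)
            self.overflow[old_key] = old_value


store = LruOverflowStore(max_in_memory=2)
store.put("K01", "May")
store.put("K02", "Tim")
store.get("K01")           # K01 is now the most recently used entry
store.put("K03", "Ana")    # K02 is the least recently used, so it overflows
in_memory_after_put = list(store.memory)
overflowed_after_put = list(store.overflow)
faulted_in = store.get("K02")  # read from overflow, which pushes K01 out
```

With a plain LRU policy the evicted entry would simply be dropped; with overflow it stays reachable at the cost of a slower read.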
18. Concepts
• Persistent Regions
• Durability
• WAL (write-ahead log) for efficient writing
• Consistent recovery
• Compaction
[Figure: on Member 1, Put k4->v7 appends "Modify k4->v7" to the current oplog, Oplog3.crf; the older Oplog2.crf holds Modify k1->v5, Create k6->v6, Create k2->v2 and Create k4->v4. Region/Cache/JVM stacks are shown for Server 1 through Server N.]
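The append-only logging, recovery and compaction ideas on this slide can be sketched with a toy operation log. This is a minimal model under stated assumptions, not Geode's oplog format: the class name, the JSON-lines file layout and the rewrite-in-place compaction are all hypothetical simplifications.

```python
import json
import os
import tempfile
from typing import Dict


class ToyOplog:
    """Append-only log: each change is written to disk before it is applied."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.data: Dict[str, str] = {}
        if os.path.exists(path):
            self._replay()  # recovery: rebuild state from the log

    def _replay(self) -> None:
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                if record["op"] == "destroy":
                    self.data.pop(record["key"], None)
                else:  # "create" or "modify"
                    self.data[record["key"]] = record["value"]

    def _append(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())  # make the record durable

    def put(self, key: str, value: str) -> None:
        op = "modify" if key in self.data else "create"
        self._append({"op": op, "key": key, "value": value})
        self.data[key] = value

    def destroy(self, key: str) -> None:
        self._append({"op": "destroy", "key": key})
        self.data.pop(key, None)

    def compact(self) -> None:
        """Rewrite the log so it holds one record per live entry."""
        compacted = self.path + ".compact"
        with open(compacted, "w", encoding="utf-8") as log:
            for key, value in self.data.items():
                record = {"op": "create", "key": key, "value": value}
                log.write(json.dumps(record) + "\n")
        os.replace(compacted, self.path)

    def record_count(self) -> int:
        with open(self.path, encoding="utf-8") as log:
            return sum(1 for _ in log)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "toy.oplog")
    oplog = ToyOplog(path)
    for key, value in [("k1", "v1"), ("k2", "v2"), ("k4", "v4"),
                       ("k6", "v6"), ("k1", "v5"), ("k4", "v7")]:
        oplog.put(key, value)
    records_before = oplog.record_count()
    recovered = ToyOplog(path).data          # simulate a restart
    oplog.compact()
    records_after = oplog.record_count()
    recovered_after_compaction = ToyOplog(path).data
```

Writes only ever append, which keeps them sequential on disk; superseded records (the old values of k1 and k4 here) are garbage until compaction removes them, and replaying the log yields the same state before and after.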
19. Concepts
• Member
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
[Figure: Client, Locator, Server]
20. Concepts
• Client cache
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Can be notified about events on the servers
[Figure: an Application with a Client Cache, connected to a GemFire Server hosting several Regions]
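A client cache that keeps a local copy and is notified of server-side changes can be sketched as below. The class and method names are hypothetical and the "server" is an in-process object; this only illustrates the pattern, not Geode's client API or wire protocol.

```python
from typing import Any, Callable, Dict, List


class ToyServerRegion:
    """Server-side store that pushes every change to subscribed clients."""

    def __init__(self) -> None:
        self._data: Dict[Any, Any] = {}
        self._subscribers: List[Callable[[Any, Any], None]] = []

    def subscribe(self, callback: Callable[[Any, Any], None]) -> None:
        self._subscribers.append(callback)

    def put(self, key: Any, value: Any) -> None:
        self._data[key] = value
        for callback in self._subscribers:
            callback(key, value)

    def get(self, key: Any) -> Any:
        return self._data.get(key)


class ToyClientCache:
    """Client-side cache: serves reads locally once an entry is cached."""

    def __init__(self, server: ToyServerRegion) -> None:
        self._server = server
        self._local: Dict[Any, Any] = {}
        self.server_reads = 0  # counts round trips to the server
        server.subscribe(self._on_server_event)

    def _on_server_event(self, key: Any, value: Any) -> None:
        # Refresh only entries this client already holds.
        if key in self._local:
            self._local[key] = value

    def get(self, key: Any) -> Any:
        if key in self._local:
            return self._local[key]
        self.server_reads += 1
        value = self._server.get(key)
        if value is not None:
            self._local[key] = value
        return value


server = ToyServerRegion()
server.put("K01", "May")
client = ToyClientCache(server)
first = client.get("K01")      # fetched from the server
second = client.get("K01")     # served from the local copy
server.put("K01", "Mary")      # the client is notified of the change
third = client.get("K01")      # local copy is already up to date
```

Only the first read goes to the server; later reads are local, and the notification keeps the local copy from going stale.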
21. Concepts
• Functions
• Used for distributed concurrent processing (Map/Reduce, stored procedure)
• Highly available
• Data oriented
• Member oriented
[Figure: Submit (f1), then Execute functions f1, f2, … fn]
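Data-oriented function execution means the function travels to the data rather than the data to the caller. A minimal sketch of that pattern, assuming hypothetical member names and using threads in one process to stand in for separate servers:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# Hypothetical partitions: each member holds its own slice of the region.
PARTITIONS: Dict[str, Dict[str, str]] = {
    "server1": {"K01": "May", "K02": "Tim"},
    "server2": {"K03": "Alice", "K04": "Bob"},
    "server3": {"K05": "Carlos"},
}


def execute_on_region(function: Callable[[Dict[str, str]], Any],
                      partitions: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Run `function` against each member's local data concurrently and
    return one partial result per member."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = {member: pool.submit(function, local_data)
                   for member, local_data in partitions.items()}
        return {member: future.result() for member, future in futures.items()}


def count_long_names(local_data: Dict[str, str]) -> int:
    """The 'map' step: runs where the data lives, sees only local entries."""
    return sum(1 for name in local_data.values() if len(name) > 3)


partial_results = execute_on_region(count_long_names, PARTITIONS)
total = sum(partial_results.values())  # the 'reduce' step on the caller
print(partial_results, total)
```

Only the small partial results cross the network in this model, which is the point of the Map/Reduce and stored-procedure comparison on the slide.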
25. Why Open Source? Why ASF?
• Open source is fundamentally changing software buying patterns
• Customers get transparency and co-development of features
• It’s the community that matters
• ASF provides a framework for open source
26. Geode Will Be a Significant Apache Project
• 1M+ LOC, over 1,000 person-years invested in cutting-edge R&D
• Thousands of production customers in very demanding verticals
• Cutting edge use cases that have shaped product thinking
• A core technology team that has stayed together since founding
• Performance differentiators that are baked into every aspect of the product
27. Geode versus GemFire
• Geode is a project supported by the OSS community
• GemFire is a product from Pivotal, based on the Geode source
• We donated everything but the kitchen sink*
• Development process follows “The Apache Way”
* Multi-site WAN replication, continuous queries, and native (C/C++) client
28. "Talk is cheap, show me the code"
• Clone & Build
git clone https://github.com/apache/incubator-geode
cd incubator-geode
./gradlew build -Dskip.tests=true
• Start a server
cd gemfire-assembly/build/install/apache-geode
./bin/gfsh
gfsh> start locator --name=locator
gfsh> start server --name=server
gfsh> create region --name=myRegion --type=REPLICATE
29. How to Get Involved
• http://geode.incubator.apache.org
• Join the mailing lists; ask a question, answer a question, learn
dev@geode.incubator.apache.org
user@geode.incubator.apache.org
• File a bug in JIRA
• Update the wiki, website, or documentation
• Create example applications
• Use it in your project! We need you!
31. Quick intro to Apache Spark
• RDD
• Dataframe
• Driver
• Worker
"An RDD in Spark is simply an immutable distributed collection of objects.
Each RDD is split into multiple partitions, which may be computed on different nodes
of the cluster. RDDs can contain any type of Python, Java, or Scala objects,
including user-defined classes."
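The two properties in this definition, immutability and partitioning, can be shown with a toy model in plain Python. This is not Spark: ToyRDD is a hypothetical class that runs in a single process, and only the method names (parallelize, map, filter, collect) are borrowed from the Spark vocabulary.

```python
from typing import Any, Callable, Iterable, List, Tuple


class ToyRDD:
    """An immutable collection split into partitions. Transformations never
    modify the receiver; they return a new ToyRDD."""

    def __init__(self, partitions: Iterable[Iterable[Any]]) -> None:
        self._partitions: Tuple[Tuple[Any, ...], ...] = tuple(
            tuple(partition) for partition in partitions
        )

    @classmethod
    def parallelize(cls, items: Iterable[Any], num_partitions: int) -> "ToyRDD":
        items = list(items)
        size = -(-len(items) // num_partitions)  # ceiling division
        return cls(items[i * size:(i + 1) * size] for i in range(num_partitions))

    @property
    def num_partitions(self) -> int:
        return len(self._partitions)

    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        # Each partition is processed independently of the others.
        return ToyRDD(tuple(fn(x) for x in p) for p in self._partitions)

    def filter(self, predicate: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(tuple(x for x in p if predicate(x)) for p in self._partitions)

    def collect(self) -> List[Any]:
        return [x for partition in self._partitions for x in partition]


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=3)
even_squares = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(numbers.collect())       # the original collection is unchanged
print(even_squares.collect())
```

Because each partition is transformed on its own, the per-partition work could run on different nodes, which is what the definition above describes.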
32. Quick intro to Apache Spark
• RDD
• Dataframe
• Driver
• Worker
"A dataframe is a distributed collection of rows organized into named columns. An
abstraction for selecting, filtering and plotting structured data (pandas), previously
known as SchemaRDD."
34. Live Data
Apache Geode / GemFire
1 - Live data is ingested into the grid
2 - Trained ML model compares new data to historical patterns
3 - Results are pushed immediately to deployed applications
4 - Re-training is triggered, updating the model with the latest historical data
[Figure label: Machine Learning model]
Spring XD
Spring XD
Data
Temperature
Hot
Warm