JavaOne 2013: Save Scarce Resources by Managing Terabytes of Objects off-heap or Even on Disk
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.1
Use your current wisely
Harvey Raja
Oracle
Chris Neal
Pegasus
§  Senior Engineer
§  Oracle Coherence
Introduction
Harvey Raja
§  Systems Architect
§  Pegasus Solutions
Introduction
Chris Neal
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract.
It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.
The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
What’s all this about?
§  Big Data / Big Memory on a transistor diet
§  Applications and conceived concerns
§  Object Profiling
§  Elastic Data
§  Improvements
Heap me up
§  JVM manages our objects
§  Understands Live Data
–  References
–  Free Lists
new Object()
Heap me up
§  Two distinct regions of data locality
–  Young Generation
–  Old Generation
§  Allows conscious distinction between:
–  Long living objects
–  Short lived objects
Heap me up
§  All memory allocations are against the same resource
§  Why would it be any different?
§  The JVM provides means to access
–  off-heap memory
–  File Descriptors a.k.a. any resource
Heap me up
[Figure: JVM utilization gauges reading 100%, 0% and 0%]
Applications & Their Objects
§  Every application has very different uses of objects
–  Size
–  Scope
§  Structures / Containers
–  Structures {Containers}
–  Containers {Structures}
Application Object Profiles
§  Ye olde faithful… Pet Store
§  Short-lived objects
–  Search Results
§  Long-living objects
–  Popular items
[Figure: JEE Pet Store]
Application Object Profiles
§  Foreign Exchange Position Keeping
§  Aggregate trade values per currency pair
§  Some currencies are ‘busier’
§  Currencies may have varying SLAs
[Figure: FX Position Keeping. Trades (T) aggregated under the currency pairs USD-GBP and USD-EUR]
What do we know about applications?
§  The application understands more about each object than the JVM
–  Frequency
–  Size
–  Recency
–  Custom characteristics of the object
§  Keeping everything in RAM is possible, but is it efficient?
§  Huge leap between an object on heap and one stored in a DB
Can the JVM just make the right choice?
§  The JVM would have to span multiple devices
§  Non-heap data must be serialized
§  Applications are diverse, so generic decisions about object usage would likely lead to false positives
§  It is down to the application, or a layer above the JVM, to let users define resource assignment policies
Municipality
§  The application determines its own usage of each resource
§  The JVM provides primitives to load & store to these devices
§  May be useful to have an API that performs this storage appropriate to the device:
–  Routing stores to the appropriate device
–  Handling concerns of multiple applications on the same JVM
CPU Architectures
§  As latency increases, so does capacity
§  Data is fetched as required by instructions
§  Data is demoted, and writes to shared data ripple through the caches
CPU Architectures
Nehalem 2 GHz processor
Level         Speed     Capacity   Property location
Registers     1ns       0.00       Mayfair
L1 Cache      2-5ns     2x32KB     Kensington
L2 Cache      5ns       256KB      Camden
L3 Cache      20ns      8MB        Wimbledon
Main Memory   20-60ns   16GB       Manchester
§  Perhaps we should charge our objects rental premiums?
Blank Canvas
[Figure: properties priced at £600, £1800, £2400 and £100000 pcm, with costs of £100, £500, £250 and £1000 per TimeUnit]
How do you choose the right property?
§  How do you select the right property:
–  How often are you in the office? (MFU)
–  How long does it take you to get into the office? (Device latency)
–  How much space do you occupy in the office? (Object size)
–  How do you get to the office? (Every device has its own quirks)
Translated to memory
[Figure: usage mapped to storage tier. FX example: USD -> GBP on heap, USD -> JMD in RAM (off heap), GBP -> MUR on disk. Pet store example: Pedigree Chum, Orijen, Dog Ugg boots.]
Translated to memory
§  Objects held on heap are:
–  Structures (Containers)
–  Containers (Structures)
§  Similar to a file descriptor, each object has its own metadata:
–  Access Time
–  Modified Time
–  Size
–  Touch Count
§  With ((Map) Containers) we already have a location to store metadata
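The metadata listed on this slide can be sketched as a thin wrapper around a map. This is only an illustration: the names `EntryMeta`, `TrackedMap` and `coldestKey` are assumptions made for the example, and the clock is injected so the behaviour is easy to check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Per-entry metadata named on the slide: access time, modified time, size, touch count. */
final class EntryMeta {
    long accessTime;
    long modifiedTime;
    int size;
    long touchCount;
}

/** Hypothetical map wrapper that records metadata alongside each serialized value. */
final class TrackedMap<K> {
    private final Map<K, byte[]> values = new HashMap<>();
    private final Map<K, EntryMeta> meta = new HashMap<>();
    private final LongSupplier clock;

    TrackedMap(LongSupplier clock) { this.clock = clock; }

    void put(K key, byte[] value) {
        values.put(key, value);
        EntryMeta m = meta.computeIfAbsent(key, k -> new EntryMeta());
        m.modifiedTime = clock.getAsLong();
        m.size = value.length;
    }

    byte[] get(K key) {
        byte[] value = values.get(key);
        if (value != null) {
            EntryMeta m = meta.get(key);
            m.accessTime = clock.getAsLong();
            m.touchCount++;
        }
        return value;
    }

    EntryMeta metaFor(K key) { return meta.get(key); }

    /** Candidate for demotion: the least-touched key, or null if the map is empty. */
    K coldestKey() {
        K coldest = null;
        long fewest = Long.MAX_VALUE;
        for (Map.Entry<K, EntryMeta> e : meta.entrySet()) {
            if (e.getValue().touchCount < fewest) {
                fewest = e.getValue().touchCount;
                coldest = e.getKey();
            }
        }
        return coldest;
    }
}
```

A policy could consult `coldestKey()` when the heap fills up; which metadata field should drive that decision is application-specific.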
Translated to memory
[Figure: storage tiers, fastest to slowest: CPU, Heap, NIO, Flash, Mechanical Disk]
§  Each object has metadata
§  Some policy can manage these objects
§  Demoted & promoted to the various media types
§  Big jump from heap to NIO
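Promotion and demotion between media types could be expressed as a small policy over an ordered set of tiers. The sketch below is an assumption-laden illustration: `Tier`, `TouchCountPolicy` and the two thresholds are invented for the example, and the CPU level is omitted because the application does not place objects there directly.

```java
/** Storage tiers from the slide that an application can target, ordered fastest to slowest. */
enum Tier {
    HEAP, NIO, FLASH, MECHANICAL_DISK;

    Tier promote() { return this == HEAP ? this : values()[ordinal() - 1]; }
    Tier demote()  { return this == MECHANICAL_DISK ? this : values()[ordinal() + 1]; }
}

/** Hypothetical policy: move an object one tier at a time based on how often it was touched. */
final class TouchCountPolicy {
    private final long coldThreshold;
    private final long hotThreshold;

    TouchCountPolicy(long coldThreshold, long hotThreshold) {
        this.coldThreshold = coldThreshold;
        this.hotThreshold = hotThreshold;
    }

    Tier next(Tier current, long touchesInWindow) {
        if (touchesInWindow >= hotThreshold) return current.promote();
        if (touchesInWindow <= coldThreshold) return current.demote();
        return current;   // neither hot nor cold: stay put
    }
}
```

Moving one tier per decision keeps a single misjudgement cheap, at the cost of slower convergence for objects that are far from their ideal tier.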
Brainstorm Summary
§  Cherish your high commodity investments
§  Reduce the regularity of going to a highly contended foreign
resource
§  Would be ideal to have objects float between high latency resources
using a telepathic API
–  Having some metadata could drive our decision for data locality
§  Map provides a nice abstraction for objects that should ripple
Challenges
§  Serialization cost
§  Generally interactions are performed against the object form
§  False Positives
§  Device type peculiarities
§  A handle to the object (key) and metadata must be held on heap
Elastic Data
§  A Feature in Coherence
§  Store binary key and value objects in RAM or on Disk
§  Overflow from RAM to Disk
§  RAM can be configured as NIO
store(byte[] key, byte[] val)
load(byte[] key)
erase(byte[] key)
[Figure: RAM overflowing to Flash]
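The three operations shown on the slide can be read as a small key/value contract over raw bytes. The following is a minimal sketch of that contract, not Coherence's API: the names `BinaryStore` and `HeapBinaryStore` are assumptions, and the implementation simply keeps everything in a map to show the semantics.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical contract mirroring the slide's store/load/erase operations. */
interface BinaryStore {
    void store(byte[] key, byte[] val);
    byte[] load(byte[] key);      // returns null when the key is absent
    void erase(byte[] key);
}

/** Illustrative in-memory implementation; a real store would write to NIO buffers or files. */
final class HeapBinaryStore implements BinaryStore {
    // ByteBuffer compares by content, so it can serve as a map key for byte[] data.
    private final Map<ByteBuffer, byte[]> entries = new HashMap<>();

    @Override public void store(byte[] key, byte[] val) {
        entries.put(ByteBuffer.wrap(key.clone()), val.clone());
    }

    @Override public byte[] load(byte[] key) {
        byte[] val = entries.get(ByteBuffer.wrap(key));
        return val == null ? null : val.clone();
    }

    @Override public void erase(byte[] key) {
        entries.remove(ByteBuffer.wrap(key));
    }
}
```

Because callers only see byte arrays, the same interface could sit in front of RAM, NIO or flash without the client changing.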
Elastic Data
§  It's simply writing a number of bytes to some stream?
§  How do you maintain handles?
§  Need a pointer to the written data?
§  How about updates, seek & replace?
Easy Peasy
Elastic Data
§  Require a compact structure to hold handles (keys) to device pointers
§  Journal writes to the file system
§  Consistent API regardless of whether the write goes to RAM, NIO, Flash or Mechanical Disk
§  Buffer writes, with a thread dedicated to writing
Implementation Details
Elastic Data
[Figure: create path. Labels: object deemed unworthy of heap (LFU) → Serialize → store(key, value) → Preparer → Writer → RAMJournal, with Overflow to FlashJournal; a Collector per journal; the returned pointer is stored in a Store Index of Binary Key → Pointer.]
Create
Elastic Data
[Figure: read path. Labels: Store Index of Binary Key → Pointer; load(pointer) against RAMJournal / FlashJournal returns the binary value → Deserialize → deserialized object returned.]
The binary key provides the physical location of the stored item.
Read
Elastic Data
[Figure: utilization with Elastic Data: 100%, 100% and 100%]
Elastic Data
§  RAM Journal can be used with NIO ByteBuffers
–  Memory managed using same mechanisms between RAM & Flash
§  Consider device specifics prior to use and design components /
interactions accordingly
§  Multi-threaded clients
§  Flash vs Mechanical
More Features
Elastic Data
§  Several platters & reading / writing heads
§  Faraday & Lorentz
§  Seek time + Rotational Latency = L
§  Can not get to the same speeds as the disk controllers
Mechanical Media
Elastic Data
§  NAND gates
§  MLC
§  Write in pages, erase in blocks
§  No Seek Time or Rotational Latency
Flash Media
Elastic Data
[Figure: handles held in the JVM (HARRIET, HARVEY, HILARY, HILTON), each a disk pointer to binary data stored outside the heap.]
§  Handles are stored in process
§  Number of handles constrains the amount of data that can be stored
Key Management
Elastic Data
[Figure: radix tree over the keys HARRIET, HARVEY, HILARY and HILTON (tickets 1, 2, 4, 8): H branches to AR (RIET, VEY) and IL (ARY, TON).]
§  Data structure to hold handles is a Binary Radix Tree
–  allows sharing of common prefixes
§  Handles (keys) are stored in serialized form
§  Benefit increases as common bytes increase, and less heap memory is used
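The prefix sharing described on this slide can be sketched with a small radix (compressed prefix) tree. This is an illustration only: it works on `String` characters rather than serialized bytes, the class name `RadixTree` is an assumption, and the pointers are plain `long` values standing in for journal locations.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative radix (compressed prefix) tree mapping keys to journal pointers. */
final class RadixTree {
    private static final class Node {
        String label;                       // characters on the edge leading to this node
        Long pointer;                       // non-null when a key ends here
        final Map<Character, Node> children = new HashMap<>();
        Node(String label) { this.label = label; }
    }

    private final Node root = new Node("");

    void put(String key, long pointer) {
        Node node = root;
        String rest = key;
        while (true) {
            if (rest.isEmpty()) { node.pointer = pointer; return; }
            Node child = node.children.get(rest.charAt(0));
            if (child == null) {
                Node leaf = new Node(rest);
                leaf.pointer = pointer;
                node.children.put(rest.charAt(0), leaf);
                return;
            }
            int common = commonPrefix(rest, child.label);
            if (common < child.label.length()) {
                // Split the edge so the shared prefix is stored only once.
                Node tail = new Node(child.label.substring(common));
                tail.pointer = child.pointer;
                tail.children.putAll(child.children);
                child.label = child.label.substring(0, common);
                child.pointer = null;
                child.children.clear();
                child.children.put(tail.label.charAt(0), tail);
            }
            rest = rest.substring(common);
            node = child;
        }
    }

    Long get(String key) {
        Node node = root;
        String rest = key;
        while (!rest.isEmpty()) {
            Node child = node.children.get(rest.charAt(0));
            if (child == null || !rest.startsWith(child.label)) return null;
            rest = rest.substring(child.label.length());
            node = child;
        }
        return node.pointer;
    }

    /** Number of nodes excluding the root; shows how much prefix sharing occurs. */
    int nodeCount() { return count(root) - 1; }

    private static int count(Node n) {
        int total = 1;
        for (Node c : n.children.values()) total += count(c);
        return total;
    }

    private static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }
}
```

Inserting HARRIET, HARVEY, HILARY and HILTON produces the node labels H, AR, RIET, VEY, IL, ARY and TON, matching the tree drawn on the slide.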
Elastic Data
§  Writes are journalled
§  Erase is a logical removal
§  Update = erase + write
§  Avoids seek time or cascading pointer changes
[Figure: writes are appended across journal files (File 1, File 2, File 3); an erase logically removes an earlier entry.]
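The journalling rules on this slide (append on write, logical erase, update as erase plus write) can be illustrated in a few lines. This is a sketch under stated assumptions: `AppendOnlyJournal` is a hypothetical name, a `ByteArrayOutputStream` stands in for a journal file, and keys are plain strings rather than binary handles.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Illustrative append-only journal: writes append, erase is logical, update = erase + write. */
final class AppendOnlyJournal {
    private final ByteArrayOutputStream log = new ByteArrayOutputStream(); // stands in for a journal file
    private final Map<String, long[]> index = new HashMap<>();             // key -> {offset, length}
    private long garbageBytes;                                             // bytes no longer referenced

    /** Appends the value and returns its offset (the "pointer"). */
    long write(String key, byte[] value) {
        erase(key);                           // an update releases the old entry first
        long offset = log.size();
        log.write(value, 0, value.length);
        index.put(key, new long[] {offset, value.length});
        return offset;
    }

    /** Logical removal: drop the handle and count the bytes as garbage; nothing is overwritten. */
    void erase(String key) {
        long[] entry = index.remove(key);
        if (entry != null) garbageBytes += entry[1];
    }

    byte[] read(String key) {
        long[] entry = index.get(key);
        if (entry == null) return null;
        byte[] all = log.toByteArray();
        return Arrays.copyOfRange(all, (int) entry[0], (int) (entry[0] + entry[1]));
    }

    long size() { return log.size(); }
    long garbageBytes() { return garbageBytes; }
}
```

Since nothing is rewritten in place, an update never needs a seek to the old location; the cost is that unreferenced bytes accumulate until something reclaims them.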
Elastic Data
§  Require a Journal Garbage Collector
–  Reclaim unreferenced memory
§  Evacuation process for each file
§  Eviction logically removes from Journal File
§  May enter an exhaustive mode
–  Synonymous to Full GC
[Figure: Journal Collector: sort, evacuate (exhaustive mode)]
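Evacuation can be pictured as copying the still-referenced entries out of a sparsely used journal file so the whole file can be released. The sketch below is an assumption-heavy illustration, not the Coherence collector: `JournalFile`, `JournalCollector` and the load-factor threshold are invented for the example, and every value is assumed to fit in one file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative journal file: a capacity, plus the live entries still referenced by handles. */
final class JournalFile {
    final int capacity;
    int written;                                          // bytes appended so far (live + garbage)
    final Map<String, byte[]> live = new HashMap<>();

    JournalFile(int capacity) { this.capacity = capacity; }

    boolean append(String key, byte[] value) {
        if (written + value.length > capacity) return false;
        written += value.length;
        live.put(key, value);
        return true;
    }

    void erase(String key) { live.remove(key); }          // logical removal: bytes stay written

    int liveBytes() {
        int total = 0;
        for (byte[] v : live.values()) total += v.length;
        return total;
    }

    double loadFactor() { return written == 0 ? 1.0 : (double) liveBytes() / written; }
}

/** Hypothetical collector: evacuates files whose live ratio falls below a minimum load factor. */
final class JournalCollector {
    static List<JournalFile> collect(List<JournalFile> files, double minLoadFactor, int fileCapacity) {
        List<JournalFile> kept = new ArrayList<>();
        JournalFile target = new JournalFile(fileCapacity);
        for (JournalFile file : files) {
            if (file.loadFactor() >= minLoadFactor) { kept.add(file); continue; }
            // Evacuate: re-append each live entry elsewhere, then drop the old file.
            for (Map.Entry<String, byte[]> e : file.live.entrySet()) {
                if (!target.append(e.getKey(), e.getValue())) {
                    kept.add(target);
                    target = new JournalFile(fileCapacity);
                    target.append(e.getKey(), e.getValue());   // assumes the value fits in an empty file
                }
            }
        }
        if (target.written > 0) kept.add(target);
        return kept;
    }
}
```

An exhaustive pass would correspond to running this with a threshold high enough that every file with any garbage is evacuated, which is why the slide compares it to a Full GC.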
Elastic Data
§  Dedicated threads to unblock writes
–  Tuned to device type
§  Client write appears to be as fast as a heap write
§  An overwhelming number of writes will result in push back to the client
[Figure: Preparer → Buffer → Writer]
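The buffer-plus-dedicated-writer arrangement, including push back when the buffer is full, can be sketched with a bounded queue. This is a hedged illustration rather than the actual implementation: `BufferedJournalWriter` is a hypothetical name and an in-memory list stands in for the device.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Illustrative buffered writer: clients enqueue and return; one dedicated thread drains to the device. */
final class BufferedJournalWriter implements AutoCloseable {
    private final BlockingQueue<byte[]> buffer;
    private final List<byte[]> device = Collections.synchronizedList(new ArrayList<>()); // stands in for flash/disk
    private final Thread writer;
    private volatile boolean running = true;

    BufferedJournalWriter(int bufferSlots) {
        buffer = new ArrayBlockingQueue<>(bufferSlots);
        writer = new Thread(this::drain, "journal-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Returns false when the buffer stays full for the whole timeout: push back to the client. */
    boolean write(byte[] value, long timeoutMillis) {
        try {
            return buffer.offer(value, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private void drain() {
        try {
            while (running || !buffer.isEmpty()) {
                byte[] next = buffer.poll(10, TimeUnit.MILLISECONDS);
                if (next != null) device.add(next);     // a real writer would issue device I/O here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int writtenCount() { return device.size(); }

    /** Stops the writer once the buffered writes have reached the device. */
    @Override public void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

While the buffer has room, a client write costs only an enqueue; once the device falls behind, the bounded queue makes the client wait or fail, which is the push back the slide refers to.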
Who am I?
§  Chris Neal, Systems Architect
§  Started with Pegasus in 1994
§  Worked with Coherence since 3.3.1 in 2007
§  Participate in Coherence CAB
Who is Pegasus Solutions?
§  Founded in 1988
§  Provide technology and services to hotels and travel distributors
§  Three main service areas:
–  Representation Services
–  Distribution Services
–  Central Reservation Services (CRS)
Distribution Services
–  Connects hotel systems with distribution partners.
–  100,000 hotels connected to all major distributors (Expedia, Orbitz, Hotwire, Travelocity, etc.)
–  Cheaper and easier than a direct connect
–  If you book a hotel online, chances are your transaction goes through Pegasus.
–  Pegasus processes roughly 8 billion transactions per month, sustaining ~3000 TPS at 200 ms latency or less.
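The monthly volume and the sustained rate quoted above are consistent with each other, as a quick calculation shows. The 30-day month is an assumption made for the example, and `Throughput` is a hypothetical helper name.

```java
/** Converts a monthly transaction volume into an average per-second rate. */
final class Throughput {
    static double perSecond(double transactionsPerMonth, int daysInMonth) {
        return transactionsPerMonth / (daysInMonth * 24.0 * 60 * 60);
    }
}
```

With 8 billion transactions over 30 days, `Throughput.perSecond(8e9, 30)` comes to roughly 3,086 per second, in line with the ~3000 TPS figure.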
Why do we need a cache?
§  In travel agent days, “Look to book” ratios were 3:1
§  At internet scale, they are roughly 4200:1
–  Travel aggregators like Sidestep, Kayak, Mobissimo, etc. burst this to >100,000:1
§  Looks are the most expensive transaction from a systems-processing perspective.
§  We make no money on “looks”, so saving resources by not processing these transactions is important to both Pegasus and the downstream hotel systems.
§  …Hello Coherence….
Distribution Services and Coherence
Physical deployment:
–  Production cluster consists of 6 servers, 144GB RAM each.
–  Each server runs 3 storage enabled JVMs
–  Servers run Solaris x64
–  Each Hotspot JVM is 32GB
–  Using CMS collector, and having no GC pauses
–  The vast majority of the storage space is for AvailibilityCache (22GB) to service the “looks”
Client Applications
–  ~120 storage disabled clients either in containers or stand alone
–  Backing store is so large that NearCache is disabled
The problem…
§  The cache is too small. Empty to full in ~3 hours.
§  Evicting valuable, usable data
§  Cache hit rate is too low
The Challenge….
§  Increase the cache size on the current servers from 20M to 200M
§  Spend as little money as possible
§  Do it by EOY
The Process…
§  “spend as little as possible” means adding servers to reach 200M is not
possible, which means no more RAM.
§  Enter ElasticData
§  In terms of $, RAM > SSD > SATA, but is SATA fast enough?
iozone
§  Reads and writes a file to a filesystem as fast as it can
§  Compare SSD to SATA with regards to throughput
§  We know SSD is faster, but will SATA do?
www.iozone.org - Filesystem benchmark tool
SSD benchmark results
SATA benchmark results, part 1
SATA benchmark results, part 2
Elastic Data Hardware configuration
What we observe
§  At production volumes in RAM:
–  Avg get/put times ~2ms
§  At production volumes on SATA:
–  Avg put times ~3ms
–  Avg get times ~10ms
Benchmark data through the application
Hurdles We Have Overcome
§  Configuring the heap size:
–  Enough room to store keys for 1.6TB of objects, so that Full GCs do not occur
–  Enough room to store puts() while partitions are being evicted
–  64GB (up to 72GB with G1)
§  Eviction killing throughput: the eviction process was reading the values from disk as it evicted (for indexes and listeners). Behavior was changed with BlindCompactSerializationCache.
§  Stopping a JVM: transferring full partitions. Instead, drop the data, then transfer (DropContentPartitionListener)
§  Starting a JVM: rebalancing partitions to the new JVM (DropContentPartitionListener)
Configuring Coherence for Elastic Data
§  Operational overrides:
<journaling-config>
  <ramjournal-manager>
    <minimum-load-factor>.4</minimum-load-factor>
    <maximum-size>8GB</maximum-size>
  </ramjournal-manager>
  <flashjournal-manager>
    <minimum-load-factor>.7</minimum-load-factor>
    <!-- 3.6TB filesystem size total / 2 JVMs is 1843GB each VM -->
    <!-- That gives 511 files @ 3690MB each per JVM -->
    <maximum-file-size>3690MB</maximum-file-size>
    <collector-timeout>30m</collector-timeout>
    <!-- 1600GB to force a more aggressive prune (same as high-units) -->
    <high-journal-size>1600GB</high-journal-size>
  </flashjournal-manager>
</journaling-config>
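The sizing comments in the configuration above can be re-derived with a short calculation. This is only an arithmetic check: `JournalSizing` is a hypothetical helper, and binary units (1 TB = 1024 GB, 1 GB = 1024 MB) are assumed.

```java
/** Re-derives the sizing comments in the flashjournal-manager configuration. */
final class JournalSizing {
    /** Megabytes available to each JVM when the filesystem is shared equally. */
    static long perJvmMegabytes(double totalTerabytes, int jvms) {
        return (long) (totalTerabytes * 1024 * 1024 / jvms);
    }

    /** Whole journal files of the given size that fit in the per-JVM share. */
    static long fileCount(long perJvmMegabytes, long fileSizeMegabytes) {
        return perJvmMegabytes / fileSizeMegabytes;
    }
}
```

For 3.6 TB across 2 JVMs this gives 1,887,436 MB (about 1843 GB) per JVM, and 511 whole files of 3690 MB, matching the comments.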
Configuring Coherence for Elastic Data
§  Cache-config.xml
<distributed-scheme>
  <backup-count>0</backup-count>
  <partition-listener>
    <class-name>com.tangosol.net.partition.DropContentPartitionListener</class-name>
  </partition-listener>
  <backing-map-scheme>
    <ramjournal-scheme>
      <class-name>com.tangosol.net.cache.BlindCompactSerializationCache</class-name>
      <high-units>1600KB</high-units>
      <low-units>1400KB</low-units>
      <unit-calculator>Binary</unit-calculator>
      <unit-factor>1048576</unit-factor>
    </ramjournal-scheme>
  </backing-map-scheme>
</distributed-scheme>
Garbage Collection settings
argv[12]: -XX:+UseG1GC
argv[13]: -XX:MaxGCPauseMillis=800
argv[14]: -XX:ConcGCThreads=10
argv[15]: -XX:ParallelGCThreads=10
argv[16]: -XX:InitiatingHeapOccupancyPercent=25
argv[17]: -XX:NewRatio=16
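One way to confirm that flags like these actually took effect is to ask the running JVM through the standard `java.lang.management` beans. The sketch below uses only JDK calls; the class name `GcSettingsReport` is an assumption, and the collector names returned vary by JVM version and chosen collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports the -XX flags and garbage collectors of the JVM it runs in. */
final class GcSettingsReport {
    /** JVM arguments starting with -XX:, as passed on the command line. */
    static List<String> xxFlags() {
        List<String> flags = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:")) flags.add(arg);
        }
        return flags;
    }

    /** Names of the collectors the running JVM actually selected. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }
}
```

Logging both lists at startup makes it easy to spot a node that was launched without the intended settings.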
Did we meet our goals?
§  Goal: 500M cached objects
–  Actual: 1.6B cached objects
§  Goal: Spend as little as possible
–  Actual: Spent 1400/machine (8640 total), 84x more objects
§  Goal: Do it by EOY
–  Actual: On track for production release before EOY
So far, so good…
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

JavaOne-2013: Save Scarce Resources by Managing Terabytes of Objects off-heap or Even on Disk

  • 1. Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 2. Use your current wisely. Harvey Raja (Oracle); Chris Neal (Pegasus)
  • 3. Introduction: Harvey Raja
    §  Senior Engineer
    §  Oracle Coherence
  • 4. Introduction: Chris Neal
    §  Systems Architect
    §  Pegasus Solutions
  • 5. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
  • 6. What's all this about?
    §  Big Data / Big Memory on a transistor diet
    §  Applications and conceived concerns
    §  Object Profiling
    §  Elastic Data
    §  Improvements
  • 7. Heap me up
    §  JVM manages our objects
    §  Understands live data
      –  References
      –  Free lists
    new Object()
  • 8. Heap me up
    §  Two distinct regions of data locality
      –  Young Generation
      –  Old Generation
    §  Allows conscious distinction between:
      –  Long-living objects
      –  Short-lived objects
  • 9. Heap me up
    §  All memory allocations are against the same resource
    §  Why would it be any different?
    §  Provides means to access
      –  off-heap memory
      –  File descriptors, a.k.a. any resource
  • 10. Heap me up (diagram labels: JVM; 100%, 0%, 0%)
  • 11. Applications & Their Objects
    §  Every application has very different uses of objects
      –  Size
      –  Scope
    §  Structures / Containers
      –  Structures {Containers}
      –  Containers {Structures}
  • 12. Application Object Profiles (diagram: JEE Pet Store)
    §  Ye olde faithful… Pet Store
    §  Short-lived objects
      –  Search results
    §  Long-living objects
      –  Popular items
  • 13. Application Object Profiles (diagram labels: FX Position Keeping; trades "T"; USD-GBP; USD-EUR)
    §  Foreign Exchange Position Keeping
    §  Aggregate trade values per currency pair
    §  Some currencies are 'busier'
    §  Currencies may have varying SLAs
  • 14. What we know about applications?
    §  The application understands more about each object than the JVM
      –  Frequency
      –  Size
      –  Recency
      –  Custom characteristics of the object
    §  Keeping everything in RAM is possible, but is it efficient?
    §  Huge leap between an object on heap and one stored in a DB
  • 15. Can the JVM just make the right choice?
    §  The JVM would have to span multiple devices
    §  Non-heap must be serialized
    §  Applications are diverse, so generic decisions on object usage would likely lead to false positives
    §  Down to the application, or a layer above the JVM, allowing users to define resource assignment policies
  • 16. Municipality
    §  The application deems its own usage of each resource
    §  JVM provides primitives to load & store to these devices
    §  May be useful to have an API that performs this storage appropriate to the device:
      –  Routing stores to the appropriate device
      –  Handling concerns of multiple applications on the same JVM
  • 17. CPU Architectures
    §  As latency increases, so does capacity
    §  Data fetched as required by instructions
    §  Data is demoted, and writes to shared data ripple through the caches
  • 18. CPU Architectures (Nehalem 2 GHz processor)
                     Speed      Capacity   Location
        Registers    1ns        0.00       Mayfair
        L1 Cache     2-5ns      2x32KB     Kensington
        L2 Cache     5ns        256KB      Camden
        L3 Cache     20ns       8MB        Wimbledon
        Main Memory  20-60ns    16GB       Manchester
    §  Perhaps we should charge our objects rental premiums?
  • 19. Blank Canvas (figure labels: £600pcm, £1800pcm, £2400pcm, £100000pcm; £100, £500, £250, £1000 per TimeUnit)
  • 20. How do you choose the right property?
    §  How often are you in the office? (MFU)
    §  How long does it take you to get into the office? (Device latency)
    §  How much space do you occupy in the office? (Object size)
    §  How do you get to the office? (Every device has its own quirks)
  • 21. Translated to memory (diagram: usage mapped onto Heap, RAM (off heap) and Disk; FX examples USD -> GBP, USD -> JMD, GBP -> MUR; pet store examples Pedigree Chum, Orijen, Dog Ugg boots)
  • 22. Translated to memory
    §  Objects held on heap are:
      –  Structures (Containers)
      –  Containers (Structures)
    §  Similar to a file descriptor, each object has its own metadata:
      –  Access Time
      –  Modified Time
      –  Size
      –  Touch Count
    §  With ((Map) Containers) we already have a location to store metadata
  • 23. Translated to memory (diagram: CPU, Heap, NIO, Flash, Mechanical Disk)
    §  Each object has metadata
    §  Some policy can manage these objects
    §  Demoted & promoted to the various media types
    §  Big jump from heap to NIO
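The idea of per-object metadata driving promotion and demotion can be sketched in a few lines. This is only an illustration, not how Coherence does it: the `Tier` enum, `EntryMeta`, `TieringPolicy` and the touch-count threshold are all hypothetical names and choices. Only the metadata fields (access time, modified time, size, touch count) come from the slides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tiers, ordered from fastest/smallest to slowest/largest.
enum Tier { HEAP, NIO, FLASH, DISK }

// Per-entry metadata, analogous to the timestamps kept for a file.
final class EntryMeta {
    long accessTime;
    long modifiedTime;
    int size;
    int touchCount;
    Tier tier = Tier.HEAP;
}

// Assumed policy: entries touched at least hotThreshold times since the
// last sweep move one tier up; all others move one tier down.
final class TieringPolicy {
    private final Map<String, EntryMeta> meta = new HashMap<>();
    private final int hotThreshold;

    TieringPolicy(int hotThreshold) { this.hotThreshold = hotThreshold; }

    void recordWrite(String key, int size, long now) {
        EntryMeta m = meta.computeIfAbsent(key, k -> new EntryMeta());
        m.size = size;
        m.modifiedTime = now;
        m.accessTime = now;
        m.touchCount++;
    }

    void recordRead(String key, long now) {
        EntryMeta m = meta.get(key);
        if (m != null) {
            m.accessTime = now;
            m.touchCount++;
        }
    }

    // One policy pass over all entries; resets the touch counters.
    void sweep() {
        Tier[] tiers = Tier.values();
        for (EntryMeta m : meta.values()) {
            if (m.touchCount >= hotThreshold) {
                if (m.tier != Tier.HEAP) m.tier = tiers[m.tier.ordinal() - 1];
            } else if (m.tier != Tier.DISK) {
                m.tier = tiers[m.tier.ordinal() + 1];
            }
            m.touchCount = 0;
        }
    }

    Tier tierOf(String key) {
        EntryMeta m = meta.get(key);
        return m == null ? null : m.tier;
    }
}
```

A busy currency pair that keeps getting read stays on heap, while a quiet one drifts towards disk one sweep at a time.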
  • 24. Brainstorm Summary
    §  Cherish your high commodity investments
    §  Reduce the regularity of going to a highly contended foreign resource
    §  Would be ideal to have objects float between high latency resources using a telepathic API
      –  Having some metadata could drive our decision for data locality
    §  Map provides a nice abstraction for objects that should ripple
  • 25. Challenges
    §  Serialization cost
    §  Generally interactions are performed against the object form
    §  False positives
    §  Device type peculiarities
    §  A handle to the object (key) and metadata must be held on heap
  • 26. Elastic Data (diagram: RAM, Flash)
    §  A feature in Coherence
    §  Store binary key and value objects in RAM or Disk
    §  Overflow from RAM to Disk
    §  RAM can be configured as NIO
    store(byte[] key, byte[] val)
    load(byte[] key)
    erase(byte[] key)
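The three operations named on the slide can be pictured as a small interface over a byte store. The sketch below is not the Coherence API: `BinaryStore` and `DirectBufferStore` are hypothetical names, and the implementation simply appends values to one off-heap NIO `ByteBuffer` while keeping keys and offsets in an on-heap map.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface mirroring the three operations on the slide.
interface BinaryStore {
    void store(byte[] key, byte[] val);
    byte[] load(byte[] key);
    void erase(byte[] key);
}

// Assumed implementation: values live off heap, handles stay on heap.
final class DirectBufferStore implements BinaryStore {
    private final ByteBuffer buffer;
    // key -> {offset, length}; ByteBuffer gives content-based equals/hashCode
    private final Map<ByteBuffer, int[]> index = new HashMap<>();

    DirectBufferStore(int capacity) {
        buffer = ByteBuffer.allocateDirect(capacity);
    }

    @Override
    public void store(byte[] key, byte[] val) {
        if (buffer.remaining() < val.length) {
            throw new IllegalStateException("store is full");
        }
        int offset = buffer.position();
        buffer.put(val); // append only; space for an old value is not reclaimed
        index.put(ByteBuffer.wrap(key.clone()), new int[] {offset, val.length});
    }

    @Override
    public byte[] load(byte[] key) {
        int[] loc = index.get(ByteBuffer.wrap(key));
        if (loc == null) return null;
        byte[] out = new byte[loc[1]];
        ByteBuffer view = buffer.duplicate(); // independent position, same bytes
        view.position(loc[0]);
        view.get(out);
        return out;
    }

    @Override
    public void erase(byte[] key) {
        index.remove(ByteBuffer.wrap(key)); // logical removal only
    }
}
```

Even in this toy version the on-heap index is what bounds capacity: every stored value costs a key and a pointer on the heap.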
  • 27. Elastic Data: Easy Peasy
    §  It's simply writing a number of bytes to some stream?
    §  How do you maintain handles?
    §  Need a pointer to the written data?
    §  How about updates, seek & replace?
  • 28. Elastic Data: Implementation Details
    §  Require a compact structure to hold handles (keys) to device pointers
    §  Journal writes to the file system
    §  Consistent API regardless of write to RAM, NIO, Flash or Mechanical Disk
    §  Buffer writes with a thread dedicated to writing
  • 29. Elastic Data: Create (diagram components: Store Index of Binary Key to Pointer; LFU; RAMJournal; Collector; Overflow; FlashJournal with Preparer and Writer)
    §  Object deemed unworthy of heap
    §  Serialize
    §  store(key, value)
    §  Pointer returned
  • 30. Elastic Data: Read (diagram components: Store Index of Binary Key to Pointer; RAMJournal; Collector; Overflow; FlashJournal with Preparer and Writer)
    §  The binary key provides the physical location of the stored item
    §  load(pointer) returns the binary value
    §  Deserialize
    §  Deserialized object returned
  • 31. Elastic Data: Utilization (diagram labels: Elastic Data; 100%, 100%, 100%)
  • 32. Elastic Data: More Features
    §  RAM Journal can be used with NIO ByteBuffers
      –  Memory managed using same mechanisms between RAM & Flash
    §  Consider device specifics prior to use and design components / interactions accordingly
    §  Multi-threaded clients
    §  Flash vs Mechanical
  • 33. Elastic Data: Mechanical Media
    §  Several platters & reading / writing heads
    §  Faraday & Lorentz
    §  Seek time + Rotational Latency = L
    §  Cannot get to the same speeds as the disk controllers
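As a worked example of "Seek time + Rotational Latency = L": on average the platter has to turn half a revolution before the wanted sector passes under the head, so average rotational latency is half the rotation period. The 7200 rpm and 9 ms seek figures in the usage note are illustrative assumptions, not numbers from the talk, and `DiskLatency` is a hypothetical helper.

```java
// Hypothetical helper for the mechanical-disk latency sum on the slide.
final class DiskLatency {
    // Average rotational latency in ms: half of one revolution.
    static double avgRotationalLatencyMs(int rpm) {
        double msPerRevolution = 60_000.0 / rpm;
        return 0.5 * msPerRevolution;
    }

    // L = seek time + rotational latency, both in ms.
    static double accessLatencyMs(double seekMs, int rpm) {
        return seekMs + avgRotationalLatencyMs(rpm);
    }

    public static void main(String[] args) {
        // Assumed drive: 7200 rpm with a 9 ms average seek.
        System.out.printf("L = %.2f ms%n", accessLatencyMs(9.0, 7200));
    }
}
```

With those assumed figures each random access costs roughly 13 ms before any data moves, which is why avoiding seeks matters on mechanical media.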
  • 34. Elastic Data: Flash Media
    §  NAND gates
    §  MLC
    §  Write in pages, erase in blocks
    §  No Seek Time or Rotational Latency
  • 35. Elastic Data: Key Management (diagram: the JVM holds one handle, a disk pointer, per key, for example HARRIET, HARVEY, HILARY, HILTON, each pointing at binary data on disk)
    §  Handles are stored in process
    §  Number of handles constrains amount of data that can be stored
  • 36. Elastic Data (diagram: keys HARRIET, HARVEY, HILARY, HILTON stored as H, then AR with RIET and VEY, then IL with ARY and TON; tickets 1, 2, 4, 8)
    §  Data structure to hold handles is a Binary Radix Tree
      –  allows sharing of common denominators (shared key prefixes)
    §  Handles (keys) are stored in serialized form
    §  Benefit increases as common bytes increase and less heap memory is used
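To make the prefix-sharing benefit concrete, here is a deliberately simplified stand-in for the radix tree: a plain byte-wise trie with one node per byte and no path compression, which a real binary radix tree would have. `ByteTrie` is a hypothetical class; the four keys and their tickets (1, 2, 4, 8) are the ones on the slide.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Simplified, uncompressed trie keyed by serialized (byte[]) keys.
final class ByteTrie {
    private static final class Node {
        final Map<Byte, Node> children = new HashMap<>();
        Long ticket; // non-null only where a complete key ends
    }

    private final Node root = new Node();
    private int nodeCount; // nodes created, excluding the root

    void put(byte[] key, long ticket) {
        Node n = root;
        for (byte b : key) {
            Node child = n.children.get(b);
            if (child == null) {
                child = new Node();
                n.children.put(b, child);
                nodeCount++;
            }
            n = child;
        }
        n.ticket = ticket;
    }

    Long get(byte[] key) {
        Node n = root;
        for (byte b : key) {
            n = n.children.get(b);
            if (n == null) return null;
        }
        return n.ticket;
    }

    int nodeCount() { return nodeCount; }

    public static void main(String[] args) {
        ByteTrie trie = new ByteTrie();
        String[] keys = {"HARRIET", "HARVEY", "HILARY", "HILTON"};
        long ticket = 1;
        for (String k : keys) {
            trie.put(k.getBytes(StandardCharsets.US_ASCII), ticket);
            ticket *= 2;
        }
        // 25 key bytes in total, but shared prefixes mean fewer nodes.
        System.out.println("nodes: " + trie.nodeCount());
    }
}
```

The four keys contain 25 bytes between them, yet the trie needs only 18 byte-nodes because H, HAR and HIL are stored once. A path-compressed radix tree would fold the unbranched runs into single nodes as well.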
  • 37. Elastic Data (diagram: successive writes are appended across File 1, File 2 and File 3; an erase marks an earlier record)
    §  Writes are journalled
    §  Erase is a logical removal
    §  Update = erase + write
    §  Avoids seek time or cascading pointer changes
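The journalling rules on this slide (append on write, logical erase, update as erase plus write) fit in a small in-memory model. `AppendOnlyJournal` is a hypothetical class that uses a `ByteArrayOutputStream` as a stand-in for a journal file; it shows the bookkeeping only, not Coherence's on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy append-only journal: bytes are never overwritten in place.
final class AppendOnlyJournal {
    private final ByteArrayOutputStream log = new ByteArrayOutputStream();
    private final Map<String, int[]> live = new HashMap<>(); // key -> {offset, length}
    private int garbageBytes; // bytes in the log no longer referenced

    // Update = erase + write: the old record becomes garbage, the new one is appended.
    void write(String key, byte[] value) {
        erase(key);
        int offset = log.size();
        log.write(value, 0, value.length);
        live.put(key, new int[] {offset, value.length});
    }

    // Logical removal: drop the pointer, leave the bytes where they are.
    void erase(String key) {
        int[] old = live.remove(key);
        if (old != null) garbageBytes += old[1];
    }

    byte[] read(String key) {
        int[] loc = live.get(key);
        if (loc == null) return null;
        // toByteArray copies the whole log; acceptable for a sketch only.
        return Arrays.copyOfRange(log.toByteArray(), loc[0], loc[0] + loc[1]);
    }

    int size() { return log.size(); }

    int garbageBytes() { return garbageBytes; }
}
```

Because nothing is rewritten in place, the log only grows, and the garbage counter is exactly what a journal collector would later need to reclaim.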
  • 38. Elastic Data (diagram: the Journal Collector sorts journal files, then evacuates them; exhaustive mode shown)
    §  Require a Journal Garbage Collector
      –  Reclaim unreferenced memory
    §  Evacuation process for each file
    §  Eviction logically removes from Journal File
    §  May enter an exhaustive mode
      –  Synonymous to Full GC
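A rough model of the sort-then-evacuate step: files whose live fraction is below a threshold have their remaining live entries copied into the current file and are then released. Everything here is an assumption made for illustration, including the `JournalFile` and `JournalCollector` names, the use of a load-factor threshold as the trigger, and ignoring the target file's capacity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical journal file: a fixed capacity and the entries still live in it.
final class JournalFile {
    final int capacity;
    final Map<String, byte[]> liveEntries = new HashMap<>();

    JournalFile(int capacity) { this.capacity = capacity; }

    int liveBytes() {
        int total = 0;
        for (byte[] v : liveEntries.values()) total += v.length;
        return total;
    }

    double loadFactor() { return (double) liveBytes() / capacity; }
}

final class JournalCollector {
    // Evacuates every file (other than target) whose load factor is below
    // minLoadFactor, emptiest first, and returns how many files were released.
    static int collect(List<JournalFile> files, JournalFile target, double minLoadFactor) {
        List<JournalFile> sparse = new ArrayList<>();
        for (JournalFile f : files) {
            if (f != target && f.loadFactor() < minLoadFactor) sparse.add(f);
        }
        sparse.sort(Comparator.comparingDouble(JournalFile::loadFactor));
        for (JournalFile f : sparse) {
            target.liveEntries.putAll(f.liveEntries); // copy live data forward
            files.remove(f);                          // the old file can be reused
        }
        return sparse.size();
    }
}
```

An exhaustive mode would amount to running the same pass with the threshold raised until enough files are freed, which is why the slide compares it to a Full GC.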
  • 39. Elastic Data (diagram: Preparer, Buffer, Writer)
    §  Dedicated threads to unblock writes
      –  Tuned to device type
    §  Client write appears to be as fast as heap write
    §  Overwhelming number of writes will result in push back to the client
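The buffer-plus-dedicated-writer arrangement, including push back when clients outrun the device, maps naturally onto a bounded `BlockingQueue`. The sketch assumes a single writer thread and uses an in-memory list in place of the real device; `BufferedJournalWriter` is a hypothetical class, not part of Coherence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class BufferedJournalWriter {
    private static final byte[] POISON = new byte[0]; // shutdown marker, compared by identity
    private final BlockingQueue<byte[]> buffer;
    private final List<byte[]> device = new ArrayList<>(); // stand-in for the slow device
    private final Thread writer;

    BufferedJournalWriter(int capacity) {
        buffer = new ArrayBlockingQueue<>(capacity);
        writer = new Thread(this::drain, "journal-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // Runs on the dedicated writer thread only.
    private void drain() {
        try {
            for (byte[] b = buffer.take(); b != POISON; b = buffer.take()) {
                device.add(b);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns quickly while there is buffer space. When the buffer is full it
    // waits up to timeoutMillis and returns false: that is the push back.
    boolean write(byte[] data, long timeoutMillis) {
        try {
            return buffer.offer(data, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Flushes everything queued so far and stops the writer thread.
    void close() {
        try {
            buffer.put(POISON);
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Only meaningful after close(), once the writer thread has finished.
    int written() { return device.size(); }
}
```

The queue capacity is the tuning knob: a larger buffer hides more device latency from clients, at the cost of more heap held by pending writes.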
  • 40. Who am I?
    §  Chris Neal, Systems Architect
    §  Started with Pegasus in 1994
    §  Worked with Coherence since 3.3.1 in 2007
    §  Participate in Coherence CAB
  • 41. Who is Pegasus Solutions?
    §  Founded in 1988
    §  Provide technology and services to hotels and travel distributors
    §  Three main service areas:
      –  Representation Services
      –  Distribution Services
      –  Central Reservation Services (CRS)
  • 42. Distribution Services
      –  Connects hotel systems with distribution partners.
      –  100,000 hotels connected to all major distributors (Expedia, Orbitz, Hotwire, Travelocity, etc.)
      –  Cheaper and easier than a direct connect
      –  If you book a hotel online, chances are your transaction goes through Pegasus.
      –  Pegasus processes roughly 8 billion transactions per month… sustained ~3000 TPS at 200 ms latency or less.
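The two throughput figures on this slide agree with each other, as a quick calculation shows. The 30-day month is an assumption, `ThroughputEstimate` is a hypothetical helper, and the in-flight estimate uses Little's law (concurrency = arrival rate × latency), which the slides do not mention.

```java
// Sanity check of "8 billion transactions per month" against "~3000 TPS".
final class ThroughputEstimate {
    // Average transactions per second over a month of the given length.
    static double averageTps(double transactionsPerMonth, int daysInMonth) {
        return transactionsPerMonth / (daysInMonth * 24.0 * 60.0 * 60.0);
    }

    // Little's law: average number of requests in flight.
    static double inFlight(double tps, double latencySeconds) {
        return tps * latencySeconds;
    }

    public static void main(String[] args) {
        double tps = averageTps(8e9, 30); // assumes a 30-day month
        System.out.printf("average: %.0f TPS%n", tps);
        System.out.printf("in flight at 200 ms: %.0f%n", inFlight(3000, 0.2));
    }
}
```

Eight billion transactions spread evenly over 30 days is a little under 3,100 per second, and at 200 ms each that means about 600 requests being processed at any instant.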
  • 43. Why do we need a cache?
    §  In travel agent days, "look to book" ratios were 3:1
    §  At internet scale, they are roughly 4200:1
      –  Travel aggregators like Sidestep, Kayak, Mobissimo, etc. burst this to >100,000:1
    §  Looks are the most expensive transaction from a systems processing perspective.
    §  We make no money on "looks", so saving resources by not processing these transactions is important to both Pegasus and the downstream hotel systems.
    §  …Hello Coherence….
  • 44. Distribution Services and Coherence
    Physical deployment:
      –  Production cluster consists of 6 servers, 144GB RAM each.
      –  Each server runs 3 storage-enabled JVMs
      –  Servers run Solaris x64
      –  Each HotSpot JVM is 32GB
      –  Using CMS collector, and having no GC pauses
      –  The vast majority of the storage space is for AvailibilityCache (22GB) to service the "looks"
    Client applications:
      –  ~120 storage-disabled clients, either in containers or standalone
      –  Backing store is so large that NearCache is disabled
  • 45. The problem…
    §  The cache is too small. Empty to full in ~3 hours.
    §  Evicting valuable, usable data
    §  Cache hit rate is too low
  • 46. The Challenge….
    §  Increase the cache size on the current servers from 20M to 200M
    §  Spend as little money as possible
    §  Do it by EOY
  • 47. The Process…
    §  "Spend as little as possible" means adding servers to reach 200M is not possible, which means no more RAM.
    §  Enter ElasticData
    §  In terms of $, RAM > SSD > SATA, but is SATA fast enough?
  • 48. iozone (www.iozone.org, filesystem benchmark tool)
    §  Reads and writes a file to a filesystem as fast as it can
    §  Compare SSD to SATA with regard to throughput
    §  We know SSD is faster, but will SATA do?
  • 49. SSD benchmark results
  • 50. SATA benchmark results, part 1
  • 51. SATA benchmark results, part 2
  • 52. Elastic Data: Hardware configuration
  • 53. What we observe (benchmark data through the application)
    §  At production volumes in RAM:
      –  Avg get/put times ~2ms
    §  At production volumes on SATA:
      –  Avg put times ~3ms
      –  Avg get times ~10ms
  • 54. Hurdles We Have Overcome
    §  Configuring the heap size:
      –  Enough room to store keys for 1.6TB of objects, so that Full GCs do not occur
      –  Enough room to store puts() while partitions are being evicted
      –  64GB (up to 72GB with G1)
    §  Eviction killing throughput. The eviction process was reading the values from disk as it evicted (for indexes and listeners). Behavior was changed with BlindCompactSerializationCache.
    §  Stopping a JVM: transferring full partitions. Instead, drop the data, then transfer (DropContentPartitionListener)
    §  Starting a JVM: rebalancing partitions to the new JVM. (DropContentPartitionListener)
  • 55. Configuring Coherence for Elastic Data
    §  Operational overrides:
        <journaling-config>
          <ramjournal-manager>
            <minimum-load-factor>.4</minimum-load-factor>
            <maximum-size>8GB</maximum-size>
          </ramjournal-manager>
          <flashjournal-manager>
            <minimum-load-factor>.7</minimum-load-factor>
            <!-- 3.6TB filesystem size total / 2 JVMs is 1843GB each VM -->
            <!-- That gives 511 files @ 3690MB each per JVM -->
            <maximum-file-size>3690MB</maximum-file-size>
            <collector-timeout>30m</collector-timeout>
            <!-- 1600GB to force a more aggressive prune (same as high-units) -->
            <high-journal-size>1600GB</high-journal-size>
          </flashjournal-manager>
        </journaling-config>
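The sizing comments in the flash journal configuration can be reproduced with a little arithmetic. The sketch takes the 3.6TB filesystem, 2 JVMs and 511 files from the XML comments and assumes binary units (1TB = 1024GB, 1GB = 1024MB); `JournalSizing` is a hypothetical helper, not a Coherence class.

```java
// Reproduces the sizing arithmetic from the flashjournal-manager comments.
final class JournalSizing {
    // Share of the filesystem available to each JVM, in GB.
    static double perJvmGb(double filesystemTb, int jvms) {
        return filesystemTb * 1024.0 / jvms;
    }

    // Largest file size, in MB, that lets fileCount files fit in perJvmGb.
    static double maxFileSizeMb(double perJvmGb, int fileCount) {
        return perJvmGb * 1024.0 / fileCount;
    }

    public static void main(String[] args) {
        double perJvm = perJvmGb(3.6, 2);             // about 1843 GB
        double ceiling = maxFileSizeMb(perJvm, 511);  // about 3693.6 MB
        System.out.printf("per JVM: %.1f GB, file size ceiling: %.1f MB%n", perJvm, ceiling);
    }
}
```

The configured 3690MB sits just under the computed ceiling of about 3693.6MB, so 511 files of that size fit within one JVM's share of the filesystem.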
  • 56. Configuring Coherence for Elastic Data
    §  Cache-config.xml
        <distributed-scheme>
          <backup-count>0</backup-count>
          <partition-listener>
            <class-name>com.tangosol.net.partition.DropContentPartitionListener</class-name>
          </partition-listener>
          <backing-map-scheme>
            <ramjournal-scheme>
              <class-name>com.tangosol.net.cache.BlindCompactSerializationCache</class-name>
              <high-units>1600KB</high-units>
              <low-units>1400KB</low-units>
              <unit-calculator>Binary</unit-calculator>
              <unit-factor>1048576</unit-factor>
            </ramjournal-scheme>
          </backing-map-scheme>
        </distributed-scheme>
  • 57. Garbage Collection settings
        argv[12]: -XX:+UseG1GC
        argv[13]: -XX:MaxGCPauseMillis=800
        argv[14]: -XX:ConcGCThreads=10
        argv[15]: -XX:ParallelGCThreads=10
        argv[16]: -XX:InitiatingHeapOccupancyPercent=25
        argv[17]: -XX:NewRatio=16
  • 58. Did we meet our goals? So far, so good…
    §  Goal: 500M cached objects
      –  Actual: 1.6B cached objects
    §  Goal: Spend as little as possible
      –  Actual: Spent 1400/machine (8640 total), 84x more objects
    §  Goal: Do it by EOY
      –  Actual: On track for production release before EOY