SlideShare a Scribd company logo
1 of 36
Download to read offline
High performant in-
memory datasets
with Netflix Hollow
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 1
Who am I?
Roberto Perez Alcolea
• Mexican
• Streaming Platform @ Target
• Groovy Enthusiast
• roberto@perezalcolea.info
• @rpalcolea
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 2
Dataset Distribution
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 3
The problem
• Dissemination of small or moderately sized data sets (no
big-data)
Common approaches:
• Sending data to a data store (RDMS, NoSQL)
• Serializing and keeping a local copy
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 4
Is there a solution?
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 5
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 6
What is Hollow?
• Java library and toolset for disseminating in-memory
datasets from a single producer to many consumers
Goals:
• Maximum development agility
• Highly optimized performance and resource
management
• Extreme stability and reliability
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 7
How it works
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 8
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 9
Key concepts
• Blob / Blob Store
• Snapshot/Delta
• Type
• Deduplication
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 10
Key concepts
• State Engine
• Ordinal
• unique identifier of the record within a type
• sufficient to locate the record within a type
• Immutability
• Cycle
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 11
Hollow Producer
• A single machine that retrieves all data from a source of
truth and produces a delta chain
• Encapsulates the details of compacting, publishing,
announcing, validating, and (if necessary) rollback of data
states
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 12
Hollow Producer
class MyProducer {
HollowProducer buildProducer() {
...
HollowProducer
.withPublisher(publisher) /// required: a BlobPublisher
.withAnnouncer(announcer) /// optional: an Announcer
.withValidators(validators) /// optional: one or more Validator
.withListeners(listeners) /// optional: one or more HollowProducerListeners
.withBlobStagingDir(dir) /// optional: a java.io.File
.withBlobCompressor(compressor) /// optional: a BlobCompressor
.withBlobStager(stager) /// optional: a BlobStager
.withSnapshotPublishExecutor(e) /// optional: a java.util.concurrent.Executor
.withNumStatesBetweenSnapshots(n) /// optional: an int
.withTargetMaxTypeShardSize(size) /// optional: a long
.build()
}
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 13
Hollow Producer - Capabilities
• Restore at Startup
• Rolling Back
• Validating Data
• Compacting Data
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 14
Hollow Consumer
• Encapsulates the details of initializing and keeping a dataset
up to date
• At initialization time, loads snapshot
• After initialization time, keeps a local copy of the dataset
current by applying delta transitions
• Each time a new version is announced, triggerRefresh()
should be called on the HollowConsumer
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 15
Hollow Consumer
class MyConsumer {
HollowConsumer buildConsumer() {
...
HollowConsumer
.withBlobRetriever(blobRetriever) /// required: a BlobRetriever
.withLocalBlobStore(localDiskDir) /// optional: a local disk location
.withAnnouncementWatcher(announcementWatcher) /// optional: a AnnouncementWatcher
.withRefreshListener(refreshListener) /// optional: a RefreshListener
.withGeneratedAPIClass(MyGeneratedAPI.class) /// optional: a generated client API class
.withFilterConfig(filterConfig) /// optional: a HollowFilterConfig
.withDoubleSnapshotConfig(doubleSnapshotCfg) /// optional: a DoubleSnapshotConfig
.withObjectLongevityConfig(objectLongevityCfg) /// optional: an ObjectLongevityConfig
.withObjectLongevityDetector(detector) /// optional: an ObjectLongevityDetector
.withRefreshExecutor(refreshExecutor) /// optional: an Executor
.build()
}
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 16
Hollow Consumer API Generation
• We can initialize the data model using our POJOs
HollowAPIGenerator generator = new HollowAPIGenerator.Builder().withAPIClassname("BooksAPI")
.withPackageName("io.perezalcolea.hollow.api")
.withDataModel(Book.class)
.withDestination("./data-model/src/main/java")
.build();
generator.generateSourceFiles();
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 17
Insight Tools
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 18
Hollow Explorer
• UI which can be used to browse and search records within
any dataset
class MyExplorer {
void startExplorer() {
//Initialize consumer
HollowConsumer hollowConsumer = HollowConsumerBuilder.build("publish-dir", BooksAPI.class)
hollowConsumer.triggerRefresh()
HollowExplorerUIServer server = new HollowExplorerUIServer(hollowConsumer, 8080)
server.start()
server.join()
}
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 19
Hollow History
• UI which can be used to browse and search changes in a
dataset over time
class MyHistory {
void startHistory() {
//Initialize consumer
HollowConsumer hollowConsumer = HollowConsumerBuilder.build("publish-dir", BooksAPI.class)
hollowConsumer.triggerRefresh()
HollowHistoryUIServer server = new HollowHistoryUIServer(hollowConsumer, 8090)
server.start()
server.join()
}
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 20
Heap Usage Analysis
• Given a loaded HollowReadStateEngine, it is possible to
iterate over each type and gather statistics about its
approximate heap usage
class MyMemoryInsight {
void printMemoryUsage() {
HollowReadStateEngine stateEngine = consumer.getStateEngine()
long totalApproximateHeapFootprint = 0
for(HollowTypeReadState typeState : stateEngine.typeStates) {
String typeName = typeState.schema.name
long heapCost = typeState.approximateHeapFootprintInBytes
println(typeName + ": " + heapCost);
totalApproximateHeapFootprint += heapCost
}
println("TOTAL: " + totalApproximateHeapFootprint)
}
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 21
Other tools
• Metrics Collector
• Blob Storage Cleaner
• Filtering
• Combining/Splitting
• Patching
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 22
Interacting with a
Hollow Dataset
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 23
Sample model
@HollowPrimaryKey(fields={"bookId"})
public class Book {
long bookId;
String title;
String isbn;
String publisher;
String language;
Set<Author> authors;
}
public class Author {
long id;
String authorName;
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 24
Indexing/Querying
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 25
Default Primary Keys
• Each type in our data model gets a custom index class
called <typename>PrimaryKeyIndex
• Backed by Hollow Consumer
• Will automatically stay up-to-date as your dataset updates
class MyPrimaryKeyIndex {
Book findBook(long bookId) {
HollowConsumer hollowConsumer = //my consumer
BookPrimaryKeyIndex bookPrimaryKeyIndex = new BookPrimaryKeyIndex(consumer)
Book book = bookPrimaryKeyIndex.findMatch(bookId)
}
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 26
Consumer-specified Primary Keys
• A primary key index is not restricted to just default primary
keys
class MyPrimaryKeyIndex {
Book findAuthor(long authorId) {
HollowConsumer hollowConsumer = //my consumer
AuthorPrimaryKeyIndex authorPrimaryKeyIndex = new AuthorPrimaryKeyIndex(consumer, "id")
Author author = authorPrimaryKeyIndex.findMatch(authorId)
}
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 27
Hash Index
• Records based on keys for which there is not a one-to-one
mapping between records and key values
• Must specify each of a query type, a select field, and one or
more match fields
class MyHashIndex {
Iterable<Book> findBooksByPublisher(String publisher) {
HollowConsumer hollowConsumer = //my consumer
BooksAPIHashIndex publisherHashIndex = new BooksAPIHashIndex(consumer, "Book", "", "publisher.value")
return publisherHashIndex.findMatch(publisher)
}
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 28
Data Modeling
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 29
Primary Keys
• @HollowPrimaryKey annotation
• Provide a shortcut when creating a primary key index
@HollowPrimaryKey(fields={"bookId"})
public class Book {
long bookId;
String title;
String isbn;
String publisher;
String language;
Set<Author> authors;
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 30
Inlined vs Referenced Fields
• Inline: fields that they are no longer REFERENCE fields, but
instead encode their data directly in each record
@HollowPrimaryKey(fields={"bookId"})
public class Book {
long bookId;
@HollowInline
String title;
String isbn;
String publisher;
String language;
Set<Author> authors;
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 31
Namespaced Record Type Names
• Fields with like values may reference the same record type,
but reference fields of the same primitive type elsewhere in
the data model use different record types
• Types can be filtered
@HollowPrimaryKey(fields={"bookId"})
public class Book {
long bookId;
...
@HollowTypeName(name="Publisher")
String publisher;
@HollowTypeName(name="Language")
String language;
...
}
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 32
Maintaining Backwards
Compatibility
• Adding a new type
• Removing an existing type
• Adding a new field to an existing type
• Removing an existing field from an existing type.
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 33
Demo
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 34
Useful links
https://hollow.how/
https://github.com/Netflix/hollow-reference-implementation
https://github.com/Netflix/hollow
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 35
Thank
YouRoberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 36

More Related Content

What's hot

Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEkawamuray
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Guozhang Wang
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaFlink Forward
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016Eric Sammer
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...HostedbyConfluent
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineDataWorks Summit
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Gyula Fóra
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19confluent
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemGyula Fóra
 
HBaseConEast2016: Coprocessors – Uses, Abuses and Solutions
HBaseConEast2016: Coprocessors – Uses, Abuses and SolutionsHBaseConEast2016: Coprocessors – Uses, Abuses and Solutions
HBaseConEast2016: Coprocessors – Uses, Abuses and SolutionsMichael Stack
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computingTimothy Spann
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant confluent
 
Pulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationPulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationShivji Kumar Jha
 
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...Michael Stack
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per DayAnkur Bansal
 
Espresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom QuiggleEspresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom Quiggleconfluent
 
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data StreamingOracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data StreamingMichael Rainey
 
Streaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka StreamsStreaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka Streamsvito jeng
 

What's hot (20)

Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
HBaseConEast2016: Coprocessors – Uses, Abuses and Solutions
HBaseConEast2016: Coprocessors – Uses, Abuses and SolutionsHBaseConEast2016: Coprocessors – Uses, Abuses and Solutions
HBaseConEast2016: Coprocessors – Uses, Abuses and Solutions
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
 
Pulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationPulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for Isolation
 
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
 
Espresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom QuiggleEspresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom Quiggle
 
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data StreamingOracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
 
Streaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka StreamsStreaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka Streams
 

Similar to High performant in-memory datasets with Netflix Hollow

The State of containerd
The State of containerdThe State of containerd
The State of containerdMoby Project
 
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaIntro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaInfluxData
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Issac Buenrostro
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Julien Le Dem
 
Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on KubernetesJoerg Henning
 
Containerd Internals: Building a Core Container Runtime
Containerd Internals: Building a Core Container RuntimeContainerd Internals: Building a Core Container Runtime
Containerd Internals: Building a Core Container RuntimePhil Estes
 
Architectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetesArchitectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetesRafał Leszko
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at BristechJulien Nioche
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKData Con LA
 
Tech Talk - Blockchain presentation
Tech Talk - Blockchain presentationTech Talk - Blockchain presentation
Tech Talk - Blockchain presentationLaura Steggles
 
Meetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on KubernetesMeetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on Kubernetesdtoledo67
 
Node.js and the MySQL Document Store
Node.js and the MySQL Document StoreNode.js and the MySQL Document Store
Node.js and the MySQL Document StoreRui Quelhas
 
Containerd internals: building a core container runtime
Containerd internals: building a core container runtimeContainerd internals: building a core container runtime
Containerd internals: building a core container runtimeDocker, Inc.
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Julien Le Dem
 
Intro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly DetectionIntro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly DetectionInfluxData
 
The state of containerd
The state of containerdThe state of containerd
The state of containerdDocker, Inc.
 
Tips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight DeploymentsTips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight DeploymentsAshish Thapliyal
 
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022HostedbyConfluent
 

Similar to High performant in-memory datasets with Netflix Hollow (20)

The State of containerd
The State of containerdThe State of containerd
The State of containerd
 
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaIntro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on Kubernetes
 
Containerd Internals: Building a Core Container Runtime
Containerd Internals: Building a Core Container RuntimeContainerd Internals: Building a Core Container Runtime
Containerd Internals: Building a Core Container Runtime
 
Architectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetesArchitectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetes
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at Bristech
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
 
Tech Talk - Blockchain presentation
Tech Talk - Blockchain presentationTech Talk - Blockchain presentation
Tech Talk - Blockchain presentation
 
Meetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on KubernetesMeetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on Kubernetes
 
Node.js and the MySQL Document Store
Node.js and the MySQL Document StoreNode.js and the MySQL Document Store
Node.js and the MySQL Document Store
 
Containerd internals: building a core container runtime
Containerd internals: building a core container runtimeContainerd internals: building a core container runtime
Containerd internals: building a core container runtime
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
Intro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly DetectionIntro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly Detection
 
The state of containerd
The state of containerdThe state of containerd
The state of containerd
 
Tips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight DeploymentsTips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight Deployments
 
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
 
K8s vs Cloud Foundry
K8s vs Cloud FoundryK8s vs Cloud Foundry
K8s vs Cloud Foundry
 

More from Roberto Pérez Alcolea

Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...Roberto Pérez Alcolea
 
[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...
[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...
[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...Roberto Pérez Alcolea
 
DPE Summit - A More Integrated Build and CI to Accelerate Builds at Netflix
DPE Summit - A More Integrated Build and CI to Accelerate Builds at NetflixDPE Summit - A More Integrated Build and CI to Accelerate Builds at Netflix
DPE Summit - A More Integrated Build and CI to Accelerate Builds at NetflixRoberto Pérez Alcolea
 
Dependency Management in a Complex World (JConf Chicago 2022)
Dependency Management in a Complex World (JConf Chicago 2022)Dependency Management in a Complex World (JConf Chicago 2022)
Dependency Management in a Complex World (JConf Chicago 2022)Roberto Pérez Alcolea
 
Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)
Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)
Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)Roberto Pérez Alcolea
 
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)Roberto Pérez Alcolea
 
Dependency Management at Scale @ JConf Centroamérica 2020
Dependency Management at Scale @ JConf Centroamérica 2020Dependency Management at Scale @ JConf Centroamérica 2020
Dependency Management at Scale @ JConf Centroamérica 2020Roberto Pérez Alcolea
 
GR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and Hystrix
GR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and HystrixGR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and Hystrix
GR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and HystrixRoberto Pérez Alcolea
 

More from Roberto Pérez Alcolea (10)

Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
 
[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...
[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...
[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...
 
DPE Summit - A More Integrated Build and CI to Accelerate Builds at Netflix
DPE Summit - A More Integrated Build and CI to Accelerate Builds at NetflixDPE Summit - A More Integrated Build and CI to Accelerate Builds at Netflix
DPE Summit - A More Integrated Build and CI to Accelerate Builds at Netflix
 
Dependency Management in a Complex World (JConf Chicago 2022)
Dependency Management in a Complex World (JConf Chicago 2022)Dependency Management in a Complex World (JConf Chicago 2022)
Dependency Management in a Complex World (JConf Chicago 2022)
 
Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)
Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)
Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)
 
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
 
Dependency Management at Scale @ JConf Centroamérica 2020
Dependency Management at Scale @ JConf Centroamérica 2020Dependency Management at Scale @ JConf Centroamérica 2020
Dependency Management at Scale @ JConf Centroamérica 2020
 
Dependency Management at Scale
Dependency Management at ScaleDependency Management at Scale
Dependency Management at Scale
 
GR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and Hystrix
GR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and HystrixGR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and Hystrix
GR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and Hystrix
 

Recently uploaded

Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 

Recently uploaded (20)

Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 

High performant in-memory datasets with Netflix Hollow

  • 1. High performant in- memory datasets with Netflix Hollow Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 1
  • 2. Who am I? Roberto Perez Alcolea • Mexican • Streaming Platform @ Target • Groovy Enthusiast • roberto@perezalcolea.info • @rpalcolea Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 2
  • 3. Dataset Distribution Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 3
  • 4. The problem • Dissemination of small or moderately sized data sets (no big-data) Common approaches: • Sending data to a data store (RDMS, NoSQL) • Serializing and keeping a local copy Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 4
  • 5. Is there a solution? Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 5
  • 6. Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 6
  • 7. What is Hollow? • Java library and toolset for disseminating in-memory datasets from a single producer to many consumers Goals: • Maximum development agility • Highly optimized performance and resource management • Extreme stability and reliability Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 7
  • 8. How it works Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 8
  • 9. Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 9
  • 10. Key concepts • Blob / Blob Store • Snapshot/Delta • Type • Deduplication Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 10
  • 11. Key concepts • State Engine • Ordinal • unique identifier of the record within a type • sufficient to locate the record within a type • Immutability • Cycle Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 11
  • 12. Hollow Producer • A single machine that retrieves all data from a source of truth and produces a delta chain • Encapsulates the details of compacting, publishing, announcing, validating, and (if necessary) rollback of data states Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 12
  • 13. Hollow Producer class MyProducer { HollowProducer buildProducer() { ... HollowProducer .withPublisher(publisher) /// required: a BlobPublisher .withAnnouncer(announcer) /// optional: an Announcer .withValidators(validators) /// optional: one or more Validator .withListeners(listeners) /// optional: one or more HollowProducerListeners .withBlobStagingDir(dir) /// optional: a java.io.File .withBlobCompressor(compressor) /// optional: a BlobCompressor .withBlobStager(stager) /// optional: a BlobStager .withSnapshotPublishExecutor(e) /// optional: a java.util.concurrent.Executor .withNumStatesBetweenSnapshots(n) /// optional: an int .withTargetMaxTypeShardSize(size) /// optional: a long .build() } } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 13
  • 14. Hollow Producer - Capabilities • Restore at Startup • Rolling Back • Validating Data • Compacting Data Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 14
  • 15. Hollow Consumer • Encapsulates the details of initializing and keeping a dataset up to date • At initialization time, loads snapshot • After initialization time, keeps a local copy of the dataset current by applying delta transitions • Each time a new version is announced, triggerRefresh() should be called on the HollowConsumer Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 15
  • 16. Hollow Consumer class MyConsumer { HollowConsumer buildConsumer() { ... HollowConsumer .withBlobRetriever(blobRetriever) /// required: a BlobRetriever .withLocalBlobStore(localDiskDir) /// optional: a local disk location .withAnnouncementWatcher(announcementWatcher) /// optional: a AnnouncementWatcher .withRefreshListener(refreshListener) /// optional: a RefreshListener .withGeneratedAPIClass(MyGeneratedAPI.class) /// optional: a generated client API class .withFilterConfig(filterConfig) /// optional: a HollowFilterConfig .withDoubleSnapshotConfig(doubleSnapshotCfg) /// optional: a DoubleSnapshotConfig .withObjectLongevityConfig(objectLongevityCfg) /// optional: an ObjectLongevityConfig .withObjectLongevityDetector(detector) /// optional: an ObjectLongevityDetector .withRefreshExecutor(refreshExecutor) /// optional: an Executor .build() } } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 16
  • 17. Hollow Consumer API Generation • We can initialize the data model using our POJOs HollowAPIGenerator generator = new HollowAPIGenerator.Builder().withAPIClassname("BooksAPI") .withPackageName("io.perezalcolea.hollow.api") .withDataModel(Book.class) .withDestination("./data-model/src/main/java") .build(); generator.generateSourceFiles(); Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 17
  • 18. Insight Tools Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 18
  • 19. Hollow Explorer • UI which can be used to browse and search records within any dataset class MyExplorer { void startExplorer() { //Initialize consumer HollowConsumer hollowConsumer = HollowConsumerBuilder.build("publish-dir", BooksAPI.class) hollowConsumer.triggerRefresh() HollowExplorerUIServer server = new HollowExplorerUIServer(hollowConsumer, 8080) server.start() server.join() } } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 19
  • 20. Hollow History • UI which can be used to browse and search changes in a dataset over time class MyHistory { void startHistory() { //Initialize consumer HollowConsumer hollowConsumer = HollowConsumerBuilder.build("publish-dir", BooksAPI.class) hollowConsumer.triggerRefresh() HollowHistoryUIServer server = new HollowHistoryUIServer(hollowConsumer, 8090) server.start() server.join() } } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 20
  • 21. Heap Usage Analysis • Given a loaded HollowReadStateEngine, it is possible to iterate over each type and gather statistics about its approximate heap usage class MyMemoryInsight { void printMemoryUsage() { HollowReadStateEngine stateEngine = consumer.getStateEngine() long totalApproximateHeapFootprint = 0 for(HollowTypeReadState typeState : stateEngine.typeStates) { String typeName = typeState.schema.name long heapCost = typeState.approximateHeapFootprintInBytes println(typeName + ": " + heapCost); totalApproximateHeapFootprint += heapCost } println("TOTAL: " + totalApproximateHeapFootprint) } } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 21
  • 22. Other tools • Metrics Collector • Blob Storage Cleaner • Filtering • Combining/Splitting • Patching Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 22
  • 23. Interacting with a Hollow Dataset Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 23
  • 24. Sample model @HollowPrimaryKey(fields={"bookId"}) public class Book { long bookId; String title; String isbn; String publisher; String language; Set<Author> authors; } public class Author { long id; String authorName; } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 24
  • 25. Indexing/Querying Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 25
  • 26. Default Primary Keys • Each type in our data model gets a custom index class called <typename>PrimaryKeyIndex • Backed by Hollow Consumer • Will automatically stay up-to-date as your dataset updates class MyPrimaryKeyIndex { Book findBook(long bookId) { HollowConsumer hollowConsumer = //my consumer BookPrimaryKeyIndex bookPrimaryKeyIndex = new BookPrimaryKeyIndex(consumer) Book book = bookPrimaryKeyIndex.findMatch(bookId) } } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 26
  • 27. Consumer-specified Primary Keys • A primary key index is not restricted to just default primary keys class MyPrimaryKeyIndex { Book findAuthor(long authorId) { HollowConsumer hollowConsumer = //my consumer AuthorPrimaryKeyIndex authorPrimaryKeyIndex = new AuthorPrimaryKeyIndex(consumer, "id") Author author = authorPrimaryKeyIndex.findMatch(authorId) } } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 27
  • 28. Hash Index • Records based on keys for which there is not a one-to-one mapping between records and key values • Must specify each of a query type, a select field, and one or more match fields class MyHashIndex { Iterable<Book> findBooksByPublisher(String publisher) { HollowConsumer hollowConsumer = //my consumer BooksAPIHashIndex publisherHashIndex = new BooksAPIHashIndex(consumer, "Book", "", "publisher.value") return publisherHashIndex.findMatch(publisher) } } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 28
  • 29. Data Modeling Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 29
  • 30. Primary Keys • @HollowPrimaryKey annotation • Provide a shortcut when creating a primary key index @HollowPrimaryKey(fields={"bookId"}) public class Book { long bookId; String title; String isbn; String publisher; String language; Set<Author> authors; } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 30
  • 31. Inlined vs Referenced Fields • Inline: fields that they are no longer REFERENCE fields, but instead encode their data directly in each record @HollowPrimaryKey(fields={"bookId"}) public class Book { long bookId; @HollowInline String title; String isbn; String publisher; String language; Set<Author> authors; } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 31
  • 32. Namespaced Record Type Names • Fields with like values may reference the same record type, but reference fields of the same primitive type elsewhere in the data model use different record types • Types can be filtered @HollowPrimaryKey(fields={"bookId"}) public class Book { long bookId; ... @HollowTypeName(name="Publisher") String publisher; @HollowTypeName(name="Language") String language; ... } Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 32
  • 33. Maintaining Backwards Compatibility • Adding a new type • Removing an existing type • Adding a new field to an existing type • Removing an existing field from an existing type. Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 33
  • 34. Demo Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 34
  • 36. Thank YouRoberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 36