Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
November 2017
Who am I
User and contributor to Spark since 0.9, Cassandra since 0.6
Created Spark Job Server and FiloDB
Talks at Spark Summit, Cassandra Summit, Strata, Scala Days, etc.
http://velvia.github.io/
Why Build a New Streaming Database?
Needs
• Ingest HUGE streams of events (IoT etc.)
• Real-time, low latency, and somewhat flexible queries
  • Dashboards, quick answers on new data
• Flexible schemas and query patterns
• Keep your streaming pipeline super simple
  • Streaming = hardest to debug. Simplicity rules!
[Diagram: pipeline of Message Queue, Events, Stream Processing Layer, State / Database, Happy Users]
Spark + HDFS Streaming
[Diagram: Kafka → Spark Streaming → many small files (microbatches) → dedup/consolidate job → larger efficient files]
• High latency
• Big impedance mismatch between streaming systems and a file system designed for big blobs of data
Cassandra?
• Ingest HUGE streams of events (IoT etc.)
  • C* is not efficient for writing raw events
• Real-time, low latency, and somewhat flexible queries
  • C* is real-time, but only low latency for simple lookups. Add Spark => much higher latency
• Flexible schemas and query patterns
  • C* only handles simple lookups
Introducing FiloDB
A distributed, columnar time-series/event database.
Built for streaming.
http://www.github.com/filodb/FiloDB
[Diagram: Message Queue, Events, Spark Streaming; Cassandra (short-term storage, K-V); FiloDB (events, ad-hoc, batch); Spark (ad hoc, SQL, ML); dashboards, maps]
100% Reactive
• Scala
• Akka Cluster
• Spark
• Monix / Reactive Streams
• Typesafe Config for all configuration
• Scodec, Ficus, Enumeratum, Scalactic, etc.
• Even most of the performance-critical parts are written in Scala :)
Scala, Akka, and Spark for Database
Why use Scala and Akka?
• Akka Cluster!
• Just the right abstractions - streams, futures, Akka, type safety…
• Failure handling and supervision are critical for databases
• All the pattern matching and immutable goodness :)
Scala Big Data Projects
• Spark
• GeoMesa
• Khronus - Akka time-series DB
• Sirius - Akka distributed KV Store
• FiloDB!
Actors vs Futures vs Observables
One FiloDB Node
[Diagram: NodeCoordinatorActor (NCA), two DatasetCoordinatorActors (DsCA), Active MemTable, Flushing MemTable, Reprojector, ColumnStore; input labeled "Data, commands"]
Akka vs Futures
[Diagram: NodeCoordinatorActor (NCA), two DatasetCoordinatorActors (DsCA), Active MemTable, Flushing MemTable, Reprojector, ColumnStore; input labeled "Data, commands"; annotated "Akka - control flow" and "Core I/O - Futures/Observables"]
Akka vs Futures
• Akka Actors:
  • External FiloDB node API (remote + cluster)
  • Async messaging with clients
  • Cluster/distributed state management
• Futures and Observables:
  • Core I/O
  • Columnar data processing / ingestion
  • Type-safe processing stages
Futures for Single Actions
/**
 * Clears all data from the column store for that given projection, for all versions.
 * More like a truncation, not a drop.
 * NOTE: please make sure there are no reprojections or writes going on before calling this
 */
def clearProjectionData(projection: Projection): Future[Response]

/**
 * Completely and permanently drops the dataset from the column store.
 * @param dataset the DatasetRef for the dataset to drop.
 */
def dropDataset(dataset: DatasetRef): Future[Response]

/**
 * Appends the ChunkSets and incremental indices in the segment to the column store.
 * @param segment the ChunkSetSegment to write / merge to the columnar store
 * @param version the version # to write the segment to
 * @return Success. Future.failed(exception) otherwise.
 */
def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response]
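As a rough illustration of why "one Future per single action" composes well, the sketch below chains two Future-returning calls with a for-comprehension. It uses only the Scala standard library; `InMemoryStore`, its methods, and the `Response` values here are hypothetical stand-ins, not FiloDB's actual ColumnStore API.

```scala
import scala.concurrent.{ExecutionContext, Future}

object SingleActionFutures {
  // Hypothetical response type standing in for FiloDB's Response.
  sealed trait Response
  case object Success extends Response
  case object NotApplied extends Response

  // Hypothetical in-memory store: every action returns exactly one Future.
  final class InMemoryStore()(implicit ec: ExecutionContext) {
    private val data =
      scala.collection.concurrent.TrieMap.empty[String, Vector[Int]]

    def append(dataset: String, values: Vector[Int]): Future[Response] = Future {
      if (values.isEmpty) NotApplied
      else {
        // Not atomic across get + put; good enough for a single-writer sketch.
        data.put(dataset, data.getOrElse(dataset, Vector.empty) ++ values)
        Success
      }
    }

    def size(dataset: String): Future[Int] =
      Future(data.get(dataset).fold(0)(_.size))
  }

  // Single actions compose sequentially: the count runs only after the append.
  def appendThenCount(store: InMemoryStore, dataset: String, values: Vector[Int])
                     (implicit ec: ExecutionContext): Future[(Response, Int)] =
    for {
      resp  <- store.append(dataset, values)
      count <- store.size(dataset)
    } yield (resp, count)
}
```

A caller would typically map or flatMap the returned Future rather than block on it; blocking with `Await.result` is only reasonable in tests.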
Monix / Reactive Streams
• http://monix.io
• “observable sequences that are exposed as asynchronous streams, expanding on the observer pattern, strongly inspired by ReactiveX and by Scalaz, but designed from the ground up for back-pressure and made to cleanly interact with Scala’s standard library, compatible out-of-the-box with the Reactive Streams protocol”
• Much better than Future[Iterator[_]]
Monix / Reactive Streams
def readChunks(projection: RichProjection,
               columns: Seq[Column],
               version: Int,
               partMethod: PartitionScanMethod,
               chunkMethod: ChunkScanMethod = AllChunkScan): Observable[ChunkSetReader] = {
  scanPartitions(projection, version, partMethod)
    // Partitions to pipeline of single chunks
    .flatMap { partIndex =>
      stats.incrReadPartitions(1)
      readPartitionChunks(projection.datasetRef, version, columns, partIndex, chunkMethod)
    }
    // Collate single chunks to ChunkSetReaders
    .scan(new ChunkSetReaderAggregator(columns, stats)) { _ add _ }
    .collect { case agg: ChunkSetReaderAggregator if agg.canEmit => agg.emit() }
}
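The scan-then-collect step is the part that collates single chunks into complete sets. A minimal sketch of that pattern, using a plain standard-library `Iterator` instead of a Monix `Observable`, might look like the following. `Chunk`, `Aggregator`, and `collate` are hypothetical names for illustration and are not FiloDB's classes; note also that `scanLeft` emits its seed value first, which the `collect` step filters out.

```scala
object ScanCollectSketch {
  // Hypothetical single-column chunk: the set it belongs to, and its column.
  final case class Chunk(setId: Int, column: String, bytes: Int)

  // Immutable aggregator: gathers chunks for one set and can emit once a
  // chunk for every expected column has arrived.
  final case class Aggregator(columns: Set[String],
                              pending: Vector[Chunk] = Vector.empty) {
    def add(c: Chunk): Aggregator =
      // Start over when the previous set was complete or a new set begins.
      if (canEmit || pending.headOption.exists(_.setId != c.setId))
        copy(pending = Vector(c))
      else
        copy(pending = pending :+ c)

    def canEmit: Boolean = pending.map(_.column).toSet == columns
    def emit: Vector[Chunk] = pending
  }

  // Running aggregation (scanLeft) followed by a filter-and-map (collect).
  def collate(chunks: Iterator[Chunk], columns: Set[String]): Iterator[Vector[Chunk]] =
    chunks
      .scanLeft(Aggregator(columns))((agg, c) => agg.add(c))
      .collect { case agg if agg.canEmit => agg.emit }
}
```

Because the aggregator is carried through the scan as a value, the pipeline stays lazy and needs no shared mutable state between stages.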
Functional Reactive Stream
Processing
• Ingest stream merged with flush commands
• Built in async/parallel tasks via mapAsync
• Notify on end of stream, errors
val combinedStream = Observable.merge(stream.map(SomeData), flushStream)
combinedStream.map {
  case SomeData(records) =>
    shard.ingest(records)
    None
  case FlushCommand(group) =>
    shard.switchGroupBuffers(group)
    Some(FlushGroup(shard.shardNum, group, shard.latestOffset))
}.collect { case Some(flushGroup) => flushGroup }
 .mapAsync(numParallelFlushes)(shard.createFlushTask _)
 .foreach { x => }
 .recover { case ex: Exception => errHandler(ex) }
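The core idea above, interleaving flush commands with data in a single ordered stream so that a flush sees exactly the records ingested before it, can be sketched without Monix. The version below is a synchronous approximation over a standard-library `Iterator`; `Shard`, `FlushGroup`, and `flushGroups` are simplified, hypothetical stand-ins, and the parallel `mapAsync` stage is deliberately left out.

```scala
object IngestFlushSketch {
  sealed trait Event
  final case class SomeData(records: Seq[String]) extends Event
  final case class FlushCommand(group: Int) extends Event

  // What a flush should cover: the group and the offset reached so far.
  final case class FlushGroup(group: Int, latestOffset: Int)

  // Hypothetical shard that only tracks how many records it has ingested.
  final class Shard {
    private var ingested = 0
    def ingest(records: Seq[String]): Unit = ingested += records.size
    def latestOffset: Int = ingested
  }

  // Data events are consumed for their side effect; flush commands become
  // FlushGroup values carrying the offset at the moment they were seen.
  def flushGroups(events: Iterator[Event], shard: Shard): Iterator[FlushGroup] =
    events.map {
      case SomeData(records) =>
        shard.ingest(records)
        None
      case FlushCommand(group) =>
        Some(FlushGroup(group, shard.latestOffset))
    }.collect { case Some(flushGroup) => flushGroup }
}
```

Since both kinds of event travel through one stream, no locking is needed between ingestion and flush scheduling in this sketch.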
Akka Cluster and Spark
Spark/Akka Cluster Setup
[Diagram: Driver with NodeClusterActor and Client; two Executors, each with an NCA and DsCA1, DsCA2]
Adding one executor
[Diagram: Driver with NodeClusterActor and Client; executor1 with NCA and DsCA1, DsCA2. State: Executors -> (executor1). Arrows labeled MemberUp, ActorSelection, ActorRef]
Adding second executor
[Diagram: Driver with NodeClusterActor and Client; executor1 and executor2, each with NCA and DsCA1, DsCA2. State: Executors -> (executor1, executor2). Arrows labeled MemberUp, ActorSelection, ActorRef]
Sending a command
[Diagram: Driver with NodeClusterActor and Client; two Executors, each with NCA and DsCA1, DsCA2; message labeled Flush()]
Yes, Akka in Spark
• Columnar ingestion is stateful - need stickiness of state. This is inherently difficult in Spark.
• Akka (cluster) gives us a separate, asynchronous control channel to talk to FiloDB ingestors
• Spark only gives data flow primitives, not async messaging
• We need to route incoming records to the correct ingestion node. Sorting data is inefficient and forces all nodes to wait for sorting to be done.
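To make the routing point concrete, here is a minimal sketch of sending each record to an ingestion node through a partition map, with no global sort. The hashing scheme, `PartitionMap`, and `route` are assumptions made for illustration; the slides do not say how FiloDB actually assigns partitions to nodes.

```scala
object PartitionRoutingSketch {
  // Hypothetical partition map: index = shard number, value = node address.
  final case class PartitionMap(shardToNode: Vector[String]) {
    require(shardToNode.nonEmpty, "partition map needs at least one shard")

    def numShards: Int = shardToNode.size

    // Assumed scheme: hash the partition key; floorMod keeps the result
    // non-negative even when hashCode is negative.
    def shardFor(partitionKey: String): Int =
      java.lang.Math.floorMod(partitionKey.hashCode, numShards)

    def nodeFor(partitionKey: String): String = shardToNode(shardFor(partitionKey))
  }

  // Group a batch by destination node in a single pass; nothing is sorted and
  // no node has to wait for the others.
  def route(records: Seq[(String, Double)],
            pm: PartitionMap): Map[String, Seq[(String, Double)]] =
    records.groupBy { case (key, _) => pm.nodeFor(key) }
}
```

Every record with the same partition key lands on the same node, which is the "stickiness of state" that stateful columnar ingestion needs.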
Data Ingestion Setup
[Diagram: two Executors, each labeled as a FiloDB Node with an NCA, DsCA1, DsCA2, and task0 and task1, each task with a Row Source Actor; a Node Cluster Actor with the Partition Map]
FiloDB separate nodes
[Diagram: two Executors with task0 and task1, each task with a Row Source Actor; NCA with DsCA1, DsCA2; a Node Cluster Actor with the Partition Map]
Testing Akka Cluster
• MultiNodeSpec / sbt-multi-jvm
• NodeClusterSpec
  • Tests joining of different cluster nodes and partition map updates
  • Is partition map updated properly if a cluster node goes down? Inject network failures
• Lessons
Kamon Tracing
• http://kamon.io
• One trace can encapsulate multiple Future steps all executing on different threads
• Tunable tracing levels
• Summary stats and histograms for segments
• Super useful for production debugging of reactive stack
Kamon Tracing
def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response] = Tracer.withNewContext("append-segment") {
  val ctx = Tracer.currentContext
  stats.segmentAppend()
  if (segment.chunkSets.isEmpty) {
    // Nothing to write: answer immediately instead of using a non-local return
    stats.segmentEmpty()
    Future.successful(NotApplied)
  } else {
    for { writeChunksResp <- writeChunks(projection.datasetRef, version, segment, ctx)
          writeIndexResp  <- writeIndices(projection, version, segment, ctx)
          if writeChunksResp == Success
    } yield {
      ctx.finish()
      writeIndexResp
    }
  }
}

private def writeChunks(dataset: DatasetRef,
                        version: Int,
                        segment: ChunkSetSegment,
                        ctx: TraceContext): Future[Response] = {
  // The subtrace follows the Future even as it hops between threads
  asyncSubtrace(ctx, "write-chunks", "ingestion") {
    val binPartition = segment.binaryPartition
    val segmentId = segment.segmentId
    val chunkTable = getOrCreateChunkTable(dataset)
    Future.traverse(segment.chunkSets) { chunkSet =>
      chunkTable.writeChunks(binPartition, version, segmentId, chunkSet.info.id, chunkSet.chunks, stats)
    }.map { responses => responses.head }
  }
}
Kamon Metrics
• Uses HDRHistogram for much finer and more accurate buckets
• Built-in metrics for Akka actors, Spray, Akka-Http, Play, etc.
KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840
Validation: Scalactic
private def getColumnsFromNames(allColumns: Seq[Column],
                                columnNames: Seq[String]): Seq[Column] Or BadSchema = {
  if (columnNames.isEmpty) {
    Good(allColumns)
  } else {
    val columnMap = allColumns.map { c => c.name -> c }.toMap
    val missing = columnNames.toSet -- columnMap.keySet
    if (missing.nonEmpty) { Bad(MissingColumnNames(missing.toSeq, "projection")) }
    else                  { Good(columnNames.map(columnMap)) }
  }
}

for { computedColumns <- getComputedColumns(dataset.name, allColIds, columns)
      dataColumns     <- getColumnsFromNames(columns, normProjection.columns)
      richColumns      = dataColumns ++ computedColumns
      // scalac has problems dealing with (a, b, c) <- getColIndicesAndType... apparently
      segStuff        <- getColIndicesAndType(richColumns, Seq(normProjection.segmentColId), "segment")
      keyStuff        <- getColIndicesAndType(richColumns, normProjection.keyColIds, "row")
      partStuff       <- getColIndicesAndType(richColumns, dataset.partitionColumns, "partition") }
yield {
• Notice how multiple validations compose!
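The same short-circuiting composition can be shown with the standard library's right-biased `Either` (Scala 2.12 or later) in place of Scalactic's `Or`. This is a sketch under that substitution; `Column`, `NoColumns`, `columnsFromNames`, and `validate` are simplified, hypothetical names, not FiloDB's definitions.

```scala
object ValidationSketch {
  sealed trait BadSchema
  final case class MissingColumnNames(missing: Seq[String], kind: String) extends BadSchema
  final case class NoColumns(kind: String) extends BadSchema

  final case class Column(name: String, colType: String)

  // Resolve names to columns, or report which names are unknown.
  def columnsFromNames(all: Seq[Column], names: Seq[String],
                       kind: String): Either[BadSchema, Seq[Column]] = {
    val byName = all.map(c => c.name -> c).toMap
    val missing = names.filterNot(byName.contains)
    if (missing.nonEmpty) Left(MissingColumnNames(missing, kind))
    else Right(names.map(byName))
  }

  def nonEmpty(cols: Seq[Column], kind: String): Either[BadSchema, Seq[Column]] =
    if (cols.isEmpty) Left(NoColumns(kind)) else Right(cols)

  // Each step runs only if the previous one succeeded; the first failure
  // becomes the result of the whole for-comprehension.
  def validate(all: Seq[Column], keyNames: Seq[String],
               partNames: Seq[String]): Either[BadSchema, (Seq[Column], Seq[Column])] =
    for {
      keyCols  <- columnsFromNames(all, keyNames, "row")
      _        <- nonEmpty(keyCols, "row")
      partCols <- columnsFromNames(all, partNames, "partition")
    } yield (keyCols, partCols)
}
```

The generators bind single names rather than tuple patterns, because in Scala 2 a tuple pattern on the left of `<-` requires `withFilter`, which `Either` does not provide.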
Machine-Speed Scala
How do you go REALLY fast?
• Don’t serialize
• Don’t allocate
• Don’t copy
Filo fast
• Filo binary vectors - 2 billion records/sec
• Spark InMemoryColumnStore - 125 million records/sec
• Spark CassandraColumnStore - 25 million records/sec
Filo: High Performance
Binary Vectors
• Designed for NoSQL, not a file format
• random or linear access
• on or off heap
• missing value support
• Scala only, but cross-platform support possible
http://github.com/velvia/filo is a binary data vector library designed for extreme read performance with minimal deserialization costs.
Billions of Ops / Sec
• JMH benchmark: 0.5ns per FiloVector element access / add
• 2 Billion adds per second - single threaded
• Who said Scala cannot be fast?
• Spark API (row-based) limits performance significantly
val randomInts = (0 until numValues).map(i => util.Random.nextInt)
val randomIntsAray = randomInts.toArray
val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
val sc = FiloVector[Int](filoBuffer)

@Benchmark
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
def sumAllIntsFiloApply(): Int = {
  var total = 0
  for { i <- 0 until numValues optimized } {
    total += sc(i)
  }
  total
}
JVM Inlining
• Very small methods can be inlined by the JVM
• final def avoids virtual method dispatch.
• Thus non-final methods in traits and abstract classes are much harder for the JVM to inline
0.5ns/read is achieved through a stack of very small methods:
val base = baseReader.readInt(0)
final def apply(i: Int): Int = base + dataReader.read(i)
case (32, _) => new TypedBufferReader[Int] {
  final def read(i: Int): Int = reader.readInt(i)
}
final def readInt(i: Int): Int = unsafe.getInt(byteArray, (offset + i * 4).toLong)
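A self-contained sketch of such a stack of small final methods follows. It swaps `sun.misc.Unsafe` for a `java.nio.ByteBuffer` so that it runs anywhere, and it assumes a simple base-plus-delta encoding suggested by the `base + dataReader.read(i)` line; `IntReader`, `DeltaIntVector`, `build`, and `sum` are hypothetical names, not Filo's API.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object SmallMethodsSketch {
  // Lowest layer: reads a 32-bit int at element index i (absolute read).
  final class IntReader(buf: ByteBuffer) {
    final def readInt(i: Int): Int = buf.getInt(i * 4)
  }

  // Top layer: element i is stored as a delta from a shared base value.
  final class DeltaIntVector(base: Int, deltas: IntReader, val length: Int) {
    final def apply(i: Int): Int = base + deltas.readInt(i)
  }

  // Encodes values as (value - min) into a little-endian buffer.
  def build(values: Array[Int]): DeltaIntVector = {
    val base = if (values.isEmpty) 0 else values.min
    val buf = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN)
    values.foreach(v => buf.putInt(v - base))
    new DeltaIntVector(base, new IntReader(buf), values.length)
  }

  // A while loop over final, monomorphic methods gives the JIT a good chance
  // to inline the whole call chain; no boxing or allocation per element.
  def sum(v: DeltaIntVector): Long = {
    var total = 0L
    var i = 0
    while (i < v.length) { total += v(i); i += 1 }
    total
  }
}
```

Whether the JIT actually inlines these calls depends on the JVM and its flags, so any timing claim should be checked with a JMH benchmark rather than assumed.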
BinaryRecord
• Tough problem: FiloDB must handle many different datasets, each with different schemas
• Cannot rely on static types and standard serialization mechanisms - case classes, Protobuf, etc.
• Serialization very costly, especially strings
• Solution: BinaryRecord
BinaryRecord II
• BinaryRecord is a binary (i.e. transport-ready) record class that supports any schema or mix of column types
• Values can be extracted or written with no serialization cost
• UTF8-encoded string class
• String compare as fast as native Java strings
• Immutable API once built
Use Case: Sorting
• Regular sorting: deserialize record, create sort key, compare sort key
• BinaryRecord sorting: binary compare fields directly, with no deserialization and no object allocations
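The difference can be sketched with a comparator that reads fields straight out of each record's bytes at fixed offsets. The layout below (a big-endian Int `age` followed by a Long `lastTimestamp`) and all names are assumptions for illustration; the real BinaryRecord layout is not shown in these slides.

```scala
import java.nio.ByteBuffer

object BinarySortSketch {
  // Hypothetical fixed layout: age: Int at byte 0, lastTimestamp: Long at byte 4.
  val AgeOffset = 0
  val TimestampOffset = 4
  val RecordSize = 12

  // ByteBuffer writes big-endian by default, matching the readers below.
  def encode(age: Int, lastTimestamp: Long): Array[Byte] =
    ByteBuffer.allocate(RecordSize).putInt(age).putLong(lastTimestamp).array()

  // Field readers that decode in place, without creating any objects.
  def readInt(b: Array[Byte], off: Int): Int =
    ((b(off) & 0xff) << 24) | ((b(off + 1) & 0xff) << 16) |
      ((b(off + 2) & 0xff) << 8) | (b(off + 3) & 0xff)

  def readLong(b: Array[Byte], off: Int): Long =
    (readInt(b, off).toLong << 32) | (readInt(b, off + 4).toLong & 0xffffffffL)

  // Orders records by age, then lastTimestamp, never deserializing a record.
  val byAgeThenTimestamp: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(a: Array[Byte], b: Array[Byte]): Int = {
      val c = Integer.compare(readInt(a, AgeOffset), readInt(b, AgeOffset))
      if (c != 0) c
      else java.lang.Long.compare(readLong(a, TimestampOffset), readLong(b, TimestampOffset))
    }
  }
}
```

Sorting with `records.sorted(BinarySortSketch.byAgeThenTimestamp)` then touches only the raw byte arrays, whereas the regular approach would build a deserialized instance and a sort key for every record it compares.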
Regular Sorting
[Diagram: two Protobuf/Avro etc. records, each turned into a deserialized instance and then a sort key; the two sort keys are compared (Cmp)]
BinaryRecord Sorting
• BinaryRecord sorting: binary compare fields directly, with no deserialization and no object allocations
[Diagram: two BinaryRecords, each with fields name: Str, age: Int, lastTimestamp: Long, group: Str]
SBT-JMH
• Super useful tool to leverage JMH, the best microbenchmarking harness
• JMH is written by the JDK folks
In Summary
• Scala, Akka, reactive can give you both awesome abstractions AND performance
• Use Akka for distribution, state, protocols
• Use reactive/Monix for functional, concurrent stream processing
• Build (or use FiloDB’s) fast low-level abstractions with good APIs
Thank you Scala OSS!

DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 

2017 High Performance Database with Scala, Akka, Spark

  • 1. Building a High-Performance Database with Scala, Akka, and Spark, Evan Chan, November 2017
  • 2. Who am I • User and contributor to Spark since 0.9, Cassandra since 0.6 • Created Spark Job Server and FiloDB • Talks at Spark Summit, Cassandra Summit, Strata, Scala Days, etc. • http://velvia.github.io/
  • 3. Why Build a New Streaming Database?
  • 4. Needs • Ingest HUGE streams of events — IoT etc. • Real-time, low latency, and somewhat flexible queries • Dashboards, quick answers on new data • Flexible schemas and query patterns • Keep your streaming pipeline super simple • Streaming = hardest to debug. Simplicity rules!
  • 6. Spark + HDFS Streaming [Diagram: Kafka → Spark Streaming → many small files (microbatches) → dedup, consolidate job → larger efficient files] • High latency • Big impedance mismatch between streaming systems and a file system designed for big blobs of data
  • 7. Cassandra? • Ingest HUGE streams of events — IoT etc. • C* is not efficient for writing raw events • Real-time, low latency, and somewhat flexible queries • C* is real-time, but only low latency for simple lookups. Add Spark => much higher latency • Flexible schemas and query patterns • C* only handles simple lookups
  • 8. Introducing FiloDB A distributed, columnar time-series/event database. Built for streaming. http://www.github.com/filodb/FiloDB
  • 9. [Diagram labels: Message Queue; Events; Spark Streaming; Short term storage, K-V; Adhoc, SQL, ML; Cassandra; FiloDB: Events, ad-hoc, batch; Spark; Dashboards, maps]
  • 10. 100% Reactive • Scala • Akka Cluster • Spark • Monix / Reactive Streams • Typesafe Config for all configuration • Scodec, Ficus, Enumeratum, Scalactic, etc. • Even most of the performance critical parts are written in Scala :)
  • 11. Scala, Akka, and Spark for Database
  • 12. Why use Scala and Akka? • Akka Cluster! • Just the right abstractions - streams, futures, Akka, type safety…. • Failure handling and supervision are critical for databases • All the pattern matching and immutable goodness :)
  • 13. Scala Big Data Projects • Spark • GeoMesa • Khronus - Akka time-series DB • Sirius - Akka distributed KV Store • FiloDB!
  • 14. Actors vs Futures vs Observables
  • 16. Akka vs Futures [Diagram labels: NodeCoordinatorActor (NCA); DatasetCoordinatorActor (DsCA), shown twice; Active MemTable; Flushing MemTable; Reprojector; ColumnStore; Data, commands. Legend: Akka - control flow; Core I/O - Futures/Observables]
  • 17. Akka vs Futures • Akka Actors: • External FiloDB node API (remote + cluster) • Async messaging with clients • Cluster/distributed state management • Futures and Observables: • Core I/O • Columnar data processing / ingestion • Type-safe processing stages
  • 18. Futures for Single Actions
      /**
       * Clears all data from the column store for that given projection, for all versions.
       * More like a truncation, not a drop.
       * NOTE: please make sure there are no reprojections or writes going on before calling this
       */
      def clearProjectionData(projection: Projection): Future[Response]

      /**
       * Completely and permanently drops the dataset from the column store.
       * @param dataset the DatasetRef for the dataset to drop.
       */
      def dropDataset(dataset: DatasetRef): Future[Response]

      /**
       * Appends the ChunkSets and incremental indices in the segment to the column store.
       * @param segment the ChunkSetSegment to write / merge to the columnar store
       * @param version the version # to write the segment to
       * @return Success. Future.failure(exception) otherwise.
       */
      def appendSegment(projection: RichProjection,
                        segment: ChunkSetSegment,
                        version: Int): Future[Response]
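The signatures above each return one `Future[Response]` per action, which makes them easy to chain. The following is a minimal sketch of that style using only the Scala standard library. `ToyColumnStore`, its methods, and the `Response` cases are hypothetical stand-ins for illustration, not FiloDB's actual API.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for the response types named on the slide.
sealed trait Response
case object Success extends Response
case object NotApplied extends Response

// Hypothetical in-memory store: every method is a single asynchronous action.
object ToyColumnStore {
  private val data = scala.collection.concurrent.TrieMap.empty[String, Vector[Int]]

  // Appends a chunk; an empty chunk is reported as NotApplied.
  def append(dataset: String, chunk: Vector[Int]): Future[Response] = Future {
    if (chunk.isEmpty) NotApplied
    else {
      data.put(dataset, data.getOrElse(dataset, Vector.empty) ++ chunk)
      Success
    }
  }

  // Removes all data for the dataset (a truncation).
  def clear(dataset: String): Future[Response] = Future {
    data.remove(dataset)
    Success
  }

  def size(dataset: String): Int = data.get(dataset).map(_.size).getOrElse(0)
}

object FutureDemo {
  // Chains three single actions; each step starts only after the previous one completes.
  def run(): (Response, Response, Int) = {
    val steps = for {
      _ <- ToyColumnStore.clear("events")
      a <- ToyColumnStore.append("events", Vector(1, 2, 3))
      b <- ToyColumnStore.append("events", Vector.empty)
    } yield (a, b)
    val (a, b) = Await.result(steps, 5.seconds)
    (a, b, ToyColumnStore.size("events"))
  }
}
```

Blocking with `Await.result` is done here only to make the result observable in a small example; a caller in a reactive stack would normally keep composing the returned future.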
  • 19. Monix / Reactive Streams • http://monix.io • “observable sequences that are exposed as asynchronous streams, expanding on the observer pattern, strongly inspired by ReactiveX and by Scalaz, but designed from the ground up for back-pressure and made to cleanly interact with Scala’s standard library, compatible out-of-the-box with the Reactive Streams protocol” • Much better than Future[Iterator[_]]
  • 20. Monix / Reactive Streams
      def readChunks(projection: RichProjection,
                     columns: Seq[Column],
                     version: Int,
                     partMethod: PartitionScanMethod,
                     chunkMethod: ChunkScanMethod = AllChunkScan): Observable[ChunkSetReader] = {
        scanPartitions(projection, version, partMethod)
          // Partitions to pipeline of single chunks
          .flatMap { partIndex =>
            stats.incrReadPartitions(1)
            readPartitionChunks(projection.datasetRef, version, columns, partIndex, chunkMethod)
          // Collate single chunks to ChunkSetReaders
          }.scan(new ChunkSetReaderAggregator(columns, stats)) { _ add _ }
          .collect { case agg: ChunkSetReaderAggregator if agg.canEmit => agg.emit() }
      }
  • 21. Functional Reactive Stream Processing • Ingest stream merged with flush commands • Built-in async/parallel tasks via mapAsync • Notify on end of stream, errors
      val combinedStream = Observable.merge(stream.map(SomeData), flushStream)
      combinedStream.map {
        case SomeData(records) =>
          shard.ingest(records)
          None
        case FlushCommand(group) =>
          shard.switchGroupBuffers(group)
          Some(FlushGroup(shard.shardNum, group, shard.latestOffset))
      }.collect { case Some(flushGroup) => flushGroup }
       .mapAsync(numParallelFlushes)(shard.createFlushTask _)
       .foreach { x => }
       .recover { case ex: Exception => errHandler(ex) }
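The slide's pipeline maps every event to an `Option` (data events yield `None`, flush commands yield `Some(...)`) and then collects only the flush groups. Below is a synchronous model of just that map-then-collect step, written against a plain `Iterator` from the standard library rather than a Monix `Observable`. `ToyShard`, `ToyPipeline`, and the simplified `FlushGroup` are assumptions made for illustration, and the async `mapAsync` stage is left out.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical event types mirroring the names used on the slide.
sealed trait Event
final case class SomeData(records: Seq[Int]) extends Event
final case class FlushCommand(group: Int) extends Event

// Simplified flush descriptor: which group, and how many records it covers.
final case class FlushGroup(group: Int, recordCount: Int)

// Hypothetical shard that buffers records until told to flush.
final class ToyShard {
  private val buffer = ArrayBuffer.empty[Int]

  def ingest(records: Seq[Int]): Unit = buffer ++= records

  // Describes what was buffered so far and starts a fresh buffer.
  def switchBuffers(group: Int): FlushGroup = {
    val flushGroup = FlushGroup(group, buffer.size)
    buffer.clear()
    flushGroup
  }
}

object ToyPipeline {
  // Data events produce None, flush commands produce Some(FlushGroup);
  // collect then keeps only the flush groups, in arrival order.
  def run(events: Iterator[Event], shard: ToyShard): List[FlushGroup] =
    events.map {
      case SomeData(records)   => shard.ingest(records); None
      case FlushCommand(group) => Some(shard.switchBuffers(group))
    }.collect { case Some(flushGroup) => flushGroup }.toList
}
```

Because data and flush commands travel through one merged sequence, a flush always sees exactly the records that arrived before it, with no extra locking around the shard.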
  • 24. Adding one executor [Diagram labels: Driver; NodeClusterActor; Client; executor1; NCA; DsCA1; DsCA2; MemberUp; ActorSelection; ActorRef] State: Executors -> (executor1)
  • 25. Adding second executor [Diagram labels: Driver; NodeClusterActor; Client; executor1 with NCA, DsCA1, DsCA2; executor2 with NCA, DsCA1, DsCA2; MemberUp; ActorSelection; ActorRef] State: Executors -> (executor1, executor2)
  • 27. Yes, Akka in Spark • Columnar ingestion is stateful - need stickiness of state. This is inherently difficult in Spark. • Akka (cluster) gives us a separate, asynchronous control channel to talk to FiloDB ingestors • Spark only gives data flow primitives, not async messaging • We need to route incoming records to the correct ingestion node. Sorting data is inefficient and forces all nodes to wait for sorting to be done.
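The last point, routing each incoming record to the node that owns its partition instead of sorting the whole dataset, can be sketched as below. This is only an illustration under stated assumptions: `PartitionMap`, `nodeFor`, `Router`, and the hash-modulo assignment are hypothetical and are not taken from FiloDB's implementation.

```scala
// Hypothetical record type: a partition key plus a payload value.
final case class Record(partitionKey: String, value: Double)

// Hypothetical partition map: assigns each partition key to one ingestion node.
final class PartitionMap(nodes: Vector[String]) {
  require(nodes.nonEmpty, "need at least one ingestion node")

  // floorMod keeps the index non-negative even when hashCode is negative.
  def nodeFor(partitionKey: String): String =
    nodes(java.lang.Math.floorMod(partitionKey.hashCode, nodes.size))
}

object Router {
  // Groups records by owning node in a single pass, with no global sort.
  def route(records: Seq[Record], partitionMap: PartitionMap): Map[String, Seq[Record]] =
    records.groupBy(r => partitionMap.nodeFor(r.partitionKey))
}
```

A given key always maps to the same node as long as the node list is unchanged, which is the stickiness that stateful columnar ingestion needs. Handling nodes that join or leave is left out of this sketch.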
  • 28. Data Ingestion Setup [Diagram labels: two Executors, each with NCA, DsCA1, DsCA2, task0, task1, and two Row Source Actors; Node Cluster Actor; Partition Map]
  • 29. FiloDB separate nodes [Diagram labels: two FiloDB Nodes; two Executors, each with NCA, DsCA1, DsCA2, task0, task1, and two Row Source Actors; Node Cluster Actor; Partition Map]
  • 30. Testing Akka Cluster • MultiNodeSpec / sbt-multi-jvm • NodeClusterSpec • Tests joining of different cluster nodes and partition map updates • Is partition map updated properly if a cluster node goes down — inject network failures • Lessons
  • 31. Kamon Tracing • http://kamon.io • One trace can encapsulate multiple Future steps all executing on different threads • Tunable tracing levels • Summary stats and histograms for segments • Super useful for production debugging of reactive stack
  • 32. Kamon Tracing
      def appendSegment(projection: RichProjection,
                        segment: ChunkSetSegment,
                        version: Int): Future[Response] = Tracer.withNewContext("append-segment") {
        val ctx = Tracer.currentContext
        stats.segmentAppend()
        if (segment.chunkSets.isEmpty) {
          stats.segmentEmpty()
          return(Future.successful(NotApplied))
        }
        for {
          writeChunksResp <- writeChunks(projection.datasetRef, version, segment, ctx)
          writeIndexResp <- writeIndices(projection, version, segment, ctx) if writeChunksResp == Success
        } yield {
          ctx.finish()
          writeIndexResp
        }
      }

      private def writeChunks(dataset: DatasetRef,
                              version: Int,
                              segment: ChunkSetSegment,
                              ctx: TraceContext): Future[Response] = {
        asyncSubtrace(ctx, "write-chunks", "ingestion") {
          val binPartition = segment.binaryPartition
          val segmentId = segment.segmentId
          val chunkTable = getOrCreateChunkTable(dataset)
          Future.traverse(segment.chunkSets) { chunkSet =>
            chunkTable.writeChunks(binPartition, version, segmentId, chunkSet.info.id, chunkSet.chunks, stats)
          }.map { responses => responses.head }
        }
      }
  • 33. Kamon Metrics • Uses HDRHistogram for much finer and more accurate buckets • Built-in metrics for Akka actors, Spray, Akka-Http, Play, etc.
      KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
      KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
      KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840
  • 34. Validation: Scalactic
      private def getColumnsFromNames(allColumns: Seq[Column],
                                      columnNames: Seq[String]): Seq[Column] Or BadSchema = {
        if (columnNames.isEmpty) {
          Good(allColumns)
        } else {
          val columnMap = allColumns.map { c => c.name -> c }.toMap
          val missing = columnNames.toSet -- columnMap.keySet
          if (missing.nonEmpty) { Bad(MissingColumnNames(missing.toSeq, "projection")) }
          else { Good(columnNames.map(columnMap)) }
        }
      }

      for {
        computedColumns <- getComputedColumns(dataset.name, allColIds, columns)
        dataColumns <- getColumnsFromNames(columns, normProjection.columns)
        richColumns = dataColumns ++ computedColumns
        // scalac has problems dealing with (a, b, c) <- getColIndicesAndType... apparently
        segStuff <- getColIndicesAndType(richColumns, Seq(normProjection.segmentColId), "segment")
        keyStuff <- getColIndicesAndType(richColumns, normProjection.keyColIds, "row")
        partStuff <- getColIndicesAndType(richColumns, dataset.partitionColumns, "partition")
      } yield {
      • Notice how multiple validations compose!
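The same composition idea can be shown without the Scalactic dependency. The sketch below swaps Scalactic's `Or` for the standard library's right-biased `Either` (Scala 2.12 or later is assumed), so it illustrates the pattern rather than the library used on the slide. The `Column` model, `Validation` object, and method names are hypothetical.

```scala
// Hypothetical error and column types, loosely modeled on the slide.
sealed trait BadSchema
final case class MissingColumnNames(missing: Seq[String], kind: String) extends BadSchema
final case class Column(name: String, colType: String)

object Validation {
  // Resolves names to columns, or reports every name that could not be found.
  def columnsFromNames(all: Seq[Column],
                       names: Seq[String],
                       kind: String): Either[BadSchema, Seq[Column]] = {
    val byName = all.map(c => c.name -> c).toMap
    val missing = names.filterNot(byName.contains)
    if (missing.nonEmpty) Left(MissingColumnNames(missing, kind))
    else Right(names.map(byName))
  }

  // Each step runs only if the previous one succeeded; the first Left short-circuits the rest.
  def validate(all: Seq[Column],
               rowKeys: Seq[String],
               partitionKeys: Seq[String]): Either[BadSchema, (Seq[Column], Seq[Column])] =
    for {
      rows  <- columnsFromNames(all, rowKeys, "row")
      parts <- columnsFromNames(all, partitionKeys, "partition")
    } yield (rows, parts)
}
```

The for-comprehension reads like straight-line code while still carrying the first failure to the caller, which is what lets several validations compose.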
  • 36. How do you go REALLY fast? • Don’t serialize • Don’t allocate • Don’t copy
  • 37. Filo fast • Filo binary vectors - 2 billion records/sec • Spark InMemoryColumnStore - 125 million records/sec • Spark CassandraColumnStore - 25 million records/sec
  • 38. Filo: High Performance Binary Vectors • Designed for NoSQL, not a file format • random or linear access • on or off heap • missing value support • Scala only, but cross-platform support possible http://github.com/velvia/filo is a binary data vector library designed for extreme read performance with minimal deserialization costs.
  • 39. Billions of Ops / Sec • JMH benchmark: 0.5ns per FiloVector element access / add • 2 Billion adds per second - single threaded • Who said Scala cannot be fast? • Spark API (row-based) limits performance significantly
      val randomInts = (0 until numValues).map(i => util.Random.nextInt)
      val randomIntsAray = randomInts.toArray
      val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
      val sc = FiloVector[Int](filoBuffer)

      @Benchmark
      @BenchmarkMode(Array(Mode.AverageTime))
      @OutputTimeUnit(TimeUnit.MICROSECONDS)
      def sumAllIntsFiloApply(): Int = {
        var total = 0
        for { i <- 0 until numValues optimized } {
          total += sc(i)
        }
        total
      }
  • 40. JVM Inlining • Very small methods can be inlined by the JVM • final def avoids virtual method dispatch. • Thus methods in traits, abstract classes not inlinable • 0.5ns/read is achieved through a stack of very small methods:
      val base = baseReader.readInt(0)
      final def apply(i: Int): Int = base + dataReader.read(i)

      case (32, _) => new TypedBufferReader[Int] {
        final def read(i: Int): Int = reader.readInt(i)
      }

      final def readInt(i: Int): Int = unsafe.getInt(byteArray, (offset + i * 4).toLong)
  • 41. BinaryRecord • Tough problem: FiloDB must handle many different datasets, each with different schemas • Cannot rely on static types and standard serialization mechanisms - case classes, Protobuf, etc. • Serialization very costly, especially strings • Solution: BinaryRecord
  • 42. BinaryRecord II • BinaryRecord is a binary (ie transport ready) record class that supports any schema or mix of column types • Values can be extracted or written with no serialization cost • UTF8-encoded string class • String compare as fast as native Java strings • Immutable API once built
  • 43. Use Case: Sorting • Regular sorting: deserialize record, create sort key, compare sort key • BinaryRecord sorting: binary compare fields directly — no deserialization, no object allocations
  • 44. Regular Sorting [Diagram: two Protobuf/Avro etc. records, each turned into a deserialized instance and then a sort key; the two sort keys are compared (Cmp)]
  • 45. BinaryRecord Sorting • BinaryRecord sorting: binary compare fields directly, with no deserialization and no object allocations [Diagram: two records, each with fields name: Str, age: Int, lastTimestamp: Long, group: Str]
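To make the contrast with regular sorting concrete, here is a minimal sketch of comparing two records by reading fields in place from their bytes. The fixed 12-byte layout and the `ToyBinaryRecord` names are assumptions for illustration only and are not FiloDB's actual BinaryRecord format. Unlike the real thing, this sketch still allocates small `ByteBuffer` wrappers per comparison; what it avoids is deserializing into a record instance or building a separate sort key.

```scala
import java.nio.ByteBuffer

// Hypothetical fixed layout (big-endian, the ByteBuffer default):
//   bytes 0-3  : age (Int)
//   bytes 4-11 : lastTimestamp (Long)
object ToyBinaryRecord {
  val AgeOffset = 0
  val TimestampOffset = 4
  val Size = 12

  // Writes both fields at their fixed offsets and returns the backing array.
  def encode(age: Int, lastTimestamp: Long): Array[Byte] = {
    val buf = ByteBuffer.allocate(Size)
    buf.putInt(AgeOffset, age)
    buf.putLong(TimestampOffset, lastTimestamp)
    buf.array()
  }

  // Reads one field directly at its offset.
  def age(record: Array[Byte]): Int = ByteBuffer.wrap(record).getInt(AgeOffset)

  // Orders by age, then lastTimestamp, reading each field directly at its offset.
  val byAgeThenTimestamp: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(a: Array[Byte], b: Array[Byte]): Int = {
      val bufA = ByteBuffer.wrap(a)
      val bufB = ByteBuffer.wrap(b)
      val byAge = Integer.compare(bufA.getInt(AgeOffset), bufB.getInt(AgeOffset))
      if (byAge != 0) byAge
      else java.lang.Long.compare(bufA.getLong(TimestampOffset), bufB.getLong(TimestampOffset))
    }
  }
}
```

Usage would look like `records.sorted(ToyBinaryRecord.byAgeThenTimestamp)`, where `records` is a sequence of encoded byte arrays.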
  • 46. SBT-JMH • Super useful tool to leverage JMH, the best micro benchmarking harness • JMH is written by the JDK folks
  • 47. In Summary • Scala, Akka, reactive can give you both awesome abstractions AND performance • Use Akka for distribution, state, protocols • Use reactive/Monix for functional, concurrent stream processing • Build (or use FiloDB’s) fast low-level abstractions with good APIs