Xephon-K
A lightweight TSDB with multiple backends
Pinglei Guo https://github.com/xephonhq/xephon-k
Agenda
● Overview
● Time Series Data Revisited
● Time Series Database state of the art
● Xephon-K Design
● Xephon-K Implementation
● Evaluation
● Lessons learned
● Related & Future work
● Conclusion
Overview
● Written in Golang (1,700 LOC including bench and test)
● Uses Cassandra as the main backend
● Simple data model
● It is working
Time Series Data Revisited
NOT just data with timestamp
‘What happened, happened and couldn’t have happened another way’ - The Matrix
Time Series Data Revisited
Name    Saving  Update time
Rabbit  $100    2017/03/20:12:59:33
Tiger   $250    2017/03/20:12:59:33
Single record, update in place, tell current state

Name    Daily Transaction  Date
Rabbit  +$100,000          2017/03/19
Rabbit  -$99,900           2017/03/20
Tiger   +$125              2017/03/19
Tiger   +$125              2017/03/20
A series of events, immutable, tell the history
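The relation between the two tables can be sketched in Go: folding the immutable transaction history reproduces the "current state" record, while the reverse is not possible. The type and function names here (Transaction, CurrentState) are illustrative assumptions and are not taken from Xephon-K.

```go
package main

import "fmt"

// Transaction is one immutable event in an account's history.
// Hypothetical type, used only to illustrate the two tables above.
type Transaction struct {
	Name   string
	Date   string // e.g. "2017/03/19"
	Amount int    // signed change in dollars
}

// CurrentState folds the full history into one balance per name,
// which is what the update-in-place table stores directly.
func CurrentState(history []Transaction) map[string]int {
	state := make(map[string]int)
	for _, tx := range history {
		state[tx.Name] += tx.Amount
	}
	return state
}

func main() {
	history := []Transaction{
		{"Rabbit", "2017/03/19", 100000},
		{"Rabbit", "2017/03/20", -99900},
		{"Tiger", "2017/03/19", 125},
		{"Tiger", "2017/03/20", 125},
	}
	state := CurrentState(history)
	fmt.Println(state["Rabbit"], state["Tiger"]) // prints "100 250"
}
```

The history can always be reduced to the state, but the state alone cannot tell that Rabbit briefly held $100,000.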
Time Series Database state of the art
Xephon-K Cassandra Yes Golang at15 N/A 1
Full list on: https://github.com/xephonhq/awesome-time-series-database
Xephon-K Design
Xephon-K Implementation
● Naive schema and Cassandra data model
● Internal representation
● In Memory storage
● API
Xephon-K Implementation - Naive schema
metric_name metric_timestamp value
cpu 2017/03/17:13:24:00:20 10.2
cpu 2017/03/17:13:25:00:00 3.3
cpu 2017/03/17:13:26:00:00 5.6
mem 2017/03/17:13:24:00:20 80.3
mem 2017/03/17:13:25:00:00 60.2
mem 2017/03/17:13:26:00:00 90.3
cqlsh> SELECT * FROM metrics
Xephon-K Implementation - Naive schema
name metric_timestamp val
cpu 2017/03/17:13:24:00:20 10.2
cpu 2017/03/17:13:25:00:00 3.3
cpu 2017/03/17:13:26:00:00 5.6
mem 2017/03/17:13:24:00:20 80.3
mem 2017/03/17:13:25:00:00 60.2
mem 2017/03/17:13:26:00:00 90.3
The table is an abstraction of underlying map
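A rough Go sketch of the "underlying map" view, assuming the metric name is the partition key and the timestamp is the clustering key (the slides do not show the actual CQL schema). Table, Partition, Insert and Timestamps are hypothetical names, and timestamps are shortened to small integers for readability.

```go
package main

import (
	"fmt"
	"sort"
)

// Partition maps a clustering key (the timestamp) to a stored value.
type Partition map[int64]float64

// Table maps a partition key (the metric name) to its partition.
type Table map[string]Partition

// Insert stores one cell, creating the partition on first use.
func (t Table) Insert(name string, ts int64, val float64) {
	if t[name] == nil {
		t[name] = make(Partition)
	}
	t[name][ts] = val
}

// Timestamps returns the clustering keys of one partition in ascending
// order. Go maps are unordered, so the sort stands in for the ordering
// Cassandra maintains inside a partition.
func (t Table) Timestamps(name string) []int64 {
	keys := make([]int64, 0, len(t[name]))
	for ts := range t[name] {
		keys = append(keys, ts)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	return keys
}

func main() {
	metrics := Table{}
	metrics.Insert("cpu", 3000, 5.6) // inserted out of order on purpose
	metrics.Insert("cpu", 1000, 10.2)
	metrics.Insert("cpu", 2000, 3.3)
	metrics.Insert("mem", 1000, 80.3)
	for _, ts := range metrics.Timestamps("cpu") {
		fmt.Println("cpu", ts, metrics["cpu"][ts])
	}
}
```

Seen this way, each row of the CQL table is one entry in the inner map, and all rows of one metric live together under a single outer key.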
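Xephon-K Implementation - Internal representation
// A point is a (timestamp, value) pair; the write API example uses
// Unix milliseconds for T.
type IntPoint struct {
	T int64
	V int
}
type DoublePoint struct {
	T int64
	V float64 // Go has no "double" type; float64 is the equivalent
}
// A series is identified by its name and tags and holds its points.
type IntSeries struct {
	Name   string
	Tags   map[string]string
	Points []IntPoint
}
type DoubleSeries struct {
	Name   string
	Tags   map[string]string
	Points []DoublePoint
}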
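Xephon-K Implementation - In Memory storage
// Data holds one store per series.
type Data map[SeriesID]*IntSeriesStore
type IntSeriesStore struct {
	mu     sync.RWMutex // guards series and length
	series common.IntSeries
	length int
}
// Index maps a tag (key, value) pair to the series that carries it.
type Index []IndexRow
type IndexRow struct {
	key      string
	value    string
	seriesID SeriesID
}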
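A minimal sketch of how these types could be used, assuming SeriesID is a string and replacing common.IntSeries with a local copy so the example is self-contained. The methods WritePoints, Len and Lookup are hypothetical and are not taken from the Xephon-K source.

```go
package main

import (
	"fmt"
	"sync"
)

// SeriesID is assumed to be a string; the slides do not define it.
type SeriesID string

type IntPoint struct {
	T int64
	V int
}

type IntSeries struct {
	Name   string
	Tags   map[string]string
	Points []IntPoint
}

type IntSeriesStore struct {
	mu     sync.RWMutex
	series IntSeries
	length int
}

// WritePoints appends points while holding the write lock.
func (s *IntSeriesStore) WritePoints(points []IntPoint) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.series.Points = append(s.series.Points, points...)
	s.length += len(points)
}

// Len returns the number of stored points under the read lock.
func (s *IntSeriesStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.length
}

type IndexRow struct {
	key      string
	value    string
	seriesID SeriesID
}

type Index []IndexRow

// Lookup returns the IDs of all series tagged key=value (linear scan).
func (idx Index) Lookup(key, value string) []SeriesID {
	var ids []SeriesID
	for _, row := range idx {
		if row.key == key && row.value == value {
			ids = append(ids, row.seriesID)
		}
	}
	return ids
}

func main() {
	id := SeriesID("cpu{host=server1}")
	data := map[SeriesID]*IntSeriesStore{
		id: {series: IntSeries{Name: "cpu", Tags: map[string]string{"host": "server1"}}},
	}
	idx := Index{{key: "host", value: "server1", seriesID: id}}

	// Four concurrent writers append one point each to the same series.
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			data[id].WritePoints([]IntPoint{{T: 1359788400000, V: 123}})
		}()
	}
	wg.Wait()

	for _, found := range idx.Lookup("host", "server1") {
		fmt.Println(found, data[found].Len()) // prints "cpu{host=server1} 4"
	}
}
```

One lock per series lets writers to different series proceed in parallel; writers to the same series still serialize on the mutex.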
Xephon-K Implementation - API Write
[
  {
    "name": "archive_file_tracked",
    "tags": {
      "host": "server1",
      "data_center": "DC1"
    },
    "points": [
      [1359788400000, 123],
      [1359788300000, 13],
      [1359788410000, 23]
    ]
  }
]
http://localhost:2333/write
Point encoding, array form (used):
"points": [
  [1359788400000, 123],
  [1359788300000, 13]
]
Point encoding, object form (not used):
"points": [
  {"t": 1359788400000, "v": 123},
  {"t": 1359788300000, "v": 13}
]
Use arrays instead of objects for points; all numeric values are JSON numbers
Evaluation Environment Setup
● i7-6700 CPU @ 3.40GHz 32 GB RAM HDD Ubuntu 16.10 ( kernel 4.8.0-39 )
● Docker 1.13 without resource limits on container
● InfluxDB 1.2
● KairosDB 1.12 + Cassandra 2.2
● Xephon-K (Go 1.7.4) + Cassandra 3.10
● Write to one series with one tag `cpi{agent:xephon-bench}` with fixed value
● Batch size 100 points, client timeout 30 seconds
● No QPS limit, No retry, No backoff
Evaluation - Throughput
Database Total Requests
XKM 12327
XKC 7931
KairosDB 15561
InfluxDB 118
5 seconds, 10 workers
● InfluxDB performance is extremely poor (my bad?)
● KairosDB outperformed Xephon-K (K is from KairosDB …)
● Prometheus can’t be benchmarked this way (no HTTP write API)
Evaluation Analysis
Q: Why is InfluxDB so slow?
A: Good question, I am still figuring it out (see #15). You can’t blame Docker; running it locally gives the same results
Q: Why is KairosDB faster? Java > Golang?
● Lock
● Buffer (batch size)
Q: That’s it?
A: Bingo! But https://github.com/xephonhq/xephon-k/tree/master/doc/bench has a bunch of results I haven’t dealt with
Q: The chart looks good, what are you using?
A: echarts3 http://echarts.baidu.com/ (One JavaScript a day keeps Microsoft Excel away)
Lessons learned
● Write ugly code and make things work
● Hardware improves productivity: double the monitors, double the LOC/hr
● Source code is your best friend; don’t blindly believe what people say in docs, blogs, conferences, papers, Twitter, or Stack Overflow
Related work
Xephon-B: A TSDB benchmark tool and benchmark result sharing platform
● https://github.com/xephonhq/xephon-b
● A never-finished course project with @zchen
Reika: A DSL for TSDB
● https://github.com/xephonhq/tsdb-proxy-java/tree/master/ql
● Also a course project (number two)
Xephon-K: I am course project three QvQ
Future work
● Refactor (every day I am blaming the code of yesterday)
● Storage without Cassandra (yeah, this is course project four)
● Dashboard
● Benchmark driven development using Xephon-B
Acknowledgement
● Zheyuan Chen and Prof. Peter Alvaro for Xephon-B
● Chujiao Hou for Reika
Conclusion
● Time series data is a series of immutable data points; it tells the history
● CQL is an illusion created for RDBMS people
● Cassandra is a map of maps that contains maps
● http://echarts.baidu.com/ is a good charting library
● Ugly code works, perfect is the enemy of deadline (well, video games to be honest)
● Xephon-K is awesome
● What people say in their presentation may not be true, use the source, Luke
Thank You!
No question, please, just let me go.
