2. Presenter and Company Bio
www.altinity.com
Enterprise provider for ClickHouse, a
popular, open source data warehouse.
Community sponsor and major
committers to the ClickHouse project.
Robert Hodges - Altinity CEO
30+ years on DBMS plus
virtualization and security. Using
Kubernetes since 2018.
4. ClickHouse is an open source data warehouse
Single binary
Understands SQL
Runs on bare metal to cloud
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is open source (Apache 2.0)
[Diagram: four ClickHouse Server nodes, each holding columns a, b, c, d]
And it’s really fast!
6. ClickHouse goodness delivered by Docker
mkdir $HOME/clickhouse-data
docker run -d --name clickhouse-server \
  --ulimit nofile=262144:262144 \
  --volume=$HOME/clickhouse-data:/var/lib/clickhouse \
  -p 8123:8123 -p 9000:9000 \
  yandex/clickhouse-server
--volume: Persist data
-p: Make ports visible
--ulimit: Make ClickHouse happy
7. Is there ClickHouse cloud goodness?
YES!
● Yandex Managed Service for ClickHouse -- Runs in Yandex.Cloud
● Altinity.Cloud -- Runs in Amazon Public Cloud
8. Where is the documentation?
https://clickhouse.tech/
10. First step: The ClickHouse Tutorial
https://clickhouse.tech/docs/en/getting-started/tutorial/
11. Second step: Design table(s) and load data
CREATE TABLE meetup.readings (
sensor_id Int32,
time DateTime,
date Date,
temperature Decimal(5,2)
)
Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (sensor_id, time);
Don’t stress about data types
Use MergeTree table types
Partition by month or day
Sort by “keys” to find data
LZ4 compression by default
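The effect of PARTITION BY toYYYYMM(time) and ORDER BY (sensor_id, time) can be pictured with a small Python model. This is only a sketch: the sample rows and the helper names to_yyyymm and build_parts are made up for illustration, and real MergeTree parts are files on disk, not Python lists.

```python
from collections import defaultdict
from datetime import datetime


def to_yyyymm(ts):
    """Mimic toYYYYMM(): 2020-09-15 becomes 202009."""
    return ts.year * 100 + ts.month


def build_parts(rows):
    """Group rows by month, then sort each group by (sensor_id, time)."""
    parts = defaultdict(list)
    for row in rows:
        sensor_id, time, temperature = row
        parts[to_yyyymm(time)].append(row)
    return {
        partition: sorted(part, key=lambda r: (r[0], r[1]))
        for partition, part in parts.items()
    }


# Hypothetical sample rows: (sensor_id, time, temperature)
sample_rows = [
    (2, datetime(2020, 9, 1, 0, 0), 20.5),
    (1, datetime(2020, 10, 1, 0, 0), 18.0),
    (1, datetime(2020, 9, 2, 0, 0), 21.0),
    (1, datetime(2020, 9, 1, 0, 0), 22.25),
]

parts = build_parts(sample_rows)
for partition in sorted(parts):
    print(partition, parts[partition])
```

A query that filters on one month only needs to look at that month's partition, and within it rows for the same sensor sit next to each other.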
12. Your friend: the MergeTree table type
[Diagram: a table is made up of parts; each part holds an index and columns]
● Rows in a part match the PARTITION BY expression
● Columns are sorted on the ORDER BY columns
● The index is a sparse index
● Column data is stored in compressed blocks
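A sparse index keeps one entry per block of rows rather than one per row. The Python sketch below shows the idea on a sorted list of sensor ids. It is a simplified model, not ClickHouse's implementation: the granule size of 4 and the function names are assumptions chosen to keep the example small.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # assumed block size for the example; real tables use far larger blocks


def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep only the first key of every block of `granule` rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]


def granules_to_scan(index, key):
    """Return the block numbers that may contain `key`.

    A block can hold `key` if its first key is <= key and the first key
    of the following block is >= key.
    """
    first = max(bisect_left(index, key) - 1, 0)
    last = bisect_right(index, key)  # exclusive upper bound
    return range(first, last)


# Sensor ids as they would appear after sorting on the ORDER BY key
keys = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5]
index = build_sparse_index(keys)

for wanted in (1, 3, 5):
    blocks = list(granules_to_scan(index, wanted))
    print(f"sensor_id={wanted}: scan blocks {blocks} of {len(index)}")
```

The index here has 3 entries for 12 rows, so it stays small enough to keep in memory, and a lookup only reads the blocks that might hold the key.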
14. Loading through clickhouse-client
# Load CSV
cat readings.csv | clickhouse-client \
  --query "INSERT INTO meetup.readings FORMAT CSVWithNames"
# Load JSON
cat readings.json | clickhouse-client \
  --query "INSERT INTO meetup.readings FORMAT JSONEachRow"
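For trying out these loaders, input in both formats can be produced with a few lines of Python. This is a sketch with made-up sample rows; the column names follow the meetup.readings table, and the helper names are illustrative only.

```python
import csv
import io
import json

# Hypothetical sample rows matching the meetup.readings columns
ROWS = [
    {"sensor_id": 1, "time": "2020-09-01 00:00:00", "date": "2020-09-01", "temperature": 21.5},
    {"sensor_id": 1, "time": "2020-09-01 00:01:00", "date": "2020-09-01", "temperature": 21.75},
]


def to_csv_with_names(rows):
    """CSVWithNames: a header line with column names, then one line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json_each_row(rows):
    """JSONEachRow: one JSON object per line, no enclosing array."""
    return "".join(json.dumps(row) + "\n" for row in rows)


print(to_csv_with_names(ROWS), end="")
print(to_json_each_row(ROWS), end="")
```

Saving the two outputs as readings.csv and readings.json gives files in the shape the commands above expect.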
15. Loading through table functions
# Copy the file into user_files, where the file() function can read it.
sudo mkdir -p /var/lib/clickhouse/user_files
sudo chmod 777 /var/lib/clickhouse/user_files
sudo cp readings.json /var/lib/clickhouse/user_files
clickhouse-client
pika :) INSERT INTO meetup.readings
SELECT *
FROM file('readings.json', 'JSONEachRow',
'sensor_id Int32, time DateTime, date Date, temperature Decimal(5,2)')
16. NEW: loading data from S3 (20.8+)
-- Insert from S3
INSERT INTO meetup.readings
SELECT * FROM
s3('https://s3.us-east-1.amazonaws.com/altinity-data-1/readings.csv',
'CSVWithNames',
'sensor_id Int32, time DateTime, date Date, temperature Decimal(5,2)')
17. Third Step: Go crazy with your own queries
https://clickhouse.tech/docs/en/sql-reference/statements/select/
18. But what about client libraries??
Language            Popular Drivers
C++                 https://github.com/ClickHouse/clickhouse-cpp
Golang              https://github.com/ClickHouse/clickhouse-go
Java                https://github.com/ClickHouse/clickhouse-jdbc
ODBC                https://github.com/ClickHouse/clickhouse-odbc
Python              https://github.com/mymarilyn/clickhouse-driver
PHP and JavaScript  Use a library listed on ClickHouse.tech *or* roll your own using the ClickHouse HTTP interface
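Rolling your own client over the HTTP interface can be as small as one POST request with the SQL text as the body. The sketch below only builds the request. It assumes a server on localhost with port 8123 published as in the Docker example; the function name and the default database are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(sql, host="localhost", port=8123, database="default"):
    """Build (but do not send) a POST request carrying the query as its body."""
    url = f"http://{host}:{port}/?{urlencode({'database': database})}"
    return Request(url, data=sql.encode("utf-8"), method="POST")


req = build_query_request("SELECT count() FROM meetup.readings FORMAT JSONEachRow")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With a server running, passing the request to urllib.request.urlopen and reading the response returns the result in the format named in the query.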
21. MySQL: Row Store Access
[Diagram: six rows, each storing columns a, b, c ... o side by side]
Read row data serially
22. Column Store Access
[Diagram: columns a, b, c ... v, each stored separately across six rows]
Read compressed columns in parallel
23. There is no penalty for wide tables
“Pay” only for the columns you read
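The difference between the two layouts can be made concrete by counting bytes touched when summing a single column. This is a toy model with an assumed table of 1,000 rows and 20 Int32 columns; it ignores compression, which helps the column layout further.

```python
import struct

NUM_ROWS = 1_000
NUM_COLS = 20    # assumed width of the example table
VALUE_SIZE = 4   # bytes per Int32 value

table = [[r * NUM_COLS + c for c in range(NUM_COLS)] for r in range(NUM_ROWS)]

# Row store: all values of a row are stored next to each other
row_store = b"".join(struct.pack(f"<{NUM_COLS}i", *row) for row in table)

# Column store: one separate buffer per column
column_store = [
    struct.pack(f"<{NUM_ROWS}i", *(row[c] for row in table))
    for c in range(NUM_COLS)
]


def sum_column_row_store(col):
    """Walk every row and pick one value out of each; reads all bytes."""
    total, bytes_read = 0, 0
    row_size = NUM_COLS * VALUE_SIZE
    for offset in range(0, len(row_store), row_size):
        row = struct.unpack_from(f"<{NUM_COLS}i", row_store, offset)
        total += row[col]
        bytes_read += row_size
    return total, bytes_read


def sum_column_column_store(col):
    """Read only the buffer that belongs to the requested column."""
    data = column_store[col]
    return sum(struct.unpack(f"<{NUM_ROWS}i", data)), len(data)


print("row store:   ", sum_column_row_store(3))
print("column store:", sum_column_column_store(3))
```

Both functions return the same sum, but the row layout reads 20 times as many bytes because it drags the other 19 columns along.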
24. Compression makes data even smaller
Data Type               Codec        Compression
LowCardinality(String)  (none)       LZ4
UInt32                  DoubleDelta  ZSTD(1)
25. Optimize compression to reduce I/O!
CREATE TABLE billy.readings (
sensor_id Int32 Codec(DoubleDelta, ZSTD(1)),
time DateTime Codec(DoubleDelta, ZSTD(1)),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, ZSTD(1))
)
Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (sensor_id, time);
Codec: DoubleDelta, T64
Compression: ZSTD(1)
Computed value: date ALIAS toDate(time)
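Why a delta-style codec helps before compression can be seen with a small experiment. The sketch below is a simplified stand-in, not ClickHouse's DoubleDelta codec: it takes differences of differences over assumed once-a-minute timestamps and uses zlib from the Python standard library in place of ZSTD.

```python
import struct
import zlib
from itertools import accumulate


def delta(values):
    """Keep the first value, then store differences between neighbours."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def undelta(deltas):
    """Inverse of delta(): a running sum restores the original values."""
    return list(accumulate(deltas))


def pack(values):
    """Serialize as little-endian signed 64-bit integers."""
    return struct.pack(f"<{len(values)}q", *values)


# Assumed input: one reading per minute, as a sensor might produce
timestamps = [1_600_000_000 + 60 * i for i in range(10_000)]

double_delta = delta(delta(timestamps))  # almost all zeros after two passes
raw_size = len(zlib.compress(pack(timestamps)))
dd_size = len(zlib.compress(pack(double_delta)))

print(f"compressed raw values:    {raw_size} bytes")
print(f"compressed double deltas: {dd_size} bytes")
```

Regularly spaced values turn into long runs of zeros, which a general-purpose compressor shrinks far more than the original numbers. Less data on disk means less I/O per query.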
27. Materialized views restructure/reduce data
Ingest -> readings (Table): all sensor readings. Size: 544GB, Rows: 500B
(Trigger) -> readings_daily_mv (Materialized View)
readings_daily (AggregatingMergeTree): daily max/min by sensor. Size: 1.7GB, Rows: 347M
CREATE MATERIALIZED VIEW billy.readings_daily_mv
TO billy.readings_daily AS
SELECT sensor_id, date,
minState(temperature) as temp_min,
maxState(temperature) as temp_max
FROM billy.readings
GROUP BY sensor_id, date;
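The trigger-like behaviour of a materialized view can be modelled in a few lines of Python. This is a simplified sketch, not how ClickHouse works internally: the lists and dictionaries stand in for the two tables, the sample values are made up, and partial results are merged immediately on insert rather than later.

```python
from datetime import datetime

readings = []        # stands in for the source table with all sensor readings
readings_daily = {}  # stands in for the target table: (sensor_id, date) -> (min, max)


def insert_readings(block):
    """Insert a block of rows and let the 'view' aggregate just that block."""
    readings.extend(block)

    # Step 1: aggregate only the newly inserted rows (the trigger part)
    partial = {}
    for sensor_id, time, temperature in block:
        key = (sensor_id, time.date())
        lo, hi = partial.get(key, (temperature, temperature))
        partial[key] = (min(lo, temperature), max(hi, temperature))

    # Step 2: merge the partial min/max states into what is already stored
    for key, (lo, hi) in partial.items():
        old_lo, old_hi = readings_daily.get(key, (lo, hi))
        readings_daily[key] = (min(old_lo, lo), max(old_hi, hi))


insert_readings([
    (55, datetime(2020, 9, 1, 0, 0), 70.1),
    (55, datetime(2020, 9, 1, 12, 0), 75.9),
])
insert_readings([
    (55, datetime(2020, 9, 1, 18, 0), 68.0),
    (55, datetime(2020, 9, 2, 0, 0), 71.0),
])

print(len(readings), "raw rows ->", len(readings_daily), "daily rows")
for key in sorted(readings_daily):
    print(key, readings_daily[key])
```

The source table is never rescanned: each insert contributes a small partial result, which is why the daily table stays tiny compared with the raw readings.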
28. Materialized views function like indexes!
SELECT max(temp_max)
FROM billy.readings_daily
WHERE sensor_id = 55
┌─max(temp_max)─┐
│         75.91 │
└───────────────┘
1 rows in set. Elapsed: 0.011 sec. Processed 180.22 thousand rows, 1.44 MB (15.86 million rows/s., 126.84 MB/s.)
29. ClickHouse performance tuning is different...
The bad news…
● No query optimizer
● No EXPLAIN PLAN
● May need to move [a lot of] data for performance
The good news…
● No query optimizer!
● System log is great
● System tables are too
● Performance drivers are simple: I/O and CPU
● Constantly improving
30. Your friend: the ClickHouse query log
# Return messages to clickhouse-client
clickhouse-client --send_logs_level=trace
# View all log messages on server
sudo less /var/log/clickhouse-server/clickhouse-server.log
31. Strengths and weaknesses of ClickHouse
OLTP (“Online Transaction Processing”)
(-) Lots of “small” lookups
(-) Lots of updates
(-) High concurrency
(-) Consistency critical
OLAP (“Online Analytical Processing”)
(+) Very long tables
(+) Very wide tables
(+) Open ended questions
(+) Lots of aggregates
ClickHouse >> MySQL for analytic queries
32. More information and references
● Community docs on ClickHouse.tech
○ Everything ClickHouse
● ClickHouse YouTube Channel
○ Piles of community videos
● Altinity Blog
○ Lots of articles about ClickHouse usage
● Altinity Webinars
○ Webinars on all aspects of ClickHouse
● ClickHouse source code on GitHub
○ Check out tests for examples of detailed usage