How I increased ClickHouse performance by 1000%

Kafka
4 min readAug 18, 2021

I work as a Senior Backend Developer in ad software aka (adTech). Naturally I deal with an immense amount of data. I will share how I increased performance of ClickHouse using Material Views and datatypes.

The Problem: I had a large query that would take up to 1 minute to complete. If a user had to wait one minute for data to load they’re not ever coming back.

The Solution: Created Material Views for each table and only inserted columns that were needed for the query. With Material Views I decreased that time down to under 3 seconds.

What it looks like:

Tables: The primary tables are Impressions and Clicks. These tables have several 100 gigabytes of data in each table. Hence we’re roughly looking at 2 TB of data.

Material Views: I created a Material View for the Impressions table and Clicks table. When creating the Material Views I only add columns needed for the query. If I added all the column into the material view I wouldn’t get the same performance. Additionally, I’m using a “where” filter to insert data based of a date range. We are NOT adding the entire table and that is the reason for the huge performance gain.

Example: Material View for impressions

Example: Material View for clicks

There is two important parts that I want to explain.

The first part is datatypes. From the image above we can see “id” String, “createdDate” Date. These are datatypes which allow better compression. In fact it uses less resources aka (HDD space) because the data is better compressed. This also means faster queries.

The second part is the use of the where filter. I only need about 60 days worth of data to full fill the user queries. When creating the material view I used createdDate BETWEEN data_sub(DAY, 60, today()) which means today date minus 60 days AND todays date. After the functions return value it will look like this: createdDate BETWEEN 2021–06–01 and 2021–08–01

At any given time the Material View will only hold 60 days worth of that data. In other words you do not need to trim the fat. The where filter automatically removes data that is older than 60 days from the Material View.

What about all the older data.
So it's highly likely we still need the old data for some queries. Here's what I did in order to decide which table to use. If the user search date was under 60 days I would give him the fast table which is a Material View. If the user searched for data that was older then 60 days they would be using the slow tables.

Here is the logic used to decided which table to use: NodeJS

Now just plug in the impressions / clicks variable into the query:

Datatypes:
We should also optimize our tables. As with material views we can optimize the data in our tables with datatype. Lets start with the LowCardinality

LowCardinality(String)

`status` LowCardinality(String), When to use the LowCardinality datatype? The status only has 4 different possibilities: BAD, GOOD, INCOMPLETE, FAILED. You would think I String datatype would fit best here but in fact LowCardinality(String) is the most performant in this case.

String

id` String, A String datatype would best be used for an unique id:
“id” 4edutyfuytf685edtc.

int’s 123

`advertiserId` UInt16

Floats 0.421

`price` Float32

Conclusion: We can dramatically increase performance by creating and using Material Views in the way I described. This can scale up to as many tables needed. We could also use more then 60 days if needed. Well I hope this helped someone and if it did smash that hand 🖐 button. Have a great one. Cheers 🍻 .

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Kafka
Kafka

Written by Kafka

“Genius” is 1% inspiration and 99% perspiration. Accordingly, a ‘genius’ is often merely a talented person who has done all of his homework — T.E.

No responses yet

Write a response