Clickhouse is a very interesting open source columnar database, mostly used for online analytical processing (OLAP). It speaks custom native TCP protocol, but also supports HTTP, so you don’t really need any special client or “driver” to talk to it.
There are many great features, too many to mention here, but I’ll mention a few killer ones I’ve used:
You can read this great blog post about Replicated/Distributed setup. The main takeaways are:
How does HAProxy fit in this story? Very easily, it can help you with data ingestion, monitor and route traffic to the proper ClickHouse server, to balance the data ingestion, and to simulate sticky sessions for your ClickHouse users (or to force them to fresh server every time, so they don’t forget to use Distributed tables, like a real BOFH).
If I wanted to be funny, I’d say there is a great synergy between HAProxy and ClickHouse.
listen clickhouse_http
mode http
bind :8443 ssl crt /path/to/cert.pem alpn h2,http/1.1
option httpchk GET /ping
http-check expect string Ok.
balance roundrobin
server ch01_shard01 10.0.0.1:8443 check ssl verify required ca-file CA_chain.pem
server ch02_shard01 10.0.0.2:8443 check ssl verify required ca-file CA_chain.pem
server ch03_shard02 10.0.0.3:8443 check ssl verify required ca-file CA_chain.pem
server ch04_shard02 10.0.0.4:8443 check ssl verify required ca-file CA_chain.pem
listen clickhouse_tcp
mode tcp
bind :9440 ssl crt /path/to/cert.pem
option httpchk GET /ping
http-check expect string Ok.
balance leastconn
server ch01_shard01 10.0.0.1:9440 check-ssl port 8443 ssl verify required ca-file CA_chain.pem
server ch02_shard01 10.0.0.2:9440 check-ssl port 8443 ssl verify required ca-file CA_chain.pem
server ch03_shard02 10.0.0.3:9440 check-ssl port 8443 ssl verify required ca-file CA_chain.pem
server ch04_shard02 10.0.0.4:9440 check-ssl port 8443 ssl verify required ca-file CA_chain.pem
It is quite unbelievable how effective this simple configuration is. We proxy both, HTTP and TCP traffic to ClickHouse (secured by SSL). Bare TCP can be used by official ClickHouse client like this:
cat ldjson.data | clickhouse client --host 1.2.3.4 --query "INSERT INTO mydb.mytable FORMAT JSONEachRow" --max_insert_block_size=100000
Yep, you can directly insert line delimited json into ClickHouse, life is beautiful, isn’t it? Moreover, as you can see, we use different port and protocol for checks, and we’ve selected leastconn algorithm for balancing (expecting that TCP connections are long lived).
Using HTTP interface is very easy too:
cat ldjson.data | gzip | curl --data-binary "@-" \
-H "Content-Encoding: gzip" \ "https://1.2.3.4:8443?query=INSERT+INTO+mydb.mytable+FORMAT+JSONEachRow"
We can even use client side compression for our own benefit!
What HAProxy provides is automatic routing and sharding, it will pick up one of our four servers and push the data through, ClickHouse will then replicate the data to the other servers in the same shard. HAProxy also helps with daily cluster operations, e.g. you can drain specific servers, shut them down, without impact on regular cluster usage.
There you go.
I hope you enjoyed this multi-essay series of "recipes" that explore the ways of bulding reliable applications with (HA)Proxy.
If by any chance you missed any of the prior essays, I chronologically outlined them below.
Serve yourself:
PS. If you have any questions on any part of this series, feel free to share them in the comment section.
1. Reliable apps with (HA)Proxy — Intro
2. Club sandwich + Double PostgreSQL — Solving problems with more layers
3. TLS Calzone + Wrap Crispy JWT — Authentication on the Edge
4. Docker Cake + Say my name — Log and resolve doubts
5. Service Discovery Ice cream — The long and winding road to K8s
6. Belgian OLAP Cube — (you’re here now) Turn the tables on failures with shards