8 comments

  • rozenmd 3 hours ago
    Great write-up!

    I had a similar project back in August, when I realised my DB's (Postgres) performance was blocking me from implementing a feature users commonly ask for: querying out to 30 days of historical uptime data.

    I was already blown away by the performance (200ms to query what Postgres was doing in 500-600ms), but then I realised I hadn't put an index on the ClickHouse table. Now the query returns in 50-70ms, and that includes network time.
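
    A minimal sketch of what that kind of indexing looks like (the table and column names here are hypothetical; in ClickHouse, the ORDER BY key of a MergeTree table is effectively the primary index):

        -- hypothetical uptime-check table: the ORDER BY key lets queries
        -- filtering on monitor_id + time range skip most of the data
        CREATE TABLE checks
        (
            monitor_id UInt64,
            checked_at DateTime,
            latency_ms UInt32
        )
        ENGINE = MergeTree
        ORDER BY (monitor_id, checked_at);

        -- a 30-day history query then reads only the relevant granules
        SELECT toDate(checked_at) AS day, avg(latency_ms)
        FROM checks
        WHERE monitor_id = 42
          AND checked_at >= now() - INTERVAL 30 DAY
        GROUP BY day;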

  • saisrirampur 1 hour ago
    Sai from ClickHouse here. Very compelling story! Really love your emphasis on using the right tool for the right job - the power of row vs. column stores.

    We recently added a MySQL/MariaDB CDC connector in ClickPipes on ClickHouse Cloud. This would have simplified your migration from MariaDB.

    https://clickhouse.com/docs/integrations/clickpipes/mysql
    https://clickhouse.com/docs/integrations/clickpipes/mysql/so...

  • nasretdinov 3 hours ago
    BTW, you could've used e.g. kittenhouse (https://github.com/YuriyNasretdinov/kittenhouse, my fork), or just a simpler Buffer table with 2 layers and a larger flush period than in the example.
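
    A minimal sketch of the Buffer-table variant, assuming a destination table named events (the thresholds are illustrative, not recommendations):

        -- writes go to events_buffer; ClickHouse flushes to events once all
        -- min_* thresholds are met, or as soon as any max_* threshold is hit
        CREATE TABLE events_buffer AS events
        ENGINE = Buffer(
            currentDatabase(), events,
            2,                    -- num_layers
            10, 100,              -- min_time, max_time (seconds)
            10000, 1000000,       -- min_rows, max_rows
            10000000, 100000000   -- min_bytes, max_bytes
        );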

    Alternatively, you could've used the async insert functionality built into ClickHouse (https://clickhouse.com/docs/optimize/asynchronous-inserts). All of these solutions are operationally simpler than Kafka + Vector, although obviously it's all tradeoffs.
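
    For example (the events table and its columns are again hypothetical; the settings can also be enabled per user profile rather than per query):

        -- the server buffers small inserts and flushes them as one batch;
        -- wait_for_async_insert = 1 returns only once the flush has landed
        INSERT INTO events
        SETTINGS async_insert = 1, wait_for_async_insert = 1
        VALUES (42, now(), 'page_view');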

    • devmor 3 hours ago
      There were a lot of simpler options that came to mind while reading through this, frankly.

      But I imagine the writeup leaves out myriad future concerns, and doesn't entirely convey the pressure and stress of trying to solve such a high-scale problem.

      Ultimately, going with a somewhat more complex solution that involves additional architecture but has been tried and tested by a 3rd party that you trust can sometimes be the more fitting end result. Assurance often weighs more than simplicity, I think.

      • nasretdinov 3 hours ago
        While kittenhouse is, unfortunately, abandonware (even though you can still use it, and it works), you can't say the same about e.g. async inserts in ClickHouse: they're a very simple and robust solution to exactly the problem that PHP (and some other languages') backends often face when trying to use ClickHouse.

  • tlaverdure 2 hours ago
    Thanks for sharing. I really enjoyed the breakdown, and great to see small tech companies helping each other out!

  • frenchmajesty 3 hours ago
    Thanks for sharing, I enjoyed reading this.

  • mperham 2 hours ago
    Seems weird not to use Redis as the buffering layer + a minutely cron job. Seems a lot simpler than installing Kafka + Vector.

    • SteveNuts 19 minutes ago
      Vector is very simple to operate and (mostly) stateless, and can handle buffering if you choose.

      Kafka vs. Redis is a "pick your poison" IMO; scaling and operating either comes with its own headaches.

  • albertgoeswoof 2 hours ago
    Currently at the millions stage with https://mailpace.com, relying mostly on Postgres.

    Tbh this terrifies me! We don’t just have to log the requests but also store the full emails for a few days, and they can be up to 50 MiB in total size.

    But it will be exciting when we get there!

  • fnord77 2 hours ago
    How does ClickHouse compare to Druid, Pinot, or StarTree?