Skip to content

Conversation

@bduffany
Copy link
Member

@bduffany bduffany commented Dec 1, 2025

This PR makes it so we can start writing usage data to ClickHouse (behind a flag) using the same UsageTracker interface, redis buffering logic, and DB flushing logic as the existing MySQL-based usage tracker.

To be able to reuse the same tracking interface, we transparently convert the MySQL usage rows to the new ClickHouse format. The benefit of starting out this way is that it allows us to start populating the DB so that we can start sorting out any data discrepancies sooner rather than later.

Once I've sorted out any data discrepancies, I plan on adding a first-class interface + dedicated redis entries for OLAP DB usage tracking, rather than reusing the primary DB structs and redis storage. At that point, we'll be able to start attaching arbitrary labels to usage rows, giving us the full benefit of ClickHouse (on top of just the performance benefits).

@bduffany bduffany force-pushed the usage-olap2 branch 5 times, most recently from fa2678b to 01eab20 Compare December 2, 2025 01:48
"test.recycle-runner": "true",
"test.runner-recycling-key": "clickhouse:25.3",
},
tags = ["docker"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this tag do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It indicates that the test requires docker to be available in the execution environment. The effect is that the test is excluded when running bazel test //... locally.

var (
region = flag.String("app.region", "", "The region in which the app is running.")
region = flag.String("app.region", "", "The region in which the app is running.")
writeToOLAP = flag.Bool("app.write_usage_to_olap_db", false, "If true, write usage data to OLAP DB.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If true, write usage data to OLAP DB.

Maybe specify that this is in addition to the primary write.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return err
}
// Update counts in the DB
// TODO: for ClickHouse, we should flush everything in a single
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to do this now since the current approach doesn't play nicely with async inserts.

  • Only one app is flushing at a time.
  • Because we use wait_for_async_insert=true and async_insert_busy_timeout_max_ms=200, each write can take up to 200ms.
  • We don't start the next write until the first is completed.

This means that we don't do any batching, and the progress will be very slow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, good catch - done


// Flush a accumulated OLAP rows.
if len(olapRows) > 0 {
if err := ut.env.GetOLAPDBHandle().FlushUsages(ctx, olapRows); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about adding a check that GetOLAPDBHandle() isn't nil, so that we can return an error instead of panicking?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants