Allow quota buckets to be specified in flagd experiment config #10753

tyler-french · 2025-11-20T16:05:47Z

This PR makes it easier to apply quota buckets through the flagd config, which is evaluated at runtime. When a request's context matches the config, the quota manager attempts to create a new quota bucket and store it. If the group+namespace already has a matching bucket, bucket creation is skipped and the existing bucket is used. A quota manager refresh or a restart clears the buckets, so existing buckets are reloaded fresh from flagd; new buckets are added automatically.
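As a rough sketch of the get-or-create behavior described above: the type and function names here (`bucketStore`, `getOrCreate`, `reset`) are hypothetical and not the actual quota manager code, and the sketch treats any existing bucket for the same group+namespace key as a match.

```go
package main

import "sync"

// bucketKey identifies a bucket by group and namespace (hypothetical type).
type bucketKey struct {
	groupID   string
	namespace string
}

// bucketConfig holds illustrative settings for one bucket.
type bucketConfig struct {
	Name        string
	NumRequests int64
	MaxBurst    int64
}

// bucketStore is an in-memory map of buckets guarded by a mutex.
type bucketStore struct {
	mu      sync.Mutex
	buckets map[bucketKey]*bucketConfig
}

func newBucketStore() *bucketStore {
	return &bucketStore{buckets: make(map[bucketKey]*bucketConfig)}
}

// getOrCreate returns the bucket already stored for group+namespace, or
// stores cfg and returns it. The bool reports whether a bucket was created.
func (s *bucketStore) getOrCreate(groupID, namespace string, cfg bucketConfig) (*bucketConfig, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	key := bucketKey{groupID: groupID, namespace: namespace}
	if existing, ok := s.buckets[key]; ok {
		return existing, false // creation skipped, existing bucket reused
	}
	stored := cfg
	s.buckets[key] = &stored
	return &stored, true
}

// reset drops every bucket, standing in for a refresh or restart. The next
// matching request recreates buckets from whatever the config then says.
func (s *bucketStore) reset() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.buckets = make(map[bucketKey]*bucketConfig)
}
```

With this shape, a changed flag value for an already-stored key only takes effect after `reset`, which mirrors the refresh/restart behavior described above.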

Since flagd evaluates against the request context, the evaluation result available to the caller doesn't indicate whether the flag matched on "group_id" or on something else.

To keep this safe, we need to ensure that experiment configs are only ever applied per group_id; otherwise we'll get unexpected behavior:

Flag applied for user_id
Quota manager stores it under group_id
Flags of similar/different configurations can create same/different quota manager buckets
Inconsistent mapping of flag -> bucket

To prevent this, we need to ensure flagd configuration for quota.buckets is only applied by group_id.

We can do this by adding strict tests like: https://github.com/buildbuddy-io/buildbuddy-internal/pull/6216
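A check of this kind might look like the sketch below. It is not the contents of the linked PR: `checkGroupOnly` and `contextRefs` are hypothetical helpers, and the sketch assumes context keys appear in the targeting JSON as strings prefixed with `$` (as in `$group_id`).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// contextRefs walks a decoded JSON value and records every string that
// starts with "$", which this sketch assumes marks a context key.
func contextRefs(v any, out map[string]bool) {
	switch t := v.(type) {
	case string:
		if strings.HasPrefix(t, "$") {
			out[t] = true
		}
	case []any:
		for _, e := range t {
			contextRefs(e, out)
		}
	case map[string]any:
		for _, e := range t {
			contextRefs(e, out)
		}
	}
}

// checkGroupOnly returns an error if the targeting JSON references any
// context key other than "$group_id".
func checkGroupOnly(targetingJSON string) error {
	var parsed any
	if err := json.Unmarshal([]byte(targetingJSON), &parsed); err != nil {
		return fmt.Errorf("parse targeting: %w", err)
	}
	refs := map[string]bool{}
	contextRefs(parsed, refs)
	for ref := range refs {
		if ref != "$group_id" {
			return fmt.Errorf("targeting references %q; only $group_id is allowed", ref)
		}
	}
	return nil
}
```

A test could run `checkGroupOnly` over the targeting section of every quota.buckets config and fail on the first error.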

Ref: https://github.com/buildbuddy-io/buildbuddy-internal/issues/354

Testing info:

Tested in dev:
https://app.buildbuddy.dev/invocation/e83dfef5-8732-4de6-b076-d2d879c096b9#

INFO: Analyzed target //enterprise/server:server (2267 packages loaded, 43174 targets configured).
WARNING: Remote Cache: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: quota exceeded for "rpc:/google.bytestream.ByteStream/Read" - to increase quota, request a quote at https://buildbuddy.io/request-quote
ERROR: /private/var/tmp/_bazel_tfrench/239123e21b1e00ce2ad3b38bdab6259a/external/gazelle++go_deps+dev_cel_expr/proto/cel/expr/BUILD.bazel:4:14: Generating Descriptor Set proto_library @@gazelle++go_deps+dev_cel_expr//proto/cel/expr:checked_proto failed: lost inputs with digests: 525643e720a2a54d106079d369c7f58811c076c1ceca04e90c67980a9e431d08/193

tylerwilliams · 2025-11-20T20:08:32Z

Instead of adding new machinery to block RPCs like this, wdyt about relying on the existing quota implementation for that and just allowing it to be configured via the experiments framework?

It'd be handy to be able to push an experiment flag change, have quota pick it up, and for a malicious user to start getting ResourceExhaustedErrors.

tyler-french · 2025-11-21T01:16:17Z

> Instead of adding new machinery to block RPCs like this, wdyt about relying on the existing quota implementation for that and just allowing it to be configured via the experiments framework?
>
> It'd be handy to be able to push an experiment flag change, have quota pick it up, and for a malicious user to start getting ResourceExhaustedErrors.

It'd be quite complex to block a specific group using a bunch of rate limits, so I kept a setting to always block based on user group.

Moving the logic over to the quota_manager did add some complexity, but I think I'm happy with the implementation. A user can configure the same settings that went into the SQL table, but do it like this:

This example would restrict the rate of the Execute RPC for a specific group:

{
    "quota.buckets": {
      "state": "ENABLED",
      "variants": {
        "default": {},
        "limited": {
          "rpc:/build.bazel.remote.execution.v2.Execution/Execute": {
            "name": "execute-restrict",
            "maxRate": {
              "numRequests": 50,
              "period": "60s"
            },
            "maxBurst": 5
          }
        }
      },
      "defaultVariant": "default",
      "targeting": [
        {
          "if": [
            {"in": ["$group_id", ["GR1111"]]}
          ],
          "variant": "limited"
        }
      ]
    }
}
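To make the shape of a variant concrete, here is a minimal sketch of decoding one into Go values. The JSON field names come from the example above; the Go types and the `parseVariant` and `ratePerSecond` helpers are hypothetical, and the sketch assumes `period` is a Go-style duration string such as "60s".

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// maxRate mirrors the "maxRate" object in the example config.
type maxRate struct {
	NumRequests int64  `json:"numRequests"`
	Period      string `json:"period"`
}

// bucket mirrors the settings stored under one namespace key.
type bucket struct {
	Name     string  `json:"name"`
	MaxRate  maxRate `json:"maxRate"`
	MaxBurst int64   `json:"maxBurst"`
}

// ratePerSecond converts numRequests per period into requests per second.
func (b bucket) ratePerSecond() (float64, error) {
	d, err := time.ParseDuration(b.MaxRate.Period)
	if err != nil {
		return 0, fmt.Errorf("parse period %q: %w", b.MaxRate.Period, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("period must be positive, got %q", b.MaxRate.Period)
	}
	return float64(b.MaxRate.NumRequests) / d.Seconds(), nil
}

// parseVariant decodes one variant, a map from namespace to bucket
// settings, and rejects buckets whose period cannot be used.
func parseVariant(data []byte) (map[string]bucket, error) {
	var out map[string]bucket
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, fmt.Errorf("decode variant: %w", err)
	}
	for ns, b := range out {
		if _, err := b.ratePerSecond(); err != nil {
			return nil, fmt.Errorf("namespace %q: %w", ns, err)
		}
	}
	return out, nil
}
```

For the "limited" variant above, this works out to 50 requests per 60 seconds, a little under one request per second, with a burst of 5. The empty "default" variant decodes to an empty map, meaning no buckets from the flag.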