Skip to content

Conversation

@tyler-french
Copy link
Contributor

@tyler-french tyler-french commented Nov 20, 2025

This PR allows for an easier application of quota buckets using the flagd config, which is evaluated at runtime. When the context of a request matches the config, the quota manager will attempt to create a new quota bucket and store it. If this group+namespace already has a matching bucket, the bucket creation will be skipped and the existing bucket will be used. Quota manager refresh or a restart will clear the buckets, and enable fresh loading from flagd for existing buckets, new buckets will be added automatically.

Since flagd evaluates by the request context, the flagd evaluation criteria available to the caller doesn't include information on whether the flag was evaluated by "group_id" or by something else.

To keep this safe, we need to ensure that the experiment configs applied are only ever applied per group_id, otherwise we'll have unexpected behavior:

  • Flag applied for user_id
  • Quota manager stores it under group_id
  • Flags of similar/different configurations can create same/different quota manager buckets
  • Inconsistent mapping of flag -> bucket

To prevent this, we need to ensure flagd configuration for quota.buckets is only applied by group_id.

We can do this by adding strict tests like: https://github.com/buildbuddy-io/buildbuddy-internal/pull/6216

Ref: https://github.com/buildbuddy-io/buildbuddy-internal/issues/354

Testing info:

Tested in dev:
https://app.buildbuddy.dev/invocation/e83dfef5-8732-4de6-b076-d2d879c096b9#

INFO: Analyzed target //enterprise/server:server (2267 packages loaded, 43174 targets configured).
WARNING: Remote Cache: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: quota exceeded for "rpc:/google.bytestream.ByteStream/Read" - to increase quota, request a quote at https://buildbuddy.io/request-quote
ERROR: /private/var/tmp/_bazel_tfrench/239123e21b1e00ce2ad3b38bdab6259a/external/gazelle++go_deps+dev_cel_expr/proto/cel/expr/BUILD.bazel:4:14: Generating Descriptor Set proto_library @@gazelle++go_deps+dev_cel_expr//proto/cel/expr:checked_proto failed: lost inputs with digests: 525643e720a2a54d106079d369c7f58811c076c1ceca04e90c67980a9e431d08/193

@tylerwilliams
Copy link
Member

Instead of adding new machinery to block RPCs like this, wdyt about relying on the existing quota implementation for that and just allowing it to be configured via the experiments framework?

It'd be handy to be able to push an experiment flag change, have quota pick it up, and for a malicious user to start getting ResourceExhaustedErrors.

@tyler-french
Copy link
Contributor Author

Instead of adding new machinery to block RPCs like this, wdyt about relying on the existing quota implementation for that and just allowing it to be configured via the experiments framework?

It'd be handy to be able to push an experiment flag change, have quota pick it up, and for a malicious user to start getting ResourceExhaustedErrors.

It'd be quite complex blocking a specific group using a bunch of rate limits, so I kept a setting to always block based on user group.

Moving the logic over to the quota_manager did add some complexity, but I think I'm happy with the implementation. A user can configure the same settings that went into the SQL table but do it like this:

This example would restrict the rate for Execute RPC, for a specific group

{
    "quota.buckets": {
      "state": "ENABLED",
      "variants": {
        "default": {},
        "limited": {
          "rpc:/build.bazel.remote.execution.v2.Execution/Execute": {
            "name": "execute-restrict",
            "maxRate": {
              "numRequests": 50,
              "period": "60s"
            },
            "maxBurst": 5
          }
        }
      },
      "defaultVariant": "default",
      "targeting": [
        {
          "if": [
            {"in": ["$group_id", ["GR1111"]]}
          ],
          "variant": "limited"
        }
      ]
    }
}

@tyler-french tyler-french changed the title prevent certain group_ids or user_ids from accessing cache allow quota buckets to be specified in flagd experiment config Nov 21, 2025
@tyler-french tyler-french changed the title allow quota buckets to be specified in flagd experiment config Allow quota buckets to be specified in flagd experiment config Nov 21, 2025
@tyler-french tyler-french force-pushed the tfrench/user-block branch 4 times, most recently from 3c4accb to 337d4fc Compare November 21, 2025 02:41
@tyler-french tyler-french marked this pull request as draft November 21, 2025 14:35
@tyler-french tyler-french force-pushed the tfrench/user-block branch 4 times, most recently from 50c1eef to e509acf Compare November 21, 2025 16:40
@tyler-french tyler-french force-pushed the tfrench/user-block branch 5 times, most recently from a27b1b6 to eb11963 Compare November 24, 2025 16:45
@tyler-french tyler-french marked this pull request as ready for review November 24, 2025 16:52
@tyler-french tyler-french force-pushed the tfrench/user-block branch 3 times, most recently from d4315f6 to 4f2376d Compare November 26, 2025 18:46
return status.InternalError("experiment flag provider not configured")
}

// If the bucket is already loaded, skip it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean existing quota buckets can't be updated in real time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the listenForUpdates to also Subscribe to the flagd updates. So now any change to flagd will refresh the buckets.

//
// quota.buckets must have valid structure, for example:
//
// "rpc:/google.bytestream.ByteStream/Read": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda confused about the structure here, is it supposed to be:

"rpcName": {
  "name": "arbitrary name",
  "maxRate": {...}
  "maxBurst": ...
}

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned this up a bit. I think name is not useful, so I'll just generate a name to satisfy the structure.

return res
}

func bucketRowFromMap(namespace string, bucketMap map[string]interface{}) (*tables.QuotaBucket, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if it would make sense to put the quota representation as a type so we can do all the parsing, merging, and type-checking in functions on that type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might make sense if we had more control over the flagd unmarshal. However, since the Object returns us a map[string]interface{}, we need to parse/validate that anyway to a useful form.

I originally looked at going from map -> proto -> bucket which would be more clean, but this code could theoretically be happening on the hot path. The proto has a validateBucket(pbBucket), which is used if a bucket is being added via the API.

If we transition the logic to entirely flagd, then we can rremove the the qpb.Bucket AND the tables.QuotaBucket entriely.

For now, I just used a local structure to decouple from the DB table object.

@tyler-french tyler-french force-pushed the tfrench/user-block branch 2 times, most recently from 6c9f5db to 35968ea Compare December 2, 2025 01:07
@tyler-french tyler-french force-pushed the tfrench/user-block branch 5 times, most recently from 83352df to 96b162b Compare December 2, 2025 19:27
periodDurationUsec: period,
maxBurst: maxBurst,
}
return config, config.validate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to return config if config.validate() returns an error?

@tyler-french tyler-french merged commit 9eac540 into master Dec 3, 2025
11 checks passed
@tyler-french tyler-french deleted the tfrench/user-block branch December 3, 2025 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants