snyk: add rate limit handling to audit_logs and issues data streams#16184
snyk: add rate limit handling to audit_logs and issues data streams#16184efd6 wants to merge 16 commits intoelastic:mainfrom
Conversation
🚀 Benchmarks reportTo see the full report comment with |
|
Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as |
|
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
chrisberkhout
left a comment
There was a problem hiding this comment.
What do you think about testing this? I think the system test could be modified to exercise the new logic, but that's probably not worthwhile.
I think it would be nice if the commit message or at least the PR's "How to test this PR locally" section had some advice about how to manually exercise it.
I'm thinking I'd run the system test's stream config outside of a container, with extra response headers, then modify the CEL program to run in miko/mito and add some debug() calls to inspect the debugging value.
| rate_headers.with( | ||
| { | ||
| // Work around inf detection in input. | ||
| // If the headers are missing or rate_limit failed, rate and | ||
| // next may be missing. So use optional types. | ||
| ?"rate": (rate_headers.?rate == optional.of(double("Infinity"))) ? optional.of("inf") : optional.none(), | ||
| ?"next": (rate_headers.?next == optional.of(double("Infinity"))) ? optional.of("inf") : optional.none(), | ||
| } | ||
| ) |
There was a problem hiding this comment.
This part will also only affect the debugging value, right?
I think it'd be better to combine it the the following by adding the {"rate_limit": ...} wrapper, so we have one block for side effects and one block for a debugging value.
There was a problem hiding this comment.
Not quite, but you have identified a nest of bugs that are addressed in 94b456a.
486b884 to
14e09a1
Compare
|
@chrisberkhout there is significant work to do to get this to merge due to conflicts. I'll ping you when it's ready. |
* dynamically applied rate limits * canonical_mime_header_key overload
* fix propagation of rate_limit header where possible
* fix propagation of rate_limit header where possible * explain what is happening better * in-line rate_limit construction or drop unused values
14e09a1 to
94b456a
Compare
* add rate-limit headers
This should not have been converted to a timestamp.
|
/test |
|
/test |
There was a problem hiding this comment.
Looks good.
I put a few comments that don't need any response.
One thing left is that the "We are doing this work for the side-effects of the rate_limit call" comments are a bit misleading in the cases where the values are added to the response and returned in the final result.
Maybe some could be changed to:
We are doing this work only for the side-effects of the rate_limit call.
and the others could be removed or changed to something like:
The rate_limit call has side effects as well as generating values for the final result.
| @@ -144,26 +143,29 @@ program: |- | |||
| 0 | |||
| ) | |||
| ).as(rate_headers, | |||
There was a problem hiding this comment.
Could use resp.Header directly rather than as headers.
| ?"rate": (rate_headers.?rate == optional.of(double("Infinity"))) ? optional.of("inf") : optional.none(), | ||
| ?"next": (rate_headers.?next == optional.of(double("Infinity"))) ? optional.of("inf") : optional.none(), |
There was a problem hiding this comment.
Just thinking...
Looks like this checks for an Infinity valued double and switches it to the string "inf", which in the input will be switched back to rate.Inf, which is the finite math.MaxFloat64 rather than infinite.
Seems like it could have been made simpler, especially in x/time/rate.
But given that's the way it is, maybe at some point rate_limit should handle avoiding infinite rates?
There was a problem hiding this comment.
Yeah, this is unfortunate history. It's a conflict between between encoding/json, mito/lib, x/time/rate and cel-go. The origin of the "inf" is from JSON serialisation of math.Inf(1) that results from divisions. We could conceivably condition the return values from the rate limit extensions so that if either of these end up being infinite, it gets replaced with rate.Inf. I think this would be backwards compatible; the relevant text in the documentation is 'The map returned by the policy functions should have "rate" and "next" fields with type rate.Limit or string with the value "inf", a "burst" field with type int and a "reset" field with type time.Time in the UTC location. The semantics of "rate" and "burst" are described in the documentation for the golang.org/x/time/rate package.' This would remain true, but the second option would never happen.
Do you want to file an issue in mito?
| // Threading rate-limit results obtained here through to the final | ||
| // result is not tenable, so we do not do any work to allow them | ||
| // to be interpreted by the input and just drop them. |
There was a problem hiding this comment.
Just thinking about the existing structure of the CEL program...
Doing multiple requests per eval means you have to reduce the results with complicated logic and accept compromises regarding the feedback that can go back to the input. This is a case of that.
I think the work list approach like we have now in o365 is easier to reason about and can provide better feedback and error handling. The downside is the overhead of extra evals, but I think it's usually worth it. Or maybe I'm missing something?
There was a problem hiding this comment.
I think this would be worthwhile. I also think it's work for another PR.
💚 Build Succeeded
History
cc @efd6 |
Proposed commit message
Checklist
changelog.ymlfile.Author's Checklist
How to test this PR locally
The system tests exercise the rate limit handling path and you will see this in the pattern of event publications in the issues data stream when tests are run with
-v.To see the rate limit values and the extracted headers, make the following changes to the CEL programs
Details
And examine the agent logs (either via docker logs or by getting a diagnostic).
The
RATE_LIMIT_1andRATE_LIMIT_2tags will appear in the agent debug logs with the computedrate,burst,next, andresetfields from the rate limiter. Check that the values of these fields are sane relative to the HTTP headers which will also be present in the debug output in the message.Related issues
Screenshots