-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cloudflare worker to collect API key usage inside Influx DB #1
Conversation
pyropy
commented
Dec 10, 2024
•
edited
Loading
edited
- Reports API usage by API key to Influx db
- Closes Collect & visualise spark-stats API usage based on access tokens space-meridian/roadmap#206
@pyropy in the future, please link to the relevant milestone issues in your pull request descriptions, to make it easier for other to navigate. In this case: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have at least some basic tests to validate this worker works as intended. What are the best practices for writing automated tests for Cloudflare Workers?
I have added unit tests. They do have documentation here on testing but I wonder if we need integration tests? |
This doesn't close it fully, as the visualization isn't done yet, right? |
Co-authored-by: Julian Gruber <[email protected]>
Co-authored-by: Julian Gruber <[email protected]>
Co-authored-by: Julian Gruber <[email protected]>
Co-authored-by: Julian Gruber <[email protected]>
Co-authored-by: Julian Gruber <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are making great progress!
Other required environment variables include the following: | ||
- `INFLUX_URL` - InfluxDB URL | ||
- `INFLUX_DATABASE` - InfluxDB database (bucket) name | ||
- `INFLUX_METRIC_NAME` - InfluxDB metric name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from(bucket: "api-observability")
I find this confusing. We have spark-api and spark-stats projects, this bucket name leads me to think about spark-api, while we are reporting telemetry for spark-stats.
Proposed bucket name: spark-stats-telemetry
or simply spark-stats
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to spark-stats
, spark-stats-telemetry
is redundant I think in the context of InfluxDB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with spark-stats
once we have this finished. I'll keep this bucket name while in development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks very good now. Besides addressing the existing comments (and a few new ones below), the most important step now is to verify that the proposed architecture (worker - cache - origin integration) works as expected.
wrangler.toml.example
Outdated
@@ -1,11 +1,10 @@ | |||
name = "cf-metrics" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose finding a more descriptive identifier for this worker. I find "cf-metrics" too vague/generic.
ATM, this worker is observing access tokens and reporting the usage data. In the future, we will also want to allow access to authenticated clients only. Also, this worker is specific to spark-stats service.
Here are some alternatives I find more fitting:
- spark-stats-auth
- spark-stats-interceptor
- spark-stats-handler
@juliangruber do you have any thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also find cf-metrics
too generic. Once we have found a name, let's update the repository name to match.
In the future, we will also want to allow access to authenticated clients only
I'm not sure if that isn't better handled in spark-stats
, so I'd be cautious to count this as a requirement. To me, its sole purpose is to report api token usage to InfluxDB. What about
spark-stats-request-metrics
spark-stats-user-metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juliangruber spark-stats-request-metrics
sound good to me. Miro do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @bajtos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, we will also want to allow access to authenticated clients only
I'm not sure if that isn't better handled in
spark-stats
, so I'd be cautious to count this as a requirement. To me, its sole purpose is to report api token usage to InfluxDB. What about
In my understanding, we must do authentication at the edge (in a CF worker) because we are heavily caching the responses, and therefore, most requests never reach spark-stats.
Having written that, I am fine with leaving authentication out of the scope of the current discussion.
The name spark-stats-request-metrics
sounds good to me.
So I've re-read the docs again and here's my takeway
My understanding of above given that our resource (app server running on fly.dev) will still be proxied by cloudflare is that cloudflare will automatically handle caching the responses.
For the above my understanding is that if request is not proxied by cloudflare you have to manually put it to the cache, but the fetch itself will be able to get it from cache if it has been cached previously. |
wrangler.toml.example
Outdated
@@ -1,11 +1,10 @@ | |||
name = "cf-metrics" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @bajtos
Co-authored-by: Julian Gruber <[email protected]>
Co-authored-by: Julian Gruber <[email protected]>