Releases: thought-machine/spot-interruption-exporter
Change Prometheus label value to target_kubernetes_cluster
Fix bug: no permissions to list compute instances
The service account was missing a binding to the compute viewer role, so it could not list compute instances.
Remove provider constraints
Remove provider constraints so the parent module can set them.
Fixes bug around instance interruption
In v1.0.0, if an instance was interrupted but no local record of it existed, the goroutine exited without cancelling the context, so the app hung.
Add support for multiple clusters in the same project
The app only needs to be deployed once per project now, regardless of the number of clusters. This release introduces a fundamental change in how the application works.
Spot preemption events are emitted as audit logs that contain the compute instance ID. These audit logs are forwarded to a Pub/Sub topic via a GCP log sink. The app subscribes to this topic and handles each interruption event.
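As a rough sketch of the handling step, the function below pulls the instance ID out of a log entry delivered as a message payload. It assumes the payload is the JSON form of a Cloud Logging entry in which `protoPayload.methodName` is `compute.instances.preempted` and the ID sits under `resource.labels.instance_id`; verify that layout against real entries before relying on it. The type and function names are hypothetical, and only the standard library is used.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logEntry models only the fields this sketch reads. The field layout is an
// assumption about the JSON form of the audit log entry.
type logEntry struct {
	ProtoPayload struct {
		MethodName string `json:"methodName"`
	} `json:"protoPayload"`
	Resource struct {
		Labels map[string]string `json:"labels"`
	} `json:"resource"`
}

// preemptedMethod is the method name assumed to mark a preemption event.
const preemptedMethod = "compute.instances.preempted"

// instanceIDFromPreemption returns the instance ID and true when data is a
// preemption entry that carries a non-empty instance_id label. It returns
// false for other entries and an error for malformed JSON.
func instanceIDFromPreemption(data []byte) (string, bool, error) {
	var e logEntry
	if err := json.Unmarshal(data, &e); err != nil {
		return "", false, fmt.Errorf("decode log entry: %w", err)
	}
	if e.ProtoPayload.MethodName != preemptedMethod {
		return "", false, nil
	}
	id := e.Resource.Labels["instance_id"]
	return id, id != "", nil
}

func main() {
	msg := []byte(`{"protoPayload":{"methodName":"compute.instances.preempted"},` +
		`"resource":{"labels":{"instance_id":"1234567890"}}}`)
	id, ok, err := instanceIDFromPreemption(msg)
	fmt.Println(id, ok, err)
}
```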
The audit log for instance preemption does not contain information about the Kubernetes cluster the instance may or may not have been associated with. Since the node is already deleted by the time the preemption event is received, the compute API cannot be queried for more information.
To work around this, the app keeps a mapping of compute instance ID to Kubernetes cluster. It can then use this when processing preemption events to publish the correct kubernetes_cluster label on the metric.
A second log router and Pub/Sub topic exist to inform the app of new instances that belong to a Kubernetes cluster. On app startup, the compute API is queried to seed the mapping.
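The mapping described above could look roughly like the following Go sketch. It is an illustration under assumptions, not the project's implementation: `clusterIndex` and its methods are hypothetical names, the seed map stands in for the result of the startup compute API query, and `Add` stands in for the handler of new-instance messages. A read-write mutex guards the map because the two subscriptions would typically be served by separate goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// clusterIndex is a hypothetical, concurrency-safe mapping of compute
// instance ID to Kubernetes cluster name.
type clusterIndex struct {
	mu         sync.RWMutex
	byInstance map[string]string
}

// newClusterIndex copies seed (for example, instances found at startup)
// into a fresh index so later changes to seed do not affect it.
func newClusterIndex(seed map[string]string) *clusterIndex {
	idx := &clusterIndex{byInstance: make(map[string]string, len(seed))}
	for id, cluster := range seed {
		idx.byInstance[id] = cluster
	}
	return idx
}

// Add records a newly created instance and the cluster it belongs to.
func (c *clusterIndex) Add(instanceID, cluster string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byInstance[instanceID] = cluster
}

// Cluster returns the cluster recorded for instanceID, if any.
func (c *clusterIndex) Cluster(instanceID string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	cluster, ok := c.byInstance[instanceID]
	return cluster, ok
}

func main() {
	idx := newClusterIndex(map[string]string{"1111": "cluster-a"})
	idx.Add("2222", "cluster-b") // learned from a new-instance message

	// On a preemption event, resolve the label value from the index.
	if cluster, ok := idx.Cluster("2222"); ok {
		fmt.Println("kubernetes_cluster =", cluster)
	}
}
```

The sketch never removes entries. A long-running deployment would likely want to drop an instance from the index once its preemption has been processed; that step is left out here.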
Fixes bug in v0.0.2
v0.0.3 Fix incorrect syntax
Change google_project_iam_binding to google_project_iam_member
This makes the IAM binding non-authoritative, so additional logging components can add themselves to the binding.
Initial release
v0.0.0 update readme and metric