Releases · thought-machine/spot-interruption-exporter

This release fundamentally changes how the application works: the app now only needs to be deployed once per project, regardless of the number of clusters.

Spot preemption events are emitted as audit logs that contain the compute instance ID. These audit logs are forwarded to a Pub/Sub topic via a GCP log sink. The app subscribes to this topic and handles each interruption event.
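As a rough illustration of that flow, the sketch below pulls the instance ID out of a log entry delivered through the topic. It is not taken from the exporter's source: the function name is hypothetical, and the field layout (`protoPayload.methodName` equal to `compute.instances.preempted`, and `resource.labels.instance_id`) is an assumption about how the preemption audit log is shaped.

```python
import json
from typing import Optional

# Assumed method name carried by the preemption audit log.
PREEMPTED_METHOD = "compute.instances.preempted"


def instance_id_from_log_entry(data: bytes) -> Optional[str]:
    """Return the preempted instance ID, or None if this is not a preemption entry.

    `data` is the message payload: a JSON-encoded log entry as forwarded by the sink.
    """
    entry = json.loads(data)
    method = (entry.get("protoPayload") or {}).get("methodName")
    if method != PREEMPTED_METHOD:
        return None
    labels = (entry.get("resource") or {}).get("labels") or {}
    return labels.get("instance_id")


# Hand-written sample payload for illustration only (not a captured log).
sample = json.dumps(
    {
        "protoPayload": {"methodName": "compute.instances.preempted"},
        "resource": {"type": "gce_instance", "labels": {"instance_id": "1234567890"}},
    }
).encode()
```

Calling `instance_id_from_log_entry(sample)` yields the ID string, while entries with any other method name are ignored.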

The audit log for instance preemption does not contain information about the Kubernetes cluster the instance may or may not have been associated with. Since the node is already deleted by the time the preemption event is received, the compute API cannot be queried for more information.

To work around this, the app keeps a mapping of compute instance ID to Kubernetes cluster. It can then use this when processing preemption events to publish the correct kubernetes_cluster label on the metric.
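The lookup can be sketched as follows. This is a minimal stand-in rather than the app's actual code: `InterruptionRecorder` is a hypothetical name, a `Counter` keyed by cluster name stands in for the real metric, and dropping events for unknown instances is an assumed policy.

```python
from collections import Counter
from typing import Dict, Optional


class InterruptionRecorder:
    """Counts preemptions per cluster using an instance ID to cluster mapping."""

    def __init__(self, instance_to_cluster: Dict[str, str]) -> None:
        self.instance_to_cluster = instance_to_cluster
        # Stand-in for the metric; each key plays the role of the
        # kubernetes_cluster label value.
        self.interruptions: Counter = Counter()

    def handle_preemption(self, instance_id: str) -> Optional[str]:
        # The instance no longer exists, so its entry is removed on lookup.
        cluster = self.instance_to_cluster.pop(instance_id, None)
        if cluster is None:
            # Assumed policy: an instance with no known cluster is not counted.
            return None
        self.interruptions[cluster] += 1
        return cluster


recorder = InterruptionRecorder({"111": "cluster-a", "222": "cluster-b"})
first = recorder.handle_preemption("111")    # known instance
unknown = recorder.handle_preemption("999")  # not in the mapping
```

Removing the entry once the event is handled keeps the mapping from growing without bound as instances come and go.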

A second log router and Pub/Sub topic inform the app of new instances that belong to a Kubernetes cluster. On app startup, the compute API is queried to seed the mapping.
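A sketch of how the mapping might be seeded and kept current is shown below. Instances are represented as plain dictionaries rather than compute API responses, the function names are hypothetical, and the use of a `goog-k8s-cluster-name` instance label to identify the cluster is an assumption, not something the release notes state.

```python
from typing import Any, Dict, Iterable, Mapping, Optional

# Assumed instance label that names the owning Kubernetes cluster.
CLUSTER_LABEL = "goog-k8s-cluster-name"


def cluster_of(instance: Mapping[str, Any]) -> Optional[str]:
    """Return the cluster name for an instance, or None if it has no cluster label."""
    return (instance.get("labels") or {}).get(CLUSTER_LABEL)


def seed_mapping(instances: Iterable[Mapping[str, Any]]) -> Dict[str, str]:
    """Build the initial instance ID to cluster mapping from a startup listing."""
    mapping: Dict[str, str] = {}
    for instance in instances:
        record_new_instance(mapping, instance)
    return mapping


def record_new_instance(mapping: Dict[str, str], instance: Mapping[str, Any]) -> None:
    """Add one instance to the mapping, e.g. when a creation event arrives."""
    cluster = cluster_of(instance)
    if cluster is not None:
        mapping[str(instance["id"])] = cluster


# Illustrative data only: two cluster nodes and one unrelated VM.
listing = [
    {"id": "111", "labels": {CLUSTER_LABEL: "cluster-a"}},
    {"id": "222", "labels": {CLUSTER_LABEL: "cluster-b"}},
    {"id": "333", "labels": {}},
]
mapping = seed_mapping(listing)
record_new_instance(mapping, {"id": "444", "labels": {CLUSTER_LABEL: "cluster-a"}})
```

Sharing one code path between the startup listing and later creation events means an instance is classified the same way regardless of how the app learned about it.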


Releases: thought-machine/spot-interruption-exporter

Change Prometheus label value to target_kubernetes_cluster

Fix bug: no permissions to list compute instances

Remove provider constraints

Fixes bug around instance interruption

Add support for multiple clusters in the same project

Fixes bug in v0.0.2

Change google_project_iam_binding to google_project_iam_member

Initial release