Bug: JGF Name was removed, and build with distroless destroyed logging #85
There were a few problems here, and I tried to get at least 2 into separate commits.
Logging Not Working
The base image for the scheduler used to be alpine, which was a good strategy for a minimal build (image size wise) that still has a filesystem. We used that filesystem to write logs to /tmp/fluence.log. So when the base image was changed to the current distroless one, I basically saw no logging, and it's not clear if there were other errors being hidden beyond the initial startup in the entrypoint. I just saw nothing printed, anywhere, which made debugging hard, so I looked at the Dockerfile and found the base image change. Then I was able to see the next layer of the error onion 🧅, discussed next.
Name removed
The interface conversion issue hinted at a change in flux, and looking at the data as presented (and what we expect to parse) it was quick to see. The "Name" field was removed from the response from fluxion:
Note that basename is present in the above, but not name. This led to an interface conversion error: the code looked up the missing field, got nil, and tried to convert it to a string.
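To make the failure mode concrete, here is a small Go sketch of a defensive lookup. The payload, the `stringField` helper, and the fallback to basename are all assumptions made for illustration; they are not the fluxion response format or the fix in this PR. The point is only that the comma-ok form of a type assertion reports a missing or non-string value instead of panicking with an interface conversion error.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stringField is a hypothetical helper. It reads key from a decoded JSON
// object and returns it as a string. The comma-ok type assertion means a
// missing key (nil) or a non-string value yields ok == false, not a panic.
func stringField(m map[string]interface{}, key string) (string, bool) {
	s, ok := m[key].(string)
	return s, ok
}

func main() {
	// Made-up payload: "basename" is present, "name" is absent.
	raw := []byte(`{"basename": "node", "id": 1}`)

	var meta map[string]interface{}
	if err := json.Unmarshal(raw, &meta); err != nil {
		panic(err)
	}

	// A bare meta["name"].(string) would panic here, because the value is nil.
	name, ok := stringField(meta, "name")
	if !ok {
		// Falling back to basename is an assumption for this sketch.
		name, _ = stringField(meta, "basename")
	}
	fmt.Println(name) // prints "node"
}
```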
Notably, another piece of this was that the bindings for fluxion-go were pinned to 0.32.0, and yet we were cloning the latest flux-sched. This is why I updated the bindings version to the latest, which has nicely been releasing itself for quite some time now :)
I'll ping for review when tests pass.