@cisaacstern and I just debugged a job submission over here. It turns out the job fails if it writes to a non-persistent bucket:
The bucket in question was `gs://leap-scratch` (set up by 2i2c for the leap-stc org) and the error was:

`Workflow failed. Causes: Unable to create directory: gs://leap-scratch/data-library/temp/gh-leap-stc-data-management-da1b838-1683838917.1683838945.611305/dax-tmp-2023-05-11_14_02_30-7266315841491564283-S05-0-dcd150a5967231a.`
Changing both `temp_storage_location` and the cache location to the persistent bucket fixed this issue.
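For reference, a minimal sketch of what that change looks like, assuming the job is configured through pangeo-forge-runner's traitlets config (trait names follow pangeo-forge-runner's docs but may differ across versions, and the bucket name is a placeholder):

```python
# config.py -- hypothetical pangeo-forge-runner traitlets config sketch.
# "some-persistent-bucket" is a placeholder; trait names may vary by version.
c.Bake.bakery_class = "pangeo_forge_runner.bakery.dataflow.DataflowBakery"

# Point Dataflow's temp storage at the persistent bucket, not gs://leap-scratch
c.DataflowBakery.temp_gcs_location = "gs://some-persistent-bucket/temp"

# Likewise for the input cache location
c.InputCacheStorage.fsspec_class = "gcsfs.GCSFileSystem"
c.InputCacheStorage.root_path = "gs://some-persistent-bucket/cache/"
```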
Weird things 🤪
The service account used to deploy the job from GitHub Actions is the same one used on the workers; the deployment writes to the bucket successfully, yet the write at runtime fails.
@cisaacstern said that in CI testing of pangeo-forge-runner they use a non-persistent bucket for `temp_storage_location` without issues. Is something specific in the retention policies messing things up?
> Is something specific in the retention policies messing things up?
Yes, it could be some subtle difference in the retention policies of our CI bucket for pangeo-forge-runner vs. the leap-scratch bucket. We should take a close look at any differences between the two.
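One quick way to compare the two buckets' policies, using the google-cloud-storage client (the CI bucket name here is a placeholder):

```python
from google.cloud import storage

client = storage.Client()

# "pangeo-forge-ci-bucket" is a placeholder; substitute the actual CI bucket name.
for name in ["leap-scratch", "pangeo-forge-ci-bucket"]:
    bucket = client.get_bucket(name)
    print(name)
    print("  retention period (s):", bucket.retention_period)  # None if unset
    print("  lifecycle rules:", list(bucket.lifecycle_rules))
```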
> Unable to create directory: gs://leap-scratch/data-library/temp/gh-leap-stc-data-management-da1b838-1683838917.1683838945.611305/dax-tmp-2023-05-11_14_02_30-7266315841491564283-S05-0-dcd150a5967231a.
The fact that the error is about creating a directory feels possibly significant. When this job ultimately succeeded (using the persistent bucket for `temp_storage_location`), we found a set of `.recordio` files in that directory. I have not previously seen these files (or their directory) in the tmp buckets for other Dataflow jobs, though admittedly I have not looked too closely at what those buckets contain.
I wonder if something specific about this Dataflow job prompted Dataflow to create this directory of `.recordio` files, and the real issue is that creating an empty directory (to be populated by these files later) in a non-persistent bucket is what raises the error. Perhaps we simply haven't hit it before because our CI pipeline doesn't trigger creation of these files? A sketch for testing that hypothesis follows.
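Assuming Dataflow's "create directory" boils down to writing a zero-byte placeholder object ending in `/` (GCS has no true directories; this is an assumption, not something the error message confirms), the hypothesis could be tested directly:

```python
from google.cloud import storage

# Attempt the same "mkdir" Dataflow would perform: write a zero-byte
# placeholder object ending in "/" (an assumption about Dataflow's behavior).
bucket = storage.Client().bucket("leap-scratch")
blob = bucket.blob("data-library/temp/mkdir-test/")  # hypothetical test prefix
blob.upload_from_string(b"")  # should fail if the bucket policy blocks it
```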
Plot twist! Dataflow appears to delete this directory by the time the job is complete, whereas other objects in the tmp directory persist after job completion.
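To check which objects actually persist under the tmp prefix after a job finishes, a quick recursive listing with gcsfs (prefix taken from the error above; adjust as needed):

```python
import gcsfs

fs = gcsfs.GCSFileSystem()

# Recursively list everything left under the job's temp prefix after completion.
for path in fs.find("leap-scratch/data-library/temp/"):
    print(path)
```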