Script and Container jobs failing with: The resource 'projects/<ProjectNumber>/global/networks/default' was not found #27

Open
rmharrison opened this issue Mar 7, 2023 · 2 comments

Comments

@rmharrison

I ran both the busybox (Container Job) and transcoding (Script Job) samples, using both the sample scripts and the web console. Both fail with the error in the title.

This error is not covered in the Troubleshooting documentation.

My best guess...

  1. Batch assumes the default network
  2. I don't have a default network set

==> How do I set my default network?
I have an existing network interface that I've used without incident for a manually provisioned Compute Engine VM instance.
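One way to confirm guess 2 is to check whether the project has a network literally named "default". The sketch below is only an illustration: `has_default_network` is a hypothetical helper, and the real usage line assumes gcloud is installed and configured for the affected project.

```shell
# Hypothetical helper: reads network names (one per line) on stdin and
# succeeds only if one of them is exactly "default".
has_default_network() {
  grep -qx 'default'
}

# Real usage (assumes gcloud is installed and pointed at the project):
#   gcloud compute networks list --format="value(name)" | has_default_network \
#     && echo "default network exists" || echo "no default network"

# Offline demonstration with a made-up network name:
printf 'my-shared-vpc\n' | has_default_network || echo "no default network"
```

If the helper reports no default network, the 404 in the job status events below is the expected result.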

Busybox, Script

# gcloud beta batch jobs describe job-busybox-9172 --location=us-central1
...
status:
  runDuration: 0s
  state: FAILED
  statusEvents:
  - description: Job state is set from QUEUED to SCHEDULED for job projects/<ProjectNumber>/locations/us-central1/jobs/job-busybox-9172.
    eventTime: '2023-03-07T22:04:48.589892441Z'
    type: STATUS_CHANGED
  - description: "Job gets no longer retryable information Batch Error: code - CODE_GCE_RESOURCE_NOT_FOUND,\
      \ description - googleapi: Error 404: The resource 'projects/<ProjectNumber>/global/networks/default'\
      \ was not found, notFound, already retried 3 times, errors record CODE_GCE_RESOURCE_NOT_FOUND."
    eventTime: '2023-03-07T22:08:21.906612928Z'
    type: OPERATIONAL_INFO
  - description: Job state is set from SCHEDULED to SCHEDULED_PENDING_FAILED for job
      projects/<ProjectNumber>/locations/us-central1/jobs/job-busybox-9172.
    eventTime: '2023-03-07T22:08:22.671616487Z'
    type: STATUS_CHANGED
  - description: Job state is set from SCHEDULED_PENDING_FAILED to FAILED for job
      projects/<ProjectNumber>/locations/us-central1/jobs/job-busybox-9172.
    eventTime: '2023-03-07T22:08:23.845508708Z'
    type: STATUS_CHANGED

Transcoding, Web Console

# gcloud beta batch jobs describe transcode-manual --location=us-central1
...
status:
  runDuration: 0s
  state: FAILED
  statusEvents:
  - description: Job state is set from QUEUED to SCHEDULED for job projects/<ProjectNumber>/locations/us-central1/jobs/transcode-manual.
    eventTime: '2023-03-07T21:55:45.825619439Z'
    type: STATUS_CHANGED
  - description: "Job gets no longer retryable information Batch Error: code - CODE_GCE_RESOURCE_NOT_FOUND,\
      \ description - googleapi: Error 404: The resource 'projects/<ProjectNumber>/global/networks/default'\
      \ was not found, notFound, already retried 3 times, errors record CODE_GCE_RESOURCE_NOT_FOUND."
    eventTime: '2023-03-07T21:59:22.741203703Z'
    type: OPERATIONAL_INFO
  - description: Job state is set from SCHEDULED to SCHEDULED_PENDING_FAILED for job
      projects/<ProjectNumber>/locations/us-central1/jobs/transcode-manual.
    eventTime: '2023-03-07T21:59:23.388154925Z'
    type: STATUS_CHANGED
  - description: Job state is set from SCHEDULED_PENDING_FAILED to FAILED for job
      projects/<ProjectNumber>/locations/us-central1/jobs/transcode-manual.
    eventTime: '2023-03-07T21:59:24.291823883Z'
    type: STATUS_CHANGED
@rmharrison
Author

Workaround using custom VM Instance Template (Transcoding example)

The GCP Batch documentation has instructions for using a custom VM instance template.

I created an instance template via the web console, selecting my existing network interface under "Advanced options" > "Networking" > "Network interfaces".
[Screenshot: instance-template-redacted]

I then modified job.json to use the instanceTemplate instead of the default policy:

...
  "allocationPolicy": {
    "instances": [
      {
        "instanceTemplate": "[instance-template-created-in-console]"
      }
    ]
  },
...
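For anyone who prefers the CLI over the console, the template step could be sketched roughly as below. This is not what I ran: the template, network, and subnet names are hypothetical placeholders, and the command is only built and printed so it can be reviewed first.

```shell
# Hypothetical placeholder values; substitute your own.
TEMPLATE="batch-template"
NETWORK="my-existing-vpc"
SUBNET="my-existing-subnet"
REGION="us-central1"

# Build the command as a string so it can be reviewed before running it.
template_cmd="gcloud compute instance-templates create $TEMPLATE --network=$NETWORK --subnet=$SUBNET --region=$REGION"
echo "$template_cmd"

# Once the template exists and job.json references it, the job could be
# submitted with something like (JOB_NAME is a placeholder):
#   gcloud beta batch jobs submit JOB_NAME --location=us-central1 --config=job.json
```

The key part is that the template carries the network and subnet, so Batch no longer falls back to the missing "default" network.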

I also had to modify transcode.sh to put quotes around the options:

vopts="-c:v libvpx-vp9 -b:v 1800k -minrate 1500 -maxrate 1610"

Without the quotes, the shell treats vopts=-c:v as a variable assignment and tries to run libvpx-vp9 as a command, so all of my instances failed with:
"2023-03-07 17:54:35.356 EST /mnt/share/transcode.sh: line 26: libvpx-vp9: command not found"
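The quoting behaviour can be reproduced locally without ffmpeg or Batch. This is a minimal sketch, assuming no program named libvpx-vp9 is on the PATH; the variable names `unquoted_status` and `word_count` are only for illustration.

```shell
# Unquoted: the shell parses `vopts=-c:v` as a temporary assignment and then
# tries to run `libvpx-vp9` as a command, which does not exist.
unquoted_status=0
( vopts=-c:v libvpx-vp9 -b:v 1800k ) 2>/dev/null || unquoted_status=$?
echo "unquoted exit status: $unquoted_status"   # 127 means "command not found"

# Quoted: the whole option string is stored in the variable.
vopts="-c:v libvpx-vp9 -b:v 1800k -minrate 1500 -maxrate 1610"
echo "vopts=$vopts"

# Expanding it unquoted later splits it back into separate arguments.
set -- $vopts
word_count=$#
echo "options expand to $word_count words"
```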

@rmharrison
Author

Add to troubleshooting, because this was somewhat gnarly to resolve?

Root cause fix

If you see this error in the GCP Logs Explorer "Query results":

The resource 'projects/[PROJECT_NUMBER]/global/networks/default' was not found

The project [PROJECT_NUMBER] did not have a default VPC created at project creation. This can occur in centrally managed enterprise accounts where an enterprise administrator uses a global default for the organization instead of project-specific defaults.

There doesn't seem to be a way to restore the actual "default" network, as it is created only at project creation.
See also: https://stackoverflow.com/questions/45789502/restore-google-cloud-default-network

However, you can resolve the error by manually creating a VPC named "default".

Briefly, from the GCP VPC Console:

  1. Click "Create VPC Network" near the top of the screen
  2. Set the "Name" to "default"
  3. Set "Subnets > Subnet creation mode" to "Automatic"
  4. Everything else should be default

Complete instructions for creating a VPC: https://cloud.google.com/vpc/docs/create-modify-vpc-networks#create-auto-network
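The console steps above should have a CLI equivalent along these lines. Treat it as a sketch under assumptions: gcloud is installed and pointed at the affected project, and the `DRY_RUN` switch and `create_default_network` function are hypothetical conveniences. By default it only prints the command.

```shell
# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0 to run it.
DRY_RUN="${DRY_RUN:-1}"

create_default_network() {
  # Auto subnet mode mirrors "Subnet creation mode: Automatic" in the console.
  set -- gcloud compute networks create default --subnet-mode=auto
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

create_default_network
```

After the network exists, resubmitting the failed Batch jobs should show whether the 404 is gone.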
