Issues with windows runner - windows-latest #141

Open
sophia-guo opened this issue Oct 6, 2022 · 10 comments

Comments

@sophia-guo
Contributor

windows-2016 is not supported any more. We'd like to update to windows-latest.

The issue with windows-latest:

  • Originally, the openjdk and system tests failed early in the build with the message 'C:\cygwin64\bin\make.exe' failed with exit code 2 for every jdk_version, which suggests the problem might be related to cygwin on windows-latest.
make[1]: *** [compile.mk:45: compile] Error 1
make[1]: Leaving directory '/cygdrive/d/a/run-aqa/run-aqa/aqa-tests/TKG'
make: *** [makefile:67: compile] Error 2
Error: The process 'C:\cygwin64\bin\make.exe' failed with exit code 2

https://github.com/adoptium/run-aqa/actions/runs/3016813951/jobs/4849941896

C:\cygwin64\bin\git.exe clone --depth 1 -q --reference-if-able D:\a\run-aqa\run-aqa/openjdk_cache https://github.com/adoptium/jdk11.git openjdk-jdk
info: Could not add alternate for 'D:\a\run-aqa\run-aqa/openjdk_cache': reference repository 'D:\a\run-aqa\run-aqa/openjdk_cache' is not a local repository.
Username for 'https://github.com/': 
Error: The operation was canceled.

This no longer looks like a cygwin issue on windows-latest.

@sophia-guo
Contributor Author

The current failure only happens with jdk8 and jdk11, because the repos https://github.com/adoptium/jdk8 and https://github.com/adoptium/jdk11 do not exist. jdk17 does have a repo at https://github.com/adoptium/jdk17.

Need to look at the step that runs C:\cygwin64\bin\git.exe clone --depth 1 -q --reference-if-able D:\a\run-aqa\run-aqa/openjdk_cache https://github.com/adoptium/jdk8.git openjdk-jdk
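
In the clone log above, git stops at a Username prompt and the job only ends when it is canceled. A minimal sketch, assuming it would be preferable for the step to fail fast: git honours the GIT_TERMINAL_PROMPT environment variable, and setting it to 0 disables the interactive terminal prompt. The helper name withGitPromptDisabled is hypothetical and is not part of run-aqa; a configured credential helper or askpass program could still change the behaviour.

```typescript
// Hypothetical helper: copy an environment map and disable git's interactive
// terminal prompt, so a clone that needs credentials errors out instead of waiting.
function withGitPromptDisabled(
  env: {[key: string]: string}
): {[key: string]: string} {
  const result: {[key: string]: string} = {}
  for (const key of Object.keys(env)) {
    const value = env[key]
    if (value !== undefined) {
      result[key] = value
    }
  }
  // GIT_TERMINAL_PROMPT=0 tells git not to prompt on the terminal.
  result['GIT_TERMINAL_PROMPT'] = '0'
  return result
}
```

The returned map could then be handed to whatever launches git, for example through the env option of the exec call, assuming that option is used in the step.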

@sophia-guo
Contributor Author

sophia-guo commented Oct 19, 2022

The current failure is caused by the workaround that git clones the system and openjdk test material: it clones the jdk from the wrong repo, https://github.com/adoptium/jdk${version}.git.

run-aqa/src/runaqa.ts

Lines 290 to 294 in fa4002f

if (buildList === 'openjdk' && version != '') {
process.chdir('openjdk')
// Shallow clone the adoptium JDK version - quietly - if there is a reference repo obtain objects from there - destination is openjdk-jdk
await exec.exec(`git clone --depth 1 -q --reference-if-able ${process.env.GITHUB_WORKSPACE}/openjdk_cache https://github.com/adoptium/jdk${version}.git openjdk-jdk`)
process.chdir('../')

Updating https://github.com/adoptium/jdk${version}.git to https://github.com/adoptium/jdk${version}u.git fixes this issue. However, the workaround means run-aqa only works with the master branch; the openjdk, aqa-systemtest and STF repos can't be customized or specified.
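
A minimal sketch of the URL change described above, assuming version is a plain feature-version string such as '8' or '11'. The helper name jdkRepoUrl and the UPDATE_REPO_VERSIONS list are hypothetical; the list only contains the two versions this thread reports as failing, and jdk17 is left without the suffix because the thread says that repo exists.

```typescript
// Versions whose adoptium repo name carries a "u" suffix.
// '8' and '11' are taken from this thread; treat any other entry as an
// assumption to verify against the actual repo list.
const UPDATE_REPO_VERSIONS: string[] = ['8', '11']

// Hypothetical helper: build the clone URL for a JDK feature version.
function jdkRepoUrl(version: string): string {
  const suffix = UPDATE_REPO_VERSIONS.indexOf(version) !== -1 ? 'u' : ''
  return `https://github.com/adoptium/jdk${version}${suffix}.git`
}

console.log(jdkRepoUrl('11'))
```

Keeping the mapping in one place would make it easier to replace the hard-coded URL later with a user-supplied repo, which is the customization the workaround currently prevents.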

@sophia-guo
Contributor Author

sophia-guo commented Oct 20, 2022

With the workaround removed, the failure is the same as https://github.com/adoptium/run-aqa/actions/runs/3016813951/jobs/4849941896: the job fails because no bash is available. The GitHub runner windows-latest has [Bash 5.1.16(1)-release] installed (https://github.com/actions/runner-images/blob/main/images/win/Windows2022-Readme.md). Running bash through Cygwin looks fine: C:\cygwin64\bin\bash.exe ./get.sh --sdk_resource github-hosted --sdkdir D:\a\run-aqa\run-aqa

The problem happened when:

The error message suggests:

	Windows Subsystem for Linux has no installed distributions.
	
	Distributions can be installed by visiting the Microsoft Store:
	
	https://aka.ms/wslstore

This only happens on the GitHub runner, where cygwin is installed on the runner and the PATH environment variable is edited to add Cygwin.

The Jenkins Windows agents at adoptium also have cygwin installed and run the tests with cygwin, but they don't have the same issue.

@ultramancoder
Contributor

ultramancoder commented Nov 3, 2022

Hi! According to my (limited) understanding, ProcessBuilder passes the command as bash. Internally that invokes the Windows CreateProcess API, which resolves a command given without an absolute path using the search order documented here: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa#parameters . As that page shows, the application directory, the current directory and the Windows system directories are searched before the directories on PATH, so the cygwin dirs are not the first place used to resolve the command to run.

However, when running directly in actions or using the actions/exec library, it calls https://github.com/actions/toolkit/blob/b36e70495fbee083eb20f600eafa9091d832577d/packages/io/src/io.ts#L231 internally. This function searches the path manually to find the tool, so it works when invoking without java/ant.

If this understanding is correct, then either the process builder code in TKG will need to be updated to resolve the command from the path in the same way, or the full path to the bash executable will need to be specified.

I am not sure why it works in Jenkins, but if you could point me to where that is set up, I would like to look into it and figure out why.
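
To illustrate the first option above (resolving the command from the path before launching it), here is a minimal sketch. It is not the actions/toolkit implementation; resolveOnPath, its parameters and the fake file list are all hypothetical, and the exists callback is injected so the sketch does not depend on any particular file-system API.

```typescript
// Hypothetical helper: walk the PATH entries in order and return the first
// "<dir>\<tool><ext>" for which `exists` reports true.
function resolveOnPath(
  tool: string,
  pathEnv: string,
  extensions: string[],
  exists: (candidate: string) => boolean
): string | undefined {
  // Windows separates PATH entries with ';'; skip empty entries.
  const dirs = pathEnv.split(';').filter(dir => dir.length > 0)
  for (const dir of dirs) {
    const base = dir.charAt(dir.length - 1) === '\\' ? dir : dir + '\\'
    for (const ext of extensions) {
      const candidate = base + tool + ext
      if (exists(candidate)) {
        return candidate
      }
    }
  }
  return undefined
}

// Usage with a fake file system (the paths are illustrative only).
const fakeFiles = ['C:\\cygwin64\\bin\\bash.exe']
const found = resolveOnPath(
  'bash',
  'C:\\tools;C:\\cygwin64\\bin',
  ['.exe', '.cmd'],
  candidate => fakeFiles.indexOf(candidate) !== -1
)
console.log(found) // prints C:\cygwin64\bin\bash.exe
```

The resolved absolute path could then be passed to the process launcher instead of the bare name bash, which is the second option mentioned above.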

@smlambert
Contributor

Our Jenkins nodes are set up using Ansible playbooks (found in the infrastructure repository https://github.com/adoptium/infrastructure/blob/master/ansible/playbooks/AdoptOpenJDK_Windows_Playbook/main.yml).

@ultramancoder
Contributor

Thanks for the link, @smlambert.

Another thing I just noticed is that the functional tests do pass on Windows. For instance, see https://github.com/adoptium/run-aqa/actions/runs/3380432551/jobs/5613604498.

Note that these logs have the same WSL error message mentioned earlier in this issue and the one we can also see in system tests that fail on windows-latest. Therefore, I think the WSL error may not be the cause of failures. I'll try to reproduce the failure locally, maybe that helps in getting more insight into the issue.

@smlambert
Contributor

thanks for digging into it @ultramancoder !

@ultramancoder
Contributor

Hi again! I looked at https://github.com/adoptium/infrastructure/blob/master/ansible/inventory.yml to find what windows server adoptium's temurin builds were using.

Github's windows-latest points to Windows 2022 server. In adoptium's windows build farm, there are mostly Windows 2012 servers, a few Windows 2016 Servers and 1 Windows 2019 server. Since there are no Windows 2022 servers there, this issue probably never showed up there.

@ultramancoder
Contributor

I also looked into reproducing the errors locally, and it does fail there as well. I modified the makefiles to change ant to ant -v to get more verbose logs. Where it failed for me was https://github.com/adoptium/aqa-tests/blob/66436e6253ac9dd65a9fbc6cd5ad0e258df34ed4/external/build_image.sh#L69. docker was unable to recognise /cygdrive paths. Passing the file paths through cygpath -w made the script work and let the build progress, but it later failed due to other local issues on my PC. As of now, I am not sure whether this docker issue is also the one happening in CI. It will need more investigation.
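
For readers unfamiliar with what cygpath -w does to these paths, here is a minimal sketch of the conversion. It is a simplified stand-in, not the real tool: the hypothetical cygdriveToWindows helper only handles the /cygdrive/<letter>/... form and returns anything else unchanged.

```typescript
// Simplified stand-in for `cygpath -w`: only the /cygdrive/<letter>/... form
// is converted; any other input is returned unchanged.
function cygdriveToWindows(posixPath: string): string {
  const match = /^\/cygdrive\/([a-zA-Z])(\/.*)?$/.exec(posixPath)
  if (match === null) {
    return posixPath
  }
  const drive = (match[1] || '').toUpperCase()
  // A bare "/cygdrive/d" maps to the drive root.
  const rest = (match[2] || '\\').replace(/\//g, '\\')
  return drive + ':' + rest
}

console.log(cygdriveToWindows('/cygdrive/d/a/run-aqa/run-aqa/aqa-tests/TKG'))
// prints D:\a\run-aqa\run-aqa\aqa-tests\TKG
```

In a shell script such as build_image.sh the real cygpath utility would be the natural choice; the sketch only shows the shape of the path that docker ends up receiving.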

Might I suggest sticking with Windows 2016 runners until the adoptium build farm has a 2022 server?

@ultramancoder
Contributor

Hi! One more interesting development: some checks pass on windows-latest when impl is switched to hotspot instead of openj9 in the workflow. https://github.com/ultramancoder/run-aqa/actions/runs/3434222084/jobs/5725336952 . The ones that fail do so on all platforms, with an error about test groups.
