Analyze and Fix Failures for Vega Nodes in CI #2322

Open
4 tasks
TedThemistokleous opened this issue Oct 12, 2023 · 1 comment
Labels
bug (Something isn't working), Continous Integration (Pull request updates parts of continous integration pipeline)

Comments

@TedThemistokleous
Collaborator

TedThemistokleous commented Oct 12, 2023

We've seen inconsistent failures in CI when using Vega nodes (the cdna nodes, which include Vega and MI cards) in our Jenkinsfile.

So far we've seen examples of the following:

We need to do the following:

  • Capture the failure case
  • Investigate why we're getting floating-point errors
  • Fix the changes that cause this error (MIGraphX/External)
  • Re-enable the use of Vega nodes in our Jenkinsfile (mi100+ -> cdna) for pipeline runs
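
For the first task, one possible approach is to rerun a suspect test many times on a Vega node and keep the output of every failing run. The snippet below is only a rough sketch of that idea: `capture_failures` is a hypothetical helper, not something that exists in MIGraphX or our CI, and the command passed to it would be whichever test binary turns out to be flaky.

```python
import subprocess


def capture_failures(cmd, runs=20):
    """Run cmd repeatedly; return (run_index, returncode, output) per failing run.

    Hypothetical helper for illustration only. cmd is an argument list,
    e.g. the path to whichever test binary fails intermittently.
    """
    failures = []
    for i in range(runs):
        # Merge stderr into stdout so the full log of a failing run is kept.
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        if proc.returncode != 0:
            failures.append((i, proc.returncode, proc.stdout))
    return failures
```

Because the failures are intermittent, a larger `runs` value may be needed before one shows up. The saved output of each failing run can then be attached here for comparison.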
@TedThemistokleous TedThemistokleous added the bug and Continous Integration labels Oct 12, 2023
@TedThemistokleous
Collaborator Author

@causten @umangyadav we can log any odd memory errors we see in this issue. I don't know who to assign it to right now, but we should tackle this one as a team, since the behavior is inconsistent from one failure to the next.
