WIP: pytorch v2.6.0 #326

Draft · wants to merge 14 commits into main
Conversation

h-vetinari (Member)

Build the release candidates

Linux CI cancelled until builds for #322 are live

conda-forge-admin (Contributor) commented Jan 18, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12863672331. Examine the logs at this URL for more detail.

@h-vetinari (Member Author)

This looks better than expected so far. Still have to double-check the dependency changes. Happy if someone could do that (even if it's just noting which bounds changed relative to the current recipe).

recipe/meta.yaml (outdated)
```
@@ -305,27 +288,28 @@ outputs:
        - typing_extensions
        - {{ pin_subpackage('libtorch', exact=True) }}
      run:
        - {{ pin_subpackage('libtorch', exact=True) }}
```
Contributor

The old syntax was to work around the fact that, for the non-megabuilds, each libtorch may have different hashes, making it incompatible.

Contributor

Or maybe something like that. Maybe you have addressed the core problem and this is the better way.

Member Author

Even for the megabuild, libtorch has a unique hash, because there's only one. It's pytorch itself that gets different hashes, due to the different python versions.

Member

> Even for the megabuild, libtorch has a unique hash

Not true. See https://anaconda.org/conda-forge/libtorch/files

Member Author

Obviously cuda and non-CUDA create different libtorch hashes, but within a megabuild it's unique, which is what matters for pinning it in pytorch.

Unless the idea was to allow mixing CUDA-enabled pytorch with non-CUDA libtorch, but I don't see the sense in that.

Member Author

It's more interesting for the blas_impl: pytorch could theoretically be independent of that (if all the BLAS calls go through libtorch), but there we're already creating different pytorch hashes due to the {{ pin_subpackage("libtorch", exact=True) }} in the host dependencies. So AFAICT we're not materially changing the various installations here, just making it impossible to install untested/unsupported combinations.

Member

For non-megabuilds, the idea was to allow libtorch from any of the builds (with the same features except the python version) to work with any pytorch build. This way, you don't have to download a different libtorch for each python version. Note that this is for non-megabuilds only, i.e. osx, where we don't have CUDA builds.

Member Author

Yeah, that makes sense! Let me try to reflect that in the run-deps.
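As a hedged sketch of what such run-deps could look like (the exact spelling below is an assumption for illustration, not the final recipe): keep the exact pin in host so the package is built against one concrete libtorch, but relax the run requirement to a version bound so any libtorch build of the same version satisfies it:

```yaml
# Hypothetical sketch for the pytorch output in recipe/meta.yaml.
# host: build against one concrete libtorch (exact pin).
# run: accept any libtorch build of the same version, so a single
#      libtorch package can serve pytorch builds for every python version.
requirements:
  host:
    - {{ pin_subpackage('libtorch', exact=True) }}
  run:
    - libtorch {{ version }}
```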

@h-vetinari (Member Author)

Aarch builds fail with:

```
$SRC_DIR/third_party/XNNPACK/src/reference/unary-elementwise.cc:125:14: error: invalid 'static_cast' from type 'xnn_bfloat16' to type '_Float16'
  125 |       return static_cast<TOut>(x);
      |              ^~~~~~~~~~~~~~~~~~~~
```

@h-vetinari (Member Author)

Sigh, since when is conda-build applying patches through git rather than through patch? The former is stricter than the latter, and doesn't work in some situations (like applying a patch in one of the submodules):

```
Applying patch: /Users/runner/work/1/s/recipe/patches_submodules/0001-Fix-bazel-linux-aarch64-gcc13-workflow-and-resolve-a.patch
Applying: Fix `bazel-linux-aarch64-gcc13` workflow and resolve accompanying build errors.
error: sha1 information is lacking or useless (third_party/XNNPACK/src/reference/unary-elementwise.cc).
error: could not build fake ancestor
```

h-vetinari force-pushed the 2.6 branch 2 times, most recently from bd0bec7 to 022f063 on January 20, 2025 at 08:03
otherwise conda breaks
```
conda_build.exceptions.RecipeError: Mismatching hashes in recipe. Exact pins in dependencies that contribute to the hash often cause this. Can you change one or more exact pins to version bound constraints?
Involved packages were:
Mismatching package: libtorch (id cpu_generic_habf3c96_0); dep: libtorch 2.6.0.rc7 *0; consumer package: pytorch
```
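The message hints at the cause: with pin_subpackage(..., exact=True), libtorch's full build string (including its hash) is baked into pytorch's dependencies, and conda-build refuses to render when the exact pins it computes don't resolve consistently. A purely illustrative sketch of the kind of change the error asks for (not the actual diff in this PR):

```yaml
# Illustrative only - relax the hash-contributing exact pin in run
# to a version-bound constraint, as conda-build's error suggests:
requirements:
  host:
    - {{ pin_subpackage('libtorch', exact=True) }}   # exact at build time is fine
  run:
    - libtorch 2.6.*   # was: {{ pin_subpackage('libtorch', exact=True) }}
```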
@danpetry (Contributor)

On osx-64:

```
FAILED [0.0518s] test/test_nn.py::TestNN::test_batchnorm_nhwc_cpu - AssertionError: Tensor-likes are not close!

Mismatched elements: 1 / 8 (12.5%)
Greatest absolute difference: 1.430511474609375e-05 at index (0,) (up to 1e-05 allowed)
Greatest relative difference: 4.616721980710281e-06 at index (0,) (up to 1.3e-06 allowed)
```

skip?
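If the tolerance violation is deemed acceptable, one way to skip it would be to deselect the test on osx-64 only, via pytest's -k filter in the recipe's test commands. A sketch only - the test layout and the conda-build selector below are assumptions, not this feedstock's actual test setup:

```yaml
# Hypothetical sketch: deselect the failing test on osx-64 only.
# The test file path and the "# [osx and x86_64]" selector are assumptions.
test:
  commands:
    - python -m pytest test/test_nn.py -k "not test_batchnorm_nhwc_cpu"  # [osx and x86_64]
```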

@h-vetinari (Member Author)

h-vetinari commented Jan 23, 2025

Yeah, this minor accuracy violation indeed sounds skippable, but I've deprioritised this PR until we get the windows builds for 2.5 fixed (and ideally your #318 merged as well).

@danpetry (Contributor)

ok, good to know

@danpetry (Contributor)

Worth pointing out that in 6 days' time, PyPI will have an up-to-date pytorch package whereas conda won't. Will have a look at that other PR.

@h-vetinari (Member Author)

> worth pointing out that as of 6 days time, pypi will have an up to date pytorch package whereas conda won't.

Are you talking about RCs, or are we not looking at the same index? 2.6.0 GA hasn't been published AFAICT. Or are you saying that 2.6.0 will be released in 6 days?

In any case, this is no reason to rush. We didn't have windows packages for years, and I'm more concerned about fixing them than about lagging behind the PyPI release a bit. We've often lagged for months in the past; this has gotten much better with the open-gpu server, but it still happens - 2.5.0 was released Oct 18th last year, and we had first builds on Nov 3rd.

@danpetry (Contributor)

> are you saying that 2.6.0 will be released in 6 days

yes

> I'm more concerned about fixing them, than lagging behind the PyPI release a bit

100%
