Adding tt-metalium #28854

Open
wants to merge 29 commits into
base: main

@blozano-tt blozano-tt commented Jan 17, 2025

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Package does not ship static libraries. If static libraries are needed, follow CFEP-18.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

Contributor

github-actions bot commented Jan 17, 2025

Hi! This is the staged-recipes linter and I found some lint.

File-specific lints and/or hints:

  • recipes/tt-metalium/meta.yaml:
    • lints:
      • The following maintainers have not yet confirmed that they are willing to be listed here: afuller-TT. Please ask them to comment on this PR if they are.

@conda-forge-admin
Contributor

conda-forge-admin commented Jan 17, 2025

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/tt-metalium/meta.yaml) and found some lint.

Here's what I've got...

For recipes/tt-metalium/meta.yaml:

  • ❌ The source section contained an unexpected subsection name. git_submodules is not a valid subsection name.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12927119653. Examine the logs at this URL for more detail.
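For illustration, a `source` section that avoids the rejected `git_submodules` key by pointing at a release tarball could look roughly like the fragment below. This is a sketch, not the recipe from this PR: the URL is the one quoted later in this thread, and the `sha256` value is a placeholder that would have to be computed from the actual tarball.

```yaml
source:
  # Tarball source instead of git_url / git_submodules.
  - url: https://github.com/tenstorrent/tt-metal/releases/download/v{{ version }}-rc13/tt-metalium.tar.gz
    # Placeholder (assumption): replace with the real checksum of the downloaded tarball.
    sha256: "<sha256 of tt-metalium.tar.gz>"
```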

@conda-forge-admin
Contributor

conda-forge-admin commented Jan 23, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/tt-metalium/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipes/tt-metalium/meta.yaml:

  • ℹ️ Please depend on pytorch directly, in order to avoid forcing CUDA users to downgrade to the CPU version for no reason.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12944803893. Examine the logs at this URL for more detail.
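As a rough sketch of what this hint asks for: the linter presumably fired because the recipe listed a variant-specific package such as `pytorch-cpu`. Assuming the package really needs PyTorch at runtime (the full requirements list is not shown in this thread), the dependency would be written against the generic package name:

```yaml
requirements:
  run:
    # Depend on the generic package so the solver can pick a CPU or CUDA build.
    # Assumption: the recipe previously listed a variant-specific name such as pytorch-cpu.
    - pytorch
```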

@blozano-tt
Author

blozano-tt commented Jan 23, 2025

@h-vetinari would you be willing to review this? (Pretty please)

The C++ code builds.

I still need to iron out the Python wheel generation and installation, but I believe all the pieces are in place (I may just be missing some Python module dependencies).

extra:
  recipe-maintainers:
    - blozano-tt
    - afuller-TT

I confirm I am willing to be listed here. ✅

version: {{ version }}

source:
- url: https://github.com/tenstorrent/tt-metal/releases/download/v{{ version }}-rc13/tt-metalium.tar.gz
Member

Is there a reason this is using an RC (rc13)?

Author

I am putting a lot of fixes into main to support this effort. I needed a very recent version of the code. We upload release candidates daily. We cut releases very rarely.

@jakirkham
Member

@conda-forge-admin , please relint

@conda-forge-admin
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/tt-metalium/meta.yaml) and found it was in an excellent condition.

@h-vetinari
Member

As a meta-comment, I note the -tt in both your handles, and was looking around the tenstorrent GH org a bit. I'm not 100% sure what setup you're aiming for, but if you have some time/funding for this, you might be interested to pursue conda-forge/conda-forge.github.io#1744. Bootstrapping a new architecture is a substantial amount of work, but OTOH you save yourself the hassles of emulation or other cross-platform shenanigans.

@blozano-tt
Author

As a meta-comment, I note the -tt in both your handles, and was looking around the tenstorrent GH org a bit. I'm not 100% sure what setup you're aiming for, but if you have some time/funding for this, you might be interested to pursue conda-forge/conda-forge.github.io#1744. Bootstrapping a new architecture is a substantial amount of work, but OTOH you save yourself the hassles of emulation or other cross-platform shenanigans.

I'll keep it in mind!

The common scenario we currently have is an x86_64 host CPU that does all the orchestrating, with one or more accelerator chips hanging off the PCIe interface. To be reductionist about it: the accelerators are arrays of RISC-V cores that run data movement and compute kernels. So we don't want Anaconda on the device.

However, I have heard of a few efforts where the host is desired to be RISCV, and in that case this could be of interest.

- python
- numactl
- libhwloc
- libzlib
Member

Normally you shouldn't add libzlib at runtime, but put zlib into host. Your commit message in ed3b681 says "RISCV compiler binaries" - where are these coming from and how do they come into play?
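A minimal sketch of the layout suggested here, assuming the rest of the requirements stay as shown in the snippet under review: `zlib` goes into `host`, and the matching `libzlib` runtime dependency is expected to be added automatically through the `run_exports` of the `zlib` package rather than listed by hand. Whether `numactl` and `libhwloc` are also needed in `host` is not visible in this thread.

```yaml
requirements:
  host:
    - python
    # Build against zlib here; its run_exports is expected to add libzlib at runtime.
    - zlib
  run:
    - python
    - numactl
    - libhwloc
    # No explicit libzlib entry (assumption: run_exports covers it).
```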

Author

https://github.com/tenstorrent/tt-metal/blob/8b2c6cd16a27bd54675a018dc8a66013af3181e3/tt_metal/hw/CMakeLists.txt#L34

Released binaries are fetched at tt-metalium build time. They are used to compile some static RISC-V firmware.
They are also used at runtime to JIT-compile kernel code that is later launched on the RISC-V accelerators.

The compiler binaries dynamically link against some libraries.

Now that you have highlighted it, I realize this is not ideal for conda packaging.

Source is here:

https://github.com/tenstorrent/sfpi

I guess I could ask them to link statically.

Author

I think I’d like to avoid creating a conda package for that repo…

Member

Released binaries are fetched [at] build time. They are used to compile some static riscv firmware.
They are used at runtime to jit compile kernel code that is later launched on riscv accelerators.

Given that we don't have riscv support at the moment, I'd say this is probably alright for now, but still CC @conda-forge/core if someone has other opinions.

Contributor

Hi! This is the staged-recipes linter and your PR looks excellent! 🚀

@blozano-tt
Author

blozano-tt commented Jan 25, 2025

@h-vetinari I finally got the recipe building locally. I had to adjust the Python packaging in setup.py, and I removed the Python version constraint so that all versions are built. Now I see that the conda recipe build is killed because of how long it takes. It takes 2.5 hours to build for a single version of Python on the Azure infra. (I think it’s using a single core. It usually takes 5 minutes on our powerful build machines with many cores and copious RAM.)

What are my options here?

Limit the number of supported Python versions?

Clean up the C++ code to at least halve the build time?

Adjust the recipe to ingest a prebuilt artifact?

That’s all I can think of.

@h-vetinari
Member

The best approach is to only build for one python version in staged recipes for now (by adding skip: true # [py!=310] to the recipe, with a comment), and once the feedstock is created, we'll remove that skip and conda-smithy will generate separate jobs for the different python versions, so we won't be running into the timeout issues.
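The suggestion above would look roughly like this in the recipe's `build` section. It is a sketch: the comment wording is illustrative, and `number: 0` simply follows the checklist at the top of this PR.

```yaml
build:
  number: 0
  # Temporary: build only one Python version in staged-recipes to stay under the CI time limit.
  # Remove this skip once the feedstock exists and each Python version gets its own job.
  skip: true  # [py!=310]
```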

@blozano-tt
Author

Things are now in good shape.

I have two action items left.

  • confirm all necessary files are packaged in site-packages after recipe build
  • switch recipe back to using tarball source

I will update when these are completed.

@h-vetinari
Member

confirm all necessary files are packaged in site-packages after recipe build

At the bottom of the logs there's a fold-out section called "Inspecting artifacts", where you can see everything that ends up in the final package.
