filter subdir out of filename when finding .conda component files #83

Closed
msarahan wants to merge 5 commits


msarahan commented Jun 5, 2024

Description

Closes conda/conda-package-handling#230

This just makes the component finder a tiny bit smarter: it ignores a leading platform (subdir) prefix in the filename, such as linux-64_, when finding the .conda component files.
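To make the failure concrete, here is a minimal sketch (standard library only) of the lookup as it appears from the diff further down: the component id is taken from the outer filename, and the archive is searched for a member that starts with <component>-<id>. The build_fake_conda and find_component helpers are hypothetical, and the component payloads are empty placeholders rather than real zstd-compressed tarballs.

```python
import io
import os
import zipfile


def build_fake_conda(inner_id: str) -> bytes:
    """Build a tiny in-memory zip laid out like a .conda archive.

    Hypothetical test helper: the component members are empty
    placeholders, not real zstd-compressed tarballs.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("metadata.json", '{"conda_pkg_format_version": 2}')
        zf.writestr(f"info-{inner_id}.tar.zst", b"")
        zf.writestr(f"pkg-{inner_id}.tar.zst", b"")
    return buf.getvalue()


def find_component(filename: str, data: bytes, component: str = "pkg") -> list:
    """Simplified mimic of the lookup: derive file_id from the outer
    filename, then match members starting with '<component>-<file_id>'."""
    file_id, _, _ = os.path.basename(filename).rpartition(".")
    prefix = f"{component}-{file_id}"
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [name for name in zf.namelist() if name.startswith(prefix)]


data = build_fake_conda("foo-1.0-0")
print(find_component("foo-1.0-0.conda", data))           # ['pkg-foo-1.0-0.tar.zst']
print(find_component("linux-64_foo-1.0-0.conda", data))  # []
```

With the prefixed filename the search comes back empty, even though the archive contents are identical.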

The extracted folder still has these prefixes present, matching the input filename. If you'd prefer that the prefix be stripped from that as well, I can look into it.

I did look at writing a test for this, but it seems like this project doesn't contain testing tarballs/.conda packages, relying instead on using what is already present in the local package cache. If you'd like me to include a test, would it be OK to include a package file in this repo's test folder?

msarahan changed the title from "filter platform out of filename when finding .conda component files" to "filter subdir out of filename when finding .conda component files" Jun 5, 2024

dholth (Contributor) commented Jun 5, 2024

https://github.com/conda/conda-index/tree/main/tests/archives has some archives that have been stripped of their package data (so they are nice and small); there are similar ones in conda-build. The script to make these from normal archives that still contain their data is simple, but it doesn't appear to be in the repository.

msarahan (Author) commented Jun 5, 2024

I think I can just copy and rename files that exist in the package cache. I'll work on that.
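A minimal sketch of that copy-and-rename approach for a test fixture, assuming the web-download name is simply the subdir, an underscore, and the original filename (as the regex in the diff implies). make_prefixed_copy is a hypothetical helper; in a pytest test, dest_dir would usually be the tmp_path fixture and src a .conda file from the local package cache.

```python
import shutil
import tempfile
from pathlib import Path


def make_prefixed_copy(src: Path, dest_dir: Path, subdir: str) -> Path:
    """Copy a cached package to dest_dir as '<subdir>_<original name>'.

    Hypothetical test helper: mimics the filename a browser download
    from anaconda.org gets, without touching the archive contents.
    """
    dest = dest_dir / f"{subdir}_{src.name}"
    shutil.copy2(src, dest)
    return dest


# Usage with a throwaway placeholder file standing in for a cached package.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    cached = tmp_dir / "foo-1.0-0.conda"
    cached.write_bytes(b"placeholder")
    copy = make_prefixed_copy(cached, tmp_dir, "linux-64")
    print(copy.name)  # linux-64_foo-1.0-0.conda
```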

dholth (Contributor) commented Jun 5, 2024

As an Anaconda employee, I've looked into changing anaconda.org so that it doesn't include the prefix, but it was not entirely clear how to do so (it would have been string-based).

msarahan (Author) commented Jun 5, 2024

Thanks for looking into it! It is an annoying little detail, but hopefully this hacky workaround will be good enough. As I said, if it makes sense to filter the subdir out of the extracted directory name, I can do that too. I kind of think that would happen in conda-package-handling though. And I also think it's more intuitive to not do that filtering, because matching the filename to the extracted folder name is pretty well-established behavior.

@@ -125,6 +126,9 @@ def stream_conda_component(

```python
zf = zipfile.ZipFile(fileobj or filename)
file_id, _, _ = os.path.basename(filename).rpartition(".")
# this substitution compensates for web downloads from anaconda.org having
# the platform as a prefix
file_id = re.sub("^(osx|linux|win|noarch)(-.+?)?_", "", file_id)
```
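A quick way to check what this pattern does is to run it on a few sample ids. This sketch only reuses the regex from the diff; strip_subdir_prefix is a hypothetical wrapper, and the ids are illustrative name-version-build strings.

```python
import re

# The pattern from the diff: strip a leading "<platform>[-<arch>]_" prefix.
SUBDIR_PREFIX = re.compile(r"^(osx|linux|win|noarch)(-.+?)?_")


def strip_subdir_prefix(file_id: str) -> str:
    return SUBDIR_PREFIX.sub("", file_id)


for file_id in (
    "linux-64_numpy-1.26.4-py312h8753938_0",  # -> numpy-1.26.4-py312h8753938_0
    "noarch_six-1.16.0-pyhd3eb1b0_1",         # -> six-1.16.0-pyhd3eb1b0_1
    "numpy-1.26.4-py312h8753938_0",           # unchanged
    "win_inet_pton-1.1.0-py_0",               # -> inet_pton-1.1.0-py_0
):
    print(file_id, "->", strip_subdir_prefix(file_id))
```

The last case shows a limitation worth knowing about: because the -<arch> part is optional, a package whose own name starts with win_ (win_inet_pton, for example) also loses the first part of its name.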

Contributor review comment:

I see we don't have access to DEFAULT_SUBDIRS here, since we don't depend on conda.
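Without conda's DEFAULT_SUBDIRS, one option is a small hardcoded list of subdirs matched as exact prefixes. This is a sketch only: KNOWN_SUBDIRS and strip_known_subdir are hypothetical, and the list is an illustrative subset, not the real DEFAULT_SUBDIRS.

```python
# Hypothetical, hand-maintained subset of conda subdirs. This is NOT conda's
# DEFAULT_SUBDIRS; it only stands in for it because conda can't be imported here.
KNOWN_SUBDIRS = (
    "noarch",
    "linux-64",
    "linux-aarch64",
    "linux-ppc64le",
    "osx-64",
    "osx-arm64",
    "win-32",
    "win-64",
)


def strip_known_subdir(file_id: str) -> str:
    """Remove a leading '<subdir>_' only when <subdir> is in KNOWN_SUBDIRS."""
    for subdir in KNOWN_SUBDIRS:
        prefix = subdir + "_"
        if file_id.startswith(prefix):
            return file_id[len(prefix):]
    return file_id


print(strip_known_subdir("linux-64_numpy-1.26.4-py312h8753938_0"))
print(strip_known_subdir("win_inet_pton-1.1.0-py_0"))  # unchanged
```

Compared with the regex in the diff, exact matching leaves a package whose own name begins with win_ (such as win_inet_pton) alone, at the cost of needing an update whenever a new subdir appears.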

dholth (Contributor) commented Jun 5, 2024

Another way to address this issue would be in conda-package-handling's _extract function: it would need to remove the subdir prefix from the filename before passing it to stream_conda_component.
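A sketch of that caller-side variant, assuming stream_conda_component accepts fileobj as a keyword argument. The zipfile.ZipFile(fileobj or filename) line in the diff suggests it reads the bytes from fileobj when one is given and uses filename only to derive the component id. open_with_clean_name is a hypothetical helper that reuses the diff's pattern; the caller would pass the cleaned name as filename and the open handle as fileobj.

```python
import os
import re
import tempfile
from typing import BinaryIO, Tuple

# Same prefix pattern as the diff: a leading "<platform>[-<arch>]_".
SUBDIR_PREFIX = re.compile(r"^(osx|linux|win|noarch)(-.+?)?_")


def open_with_clean_name(path: str) -> Tuple[str, BinaryIO]:
    """Hypothetical helper: open the real file on disk, but return its
    basename with any subdir prefix removed, for use as the 'filename'
    argument. The bytes are still read from the returned handle."""
    clean_name = SUBDIR_PREFIX.sub("", os.path.basename(path))
    return clean_name, open(path, "rb")


# Usage with a placeholder file named like an anaconda.org web download.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "linux-64_foo-1.0-0.conda")
    with open(path, "wb") as f:
        f.write(b"placeholder")
    clean_name, fh = open_with_clean_name(path)
    with fh:
        print(clean_name)  # foo-1.0-0.conda
```

Because only the name is rewritten, the extracted folder could still be named after the original download if the caller chooses to.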

dholth (Contributor) commented Jun 5, 2024

Here's my take on it as a conda-package-streaming change.

msarahan (Author) commented Jun 6, 2024

Yeah, I like that endswith instead of startswith approach. The original idea with the conda format was to not necessarily be specific to .zst files, but I think that falls into YAGNI. Would it be worth an extra step to also try to strip off the file extension?
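A minimal sketch of locating a component by the member's own name instead of the outer filename: take the single member that starts with <component>- and ends with .tar.zst. This is an illustration, not the code from #85; find_component_member is hypothetical, and the in-memory archive holds empty placeholders instead of real zstd-compressed tarballs.

```python
import io
import zipfile


def find_component_member(zf: zipfile.ZipFile, component: str = "pkg") -> str:
    """Hypothetical lookup that ignores the outer filename entirely.

    Picks the single member named '<component>-*.tar.zst', so a renamed
    download (e.g. with a 'linux-64_' prefix) no longer matters.
    """
    candidates = [
        name
        for name in zf.namelist()
        if name.startswith(f"{component}-") and name.endswith(".tar.zst")
    ]
    if len(candidates) != 1:
        raise LookupError(f"expected one {component} component, found {candidates}")
    return candidates[0]


# Placeholder archive laid out like a .conda file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("metadata.json", "{}")
    zf.writestr("info-foo-1.0-0.tar.zst", b"")
    zf.writestr("pkg-foo-1.0-0.tar.zst", b"")

with zipfile.ZipFile(buf) as zf:
    print(find_component_member(zf))          # pkg-foo-1.0-0.tar.zst
    print(find_component_member(zf, "info"))  # info-foo-1.0-0.tar.zst
```

Hardcoding .tar.zst is the YAGNI trade-off mentioned above: a different inner compression format would need its own suffix.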

dholth (Contributor) commented Jun 6, 2024

No, I read that part of the conda specification as being enthusiastic about libarchive. That's fine, but we aren't using libarchive anymore.

msarahan (Author) commented Jun 6, 2024

superseded by #85; closing

msarahan closed this Jun 6, 2024
msarahan deleted the component_name_contains branch June 6, 2024 02:21
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Packages downloaded from anaconda.org have unsupported filenames?
3 participants