Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow using sharded repodata #2467

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

baszalmstra
Copy link
Contributor

@baszalmstra baszalmstra commented Nov 12, 2024

Enables sharded repodata for specific channels.

Add:

[repodata-config."https://prefix.dev"]
disable-sharded = false

To your global config to not disable sharded repodata for all prefix channels. The double negation is there because in the future we might want to enable sharded repodata by default.

This PR is in draft because conda/rattler#910 needs to be merged first.

@ruben-arts is there a good place to add this to the documentation?

@ruben-arts
Copy link
Contributor

The documentation can go here: https://pixi.sh/latest/reference/pixi_configuration/

@ruben-arts
Copy link
Contributor

When I enable all logic I get an error on the normal conda-forge. I think rattler should just skip this and try the next option.
config.toml:

[repodata-config]
disable-jlap = false
disable-bzip2 = false
disable-zstd = false
disable-sharded = false
❯ pixi exec cowpy
ERROR error=could not find subdir 'noarch' in channel 'https://conda.anaconda.org/conda-forge/'
  × failed to get repodata
  ├─▶ could not find subdir 'noarch' in channel 'https://conda.anaconda.org/conda-forge/'
  ╰─▶ HTTP status client error (404 Not Found) for url (https://conda.anaconda.org/conda-forge/noarch/repodata_shards.msgpack.zst)

@baszalmstra
Copy link
Contributor Author

Yes correct you have to configure it on a per channel basis currently.

@ruben-arts
Copy link
Contributor

I would like to add this test, but currently the default and per_channel aren't merged.

#[test]
  fn test_repodata_config() {
      let toml = r#"
          [repodata-config]
          disable-jlap = true
          disable-bzip2 = true
          disable-zstd = true
          disable-sharded = true

          [repodata-config."https://prefix.dev/conda-forge"]
          disable-jlap = false
          disable-bzip2 = false
          disable-zstd = false
          disable-sharded = false

          [repodata-config."https://conda.anaconda.org/conda-forge"]
          disable-jlap = false
          disable-bzip2 = false
          disable-zstd = false
      "#;
      let (config, _) = Config::from_toml(toml).unwrap();
      let repodata_config = config.repodata_config();
      assert_eq!(repodata_config.default.disable_jlap, Some(true));
      assert_eq!(repodata_config.default.disable_bzip2, Some(true));
      assert_eq!(repodata_config.default.disable_zstd, Some(true));
      assert_eq!(repodata_config.default.disable_sharded, Some(true));

      let per_channel = repodata_config.clone().per_channel;
      assert_eq!(per_channel.len(), 2);

      let prefix_config = per_channel.get(&Url::from_str("https://prefix.dev/conda-forge").unwrap()).unwrap();
      assert_eq!(prefix_config.disable_jlap, Some(false));
      assert_eq!(prefix_config.disable_bzip2, Some(false));
      assert_eq!(prefix_config.disable_zstd, Some(false));
      assert_eq!(prefix_config.disable_sharded, Some(false));

      let anaconda_config = per_channel.get(&Url::from_str("https://conda.anaconda.org/conda-forge").unwrap()).unwrap();
      assert_eq!(anaconda_config.disable_jlap, Some(false));
      assert_eq!(anaconda_config.disable_bzip2, Some(false));
      assert_eq!(anaconda_config.disable_zstd, Some(false));
      // TODO: disable-sharded is not set so default should be used
      // assert_eq!(anaconda_config.disable_sharded, Some(true));
  }

So a user can't do this right now and expect repodata-config to be used by default for the prefix channel:

[repodata-config."https://prefix.dev/conda-forge"]
disable-sharded = true

[repodata-config]
disable-jlap = true
disable-bzip2 = false
disable-zstd = true
disable-sharded = false

@ruben-arts
Copy link
Contributor

Yes correct you have to configure it on a per channel basis currently.

Can't rattler just ignore this error? It's not helpful for the user and they also shouldn't care about how the repodata is fetched, as long as the fastest version is used when possible.

@wolfv
Copy link
Member

wolfv commented Nov 13, 2024

I like the added config flexibility. I think I would really like it if we also support setting things based on the "prefix" of the URL, so that you can configure it for a wider set of channels. E.g. adding the option to set

[repodata-config."https://prefix.dev/"]
disable-sharded = false

for all channels on prefix.dev – this is also how the mirror config works :)

@baszalmstra
Copy link
Contributor Author

@ruben-arts @wolfv all your concerns should be addressed now. Could you take another look?

@baszalmstra
Copy link
Contributor Author

Btw @ruben-arts I changed your code. Instead of merging after deserialization its merged when passing to rattler. I think the datastructure in the config should be kept as close to the source as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants