bandersnatch size_project_metadata plugin causes some packages to not sync - e.g. pip + falcon
#1169
Comments
Will try and look into this over the weekend and see if I can reproduce ... |
Thanks for the help. |
So I was able to repro using the size_project_metadata plugin, so the bug is in there. With the plugin disabled, falcon downloaded fine.
So we'd need to add more debugging info into the plugin code + plugin calling code to see what exactly is making it skip this package as a whole. Fixes welcome, I'm low on time to dig in and fix this plugin. As plugins are optional, I generally rely on contributions for them. I focus more on making core bandersnatch function (as I don't use bandersnatch + haven't for years and would really love to get a new maintainer) |
Thank you so much for the help guys. I am going to switch gears and put
the larger packages from the pystats in my config and go from there.
Thank you so much for the support.
God bless open source!
…On Sat, Aug 6, 2022, 9:57 PM Cooper Lees ***@***.***> wrote:
So I was able to repro using the size_project_metadata plugin ... So
the bug is in there ...
Debug run with plugin enabled:
crl-m1:~ cooper$ /tmp/tb/bin/bandersnatch -c /tmp/pypi/bandersnatch.conf --debug sync falcon 2>&1 | tee /tmp/bander_sync_falcon_debug
2022-08-06 18:50:11,894 DEBUG: Checking config for storage backend... (configuration.py:121)
2022-08-06 18:50:11,894 DEBUG: Found storage backend in config! (configuration.py:123)
2022-08-06 18:50:11,895 INFO: Selected storage backend: filesystem (configuration.py:129)
2022-08-06 18:50:11,895 DEBUG: Checking config for compare method... (configuration.py:161)
2022-08-06 18:50:11,895 DEBUG: Found compare method in config! (configuration.py:163)
2022-08-06 18:50:11,895 INFO: Selected compare method: hash (configuration.py:175)
2022-08-06 18:50:11,895 DEBUG: Checking config for alternative download mirror... (configuration.py:178)
2022-08-06 18:50:11,895 DEBUG: No alternative download mirror found in config. (configuration.py:183)
2022-08-06 18:50:11,895 DEBUG: Skip checking download-mirror-no-fallback because dependent optionis not set in config. (configuration.py:203)
2022-08-06 18:50:11,950 DEBUG: Initializing Master's aiohttp ClientSession (master.py:79)
2022-08-06 18:50:11,977 INFO: Initialized metadata plugin size_project_metadata to block projects > 104857600 bytes (metadata_filter.py:232)
2022-08-06 18:50:11,983 DEBUG: Adding json directories to bootstrap (mirror.py:536)
2022-08-06 18:50:11,983 INFO: Setting up mirror directory: /tmp/pypi/web/simple (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/packages (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/local-stats/days (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/json (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/pypi (mirror.py:546)
2022-08-06 18:50:11,984 DEBUG: Retrieving FileLock instance @ /tmp/pypi/.lock (filesystem.py:36)
2022-08-06 18:50:11,984 DEBUG: Acquiring FLock with timeout: 1 (mirror.py:551)
2022-08-06 18:50:11,984 INFO: Generation file missing. Reinitialising status files. (mirror.py:586)
2022-08-06 18:50:11,985 DEBUG: Modifying destination: /tmp/pypi/generation with: /tmp/pypi/generation.m6ggg53h (filesystem.py:122)
2022-08-06 18:50:11,985 INFO: Status file /tmp/pypi/status missing. Starting over. (mirror.py:608)
2022-08-06 18:50:11,985 INFO: Syncing with https://pypi.org. (mirror.py:59)
2022-08-06 18:50:11,985 INFO: No release filters are enabled. Skipping release filtering (mirror.py:80)
2022-08-06 18:50:11,985 INFO: No release file filters are enabled. Skipping release file filtering (mirror.py:82)
2022-08-06 18:50:11,985 DEBUG: Package syncer 0 started for duty (mirror.py:127)
2022-08-06 18:50:11,985 INFO: Fetching metadata for package: falcon (serial 0) (package.py:58)
2022-08-06 18:50:11,985 DEBUG: Getting /pypi/falcon/json (serial 0) (master.py:146)
2022-08-06 18:50:12,005 DEBUG: Package syncer 1 started for duty (mirror.py:127)
2022-08-06 18:50:12,005 DEBUG: Package syncer 1 emptied queue (mirror.py:134)
2022-08-06 18:50:12,005 DEBUG: Package syncer 2 started for duty (mirror.py:127)
2022-08-06 18:50:12,005 DEBUG: Package syncer 2 emptied queue (mirror.py:134)
2022-08-06 18:50:12,307 DEBUG: Package syncer 0 emptied queue (mirror.py:134)
2022-08-06 18:50:12,307 INFO: Generating global index page. (mirror.py:486)
2022-08-06 18:50:12,308 DEBUG: Writing temporary file /tmp/pypi/web/simple/.index.html.x64odmpl to target destination: /tmp/pypi/web/simple/index.html (filesystem.py:93)
2022-08-06 18:50:12,308 DEBUG: Closing Master's aiohttp ClientSession and waiting 0.1 seconds (master.py:99)
2022-08-06 18:50:12,410 INFO: 0 packages had changes (mirror.py:1051)
2022-08-06 18:50:12,410 INFO: Writing diff file to /tmp/pypi/mirrored-files (mirror.py:1061)
So I disabled the plugin and falcon downloaded fine. cmd:
/tmp/tb/bin/bandersnatch -c /tmp/pypi/bandersnatch.conf --debug sync falcon
Full repro commands
mkdir /tmp/pypi
vim /tmp/pypi/bandersnatch.conf
- Changed dirs to be based out of /tmp/pypi
python3.10 -m venv /tmp/tb --upgrade-deps
/tmp/tb/bin/pip install bandersnatch==5.2.0
|
Took your advice and used the pypistats tool to generate a list of large projects. Seems to be working great. It would seem that the plugin looks at the size of all files in the package: if the sum of all bytes of all versions in a package is greater than what you specify, it doesn't grab any of them. I was thinking the limit applied to each .whl or .tar.gz individually, because I did see a few pip .whl files that were >100MB, so I used that number thinking it was a sane maximum. The plugin recommended 1GB, but even that has been blocking packages I did want. Anyway, thanks again for the help and keep up the good work. You guys rock! |
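To make the behaviour described above concrete, here is a minimal sketch of "sum every file of every version, then compare against one project-wide limit". It is not the plugin's actual code: the function names and the sample data are made up for illustration, and the only assumption about the input is that it is shaped like PyPI's JSON metadata, where "releases" maps each version to a list of files that each carry a "size" in bytes.

```python
# Sketch only: NOT the size_project_metadata plugin's real implementation.
# Function names and sample data are hypothetical.


def project_size_bytes(metadata: dict) -> int:
    """Total size in bytes of all files across all releases of a project."""
    return sum(
        file_info.get("size", 0)
        for files in metadata.get("releases", {}).values()
        for file_info in files
    )


def project_allowed(metadata: dict, max_bytes: int) -> bool:
    """True if the whole project (all versions together) fits under the limit."""
    return project_size_bytes(metadata) <= max_bytes


# Made-up project: two releases of 60 MiB each. No single file exceeds 100 MiB.
sample = {
    "releases": {
        "1.0": [{"filename": "demo-1.0.tar.gz", "size": 60 * 1024 * 1024}],
        "2.0": [{"filename": "demo-2.0.tar.gz", "size": 60 * 1024 * 1024}],
    }
}

limit = 100 * 1024 * 1024  # 104857600 bytes, the limit shown in the debug log

print(project_size_bytes(sample))      # 125829120
print(project_allowed(sample, limit))  # False: the whole project is skipped
```

Under this reading, a project with many modest releases can exceed the limit even though every individual file is well below it, which would match pip and falcon being skipped entirely.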
Thanks for digging in and explaining why things happened. |
I think we should advertise that we'd love a fix for the size_project_metadata plugin + it's a known issue. |
Is there an error? It does what it says. I just misunderstood the way it
worked. I was thinking individual file sizes instead of entire project
size.
|
Oh, so it SUMs() the whole project. I'll check if I can make the documentation clearer then :) Because I didn't get that from reading it either, or I missed it. Thanks for clearing that up too. |
Yeah, definitely, since you thought the same as I did and you are one of the
contributors.
Having a plugin to blacklist/whitelist individual file sizes would
be handy though.
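For comparison, a per-file size filter of the kind suggested here could look roughly like the sketch below. This is purely illustrative: no such plugin is described in this thread, the function name and sample data are hypothetical, and the input is again assumed to be shaped like PyPI's JSON metadata ("releases" mapping each version to a list of files with a "size" in bytes).

```python
# Hypothetical sketch of a per-file size filter. This is not an existing
# bandersnatch plugin; names and data are invented for illustration.


def files_within_limit(metadata: dict, max_file_bytes: int) -> dict:
    """Return {version: [files]} keeping only files at or below the limit."""
    kept = {}
    for version, files in metadata.get("releases", {}).items():
        small = [f for f in files if f.get("size", 0) <= max_file_bytes]
        if small:  # drop versions that end up with no files at all
            kept[version] = small
    return kept


# Made-up release with one oversized wheel and one small sdist.
sample = {
    "releases": {
        "1.0": [
            {"filename": "demo-1.0-py3-none-any.whl", "size": 150 * 1024 * 1024},
            {"filename": "demo-1.0.tar.gz", "size": 2 * 1024 * 1024},
        ]
    }
}

kept = files_within_limit(sample, 100 * 1024 * 1024)
print([f["filename"] for f in kept["1.0"]])  # ['demo-1.0.tar.gz']
```

Unlike the project-wide sum, this only drops the individual files that are too large and keeps the rest of the project.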
|
I want to begin by saying I'm pretty sure this is a user error thing, but I can't figure out what I'm doing wrong. Whatever is causing it is not obvious, and bandersnatch is not very helpful in identifying the issue. Thanks in advance for any support to fix this. I have been screwing with this for over 2 weeks now and am almost done with all of it.
I am trying to create a complete offline pip repo and it seems to be working, but of course, out of the thousands of packages that are online, two are not being updated: specifically pip and falcon.
I see the "^pip\ " and "^falcon\ " names and many other files in the "todo" file after
bandersnatch mirror --force-check
runs. If I try to run
bandersnatch sync falcon
falcon is still not present in pip/pypi/web/simple/falcon
I recently turned on
json = true
and reran
bandersnatch mirror --force-check
It created the json folder, which does not contain the falcon or pip file.
I am currently running
bandersnatch verify
now that I have a json folder, which I guess will take a few days to finish, so unfortunately I can't run
bandersnatch sync --debug falcon
From memory, the only thing that seemed different while running it with --debug is that it mentioned filter rules: filter and file filter. There was definitely nothing about being unable to download anything. It seems to think the files were already downloaded?
Specs:
bandersnatch 5.2.0
OS: ubuntu 20.04
syncing to external ext4 drive
Config:
'''
[plugins]
enabled =
    size_project_metadata
[size_project_metadata]
max_package_size = 100M
[mirror]
directory = /media/user/ExternalEXT4/pip/pypi
json = true
release-files = true
cleanup = false
master = https://pypi.org
timeout = 10
global-timeout = 1800
workers = 3
hash-index = false
stop-on-error = false
storage-backend = filesystem
verifiers = 3
compare-method = hash
diff-file = /media/user/ExternalEXT4/pip/pypi/mirrored-files
'''
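As a quick sanity check of how the plugin-related part of this config reads, here is a minimal sketch using Python's standard configparser. The parse_size helper is hypothetical and is not bandersnatch's own parser. The binary-unit assumption (1M = 1024 * 1024 bytes) is taken from the debug log in this thread, which reports 100M as 104857600 bytes; support for the other suffixes is an assumption.

```python
import configparser
import re

# Only the plugin-related part of the config above, with the list value indented
# so that configparser treats it as a continuation of "enabled =".
CONFIG_TEXT = """
[plugins]
enabled =
    size_project_metadata

[size_project_metadata]
max_package_size = 100M
"""

# Assumption: binary multiples, consistent with 100M -> 104857600 in the log.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Hypothetical helper: convert strings like '100M' or '1G' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)B?\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

enabled = parser["plugins"]["enabled"].split()
limit = parse_size(parser["size_project_metadata"]["max_package_size"])

print(enabled)  # ['size_project_metadata']
print(limit)    # 104857600
```

The resulting number is a limit on the whole project (all versions summed), not on any single .whl or .tar.gz, as worked out in the comments above.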