Getting list of all CPAN package names is broken #1961
Perhaps this will help. I updated the links for the
#!/bin/bash
# Fetch every "latest" CPAN distribution name from MetaCPAN via the scroll API.
COUNT="5000"
OUTPUT_FILE="/tmp/metacpan-dists.jsonl"
echo "Request: 1"
# The first request opens a scroll context that is kept alive for one minute.
JSON0="$(curl -s "https://fastapi.metacpan.org/v1/release/_search?scroll=1m&size=$COUNT&q=status:latest&fields=distribution")"
TOTAL=$(echo "$JSON0" | jq '.hits.total')
echo "Total dists: $TOTAL"
# Ceiling division: number of requests needed at $COUNT hits per request.
REQUESTS_N=$(( (TOTAL + COUNT - 1) / COUNT ))
echo "Will make $REQUESTS_N requests total"
SCROLL_ID=$(echo "$JSON0" | jq -r '._scroll_id')
echo "$JSON0" | jq '.hits.hits | .[].fields.distribution' > "$OUTPUT_FILE"
for i in $(seq 2 "$REQUESTS_N"); do
  echo "Request: $i"
  # Later pages are fetched by POSTing the most recent scroll ID as the request body.
  JSON="$(curl -s -XPOST 'https://fastapi.metacpan.org/v1/_search/scroll?scroll=1m' -d "$SCROLL_ID")"
  SCROLL_ID=$(echo "$JSON" | jq -r '._scroll_id')
  echo "$JSON" | jq '.hits.hits | .[].fields.distribution | .[]' >> "$OUTPUT_FILE"
done
# Count the unique distribution names collected.
sort -u "$OUTPUT_FILE" | wc -l
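The request count in the script is a ceiling division of the total hit count by the page size. As a minimal sketch of that arithmetic on its own (the function name `requests_needed` is made up for illustration and is not part of the script):

```shell
# Number of requests needed to fetch $1 hits at $2 hits per request.
# This is integer ceiling division: (total + count - 1) / count.
requests_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# 10001 hits at 5000 per request need three requests: 5000 + 5000 + 1.
requests_needed 10001 5000
```

An exact multiple such as 10000 hits at 5000 per request yields 2, so no empty trailing request is made.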
zmughal added a commit to zmughal/libraries.io that referenced this issue on May 18, 2021:
This uses the ElasticSearch scroll API to get all CPAN distributions <https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-request-scroll.html>. Fixes <librariesio#1961>.
zmughal added a commit to zmughal/libraries.io that referenced this issue on May 18, 2021:
Reverts <librariesio@9c6d0f9>. Connects with <librariesio#1961>.
zmughal added a commit to zmughal/libraries.io that referenced this issue on May 18, 2021:
This reverts commit 9c6d0f9. Connects with <librariesio#1961>.
zmughal added a commit to zmughal/libraries.io that referenced this issue on May 30, 2021:
This uses the ElasticSearch scroll API to get all CPAN distributions <https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-request-scroll.html>. Fixes <librariesio#1961>.
zmughal added a commit to zmughal/libraries.io that referenced this issue on May 30, 2021:
This reverts commit 9c6d0f9. Connects with <librariesio#1961>.
Paging through the CPAN releases API no longer works for results greater than 10,000
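A minimal sketch of why the paging stops, assuming the 10,000 cutoff is Elasticsearch's default `index.max_result_window`, which rejects any page where `from + size` exceeds the window. The function name `page_is_reachable` is hypothetical and only illustrates the check:

```shell
# Assumption: the cutoff matches Elasticsearch's default index.max_result_window.
MAX_RESULT_WINDOW=10000

# Succeeds (exit status 0) when a page starting at offset $1 with $2 hits
# fits inside the result window; fails otherwise.
page_is_reachable() {
  [ $(( $1 + $2 )) -le "$MAX_RESULT_WINDOW" ]
}

if page_is_reachable 9000 1000; then
  echo "from=9000 size=1000: reachable"
fi
if ! page_is_reachable 10000 1000; then
  echo "from=10000 size=1000: rejected"
fi
```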
Code location: https://github.com/librariesio/libraries.io/blob/master/app/models/package_manager/cpan.rb#L17
Example url:
Error:
The docs suggest using the scroll API: https://github.com/metacpan/metacpan-api/blob/master/docs/API-docs.md#being-polite but the links to the docs are dead.
More recent scroll API docs are here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html but I couldn't get it to accept scroll_id as a parameter.
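For comparison, the script posted in this thread sends the scroll ID as the POST body instead of as a `scroll_id` query parameter. The sketch below only prints the follow-up request as a dry run; it makes no network call. The base URL and endpoint are copied from that script, while the function name `next_page_command` and the placeholder ID are made up for illustration. Whether the endpoint still accepts this form is an assumption.

```shell
# Base URL copied from the script in this thread.
API_BASE="https://fastapi.metacpan.org/v1"

# Print (do not run) the curl command that would fetch the next scroll page.
# $1 is the scroll ID from the previous response; it goes in the request body.
# $2 is the optional keep-alive for the scroll context (defaults to 1m).
next_page_command() {
  printf "curl -s -XPOST '%s/_search/scroll?scroll=%s' -d '%s'\n" \
    "$API_BASE" "${2:-1m}" "$1"
}

# Placeholder ID; a real one comes from the "_scroll_id" field of the previous response.
next_page_command "EXAMPLE_SCROLL_ID"
```

Each response carries a `_scroll_id` of its own, and the script reads it again after every request, so the ID passed here should be the most recent one.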