Retry search at current cursor on total:None #548

Open
wants to merge 2 commits into base: master

Conversation

@hornc (Contributor) commented Aug 22, 2022

This is more of a workaround for an issue I've run into with search result paging. I'm not sure why the client sporadically receives an unexpected response; it seems like a problem with the search API that might need investigation.

I'm occasionally getting the following error part way through an ia search over largish result sets (e.g. 92,669 results):

Traceback (most recent call last):
  File "/home/charles/.local/bin/ia", line 8, in <module>
    sys.exit(main())
  File "/home/charles/.local/lib/python3.8/site-packages/internetarchive/cli/ia.py", line 171, in main
    sys.exit(ia_module.main(argv, session))
  File "/home/charles/.local/lib/python3.8/site-packages/internetarchive/cli/ia_search.py", line 100, in main
    for result in search:
  File "/home/charles/.local/lib/python3.8/site-packages/internetarchive/search.py", line 279, in __next__
    return self.iterator.__next__()
  File "/home/charles/.local/lib/python3.8/site-packages/internetarchive/search.py", line 153, in _scrape
    num_found = int(j['total'])
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Example API JSON response (from adding some debug code to capture the unexpected response):

UNEXPECTED RESPONSE: {'items': [], 'count': 0, 'previous': 'W3siaWRlbn....REDACTEDjustincase...X0tONjgzIn1d', 'total': None}

This was occurring sporadically for me over the weekend, at multiples of 10K results, on various queries returning between 30K and 90K num_results.

The query I was using is admittedly a bit complex:

ia search "mediatype:texts AND boxid:IA* AND scribe3_search_catalog:isbn AND identifier:t*" -f "scribe3_search_id" -p scope:all

But I have in the past run 'worse' ones with more clauses, fields, and expected results without problems :)

I haven't tested yet whether scope:all contributes to the buggy behaviour. I started filtering on the first letter of the identifier to reduce the size of the queries.

The code change here simply checks total before assuming it is an int, and retries the 10K-limit request at the current cursor if total is None, which seems to be an unexpected result; the current code expects either 0 or a positive integer.

This was enough to let me complete a number of large search tasks.
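
Roughly, the retry-at-cursor idea looks like the sketch below. This is illustrative only, not the actual diff in this PR: fetch_page is a hypothetical callable standing in for one scrape-API request that returns the decoded JSON, and the parameter and key names are assumptions based on the responses shown above.

def iter_scrape_results(fetch_page, params, page_size=10_000):
    """Sketch of the workaround: retry the same cursor when 'total' is None."""
    params = dict(params, count=page_size)
    while True:
        page = fetch_page(params)
        if page.get('total') is None:
            # Unexpected response (like the captured JSON above): re-request
            # the current cursor instead of crashing on int(None).
            continue
        num_found = int(page['total'])  # the call that raised TypeError above
        yield from page.get('items', [])
        cursor = page.get('cursor')
        if not cursor:
            break
        params['cursor'] = cursor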

Possible issues:

  • This could get stuck in a loop: there's no give-up clause if the API decides to consistently return total: None.
  • I only checked very briefly whether 0 search results avoid this retry code. Searching for non-existent categories with 0 results does not get stuck in a loop, but there may be other cases that need checking.
  • total:None looks like an API bug that would be masked by this workaround without understanding the root cause.

@hornc marked this pull request as ready for review August 23, 2022 08:29
Review thread on internetarchive/search.py (outdated, resolved)
@maxz (Contributor) commented Aug 27, 2022

I'm not a huge fan of this band-aid fix, especially when considering the possibility of the infinite None loop.
But without taking a deeper look at the pagination code I can't suggest a better solution either.

@jjjake (Owner) commented Aug 30, 2022

Thanks for the PR @hornc, and thanks for taking a look, maxz!

I think we should retry a few times when we run into total: None, and then raise a ReadTimeout exception, as the existing code does elsewhere, if all retries fail or keep returning None for total. Based on feedback from our search team, I think we should back off for 1 second on each retry. We should probably also expose these values (retry count, backoff delay) via optional parameters.

@hornc I'm happy to make these changes if you'd like. I have a few other projects that need tending to, but hopefully I can get to it soon! Thanks again.
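
A rough sketch of what that could look like; the names, defaults, and the fetch_page stand-in below are placeholders, not the library's actual API:

import time
from requests.exceptions import ReadTimeout

def fetch_page_with_retries(fetch_page, params, max_retries=5, backoff=1.0):
    """Hypothetical wrapper: re-request the current cursor a few times with a
    fixed delay, raising ReadTimeout if 'total' is still None after all retries."""
    for _ in range(max_retries):
        page = fetch_page(params)
        if page.get('total') is not None:
            return page
        time.sleep(backoff)  # 1-second pause per the search team's feedback
    raise ReadTimeout(
        f"scrape API kept returning total: None after {max_retries} retries")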

@hornc (Contributor, Author) commented Aug 31, 2022

Thanks @jjjake , that sounds great!

@maxz, thanks for the review. I'll remove my comment line, but adding the backoff etc. and doing it properly will be the better solution. Happy to close this in favour of a complete solution from @jjjake.

@cclauss (Contributor) commented Jan 7, 2023

@mekarpeles This was the behavior we were seeing on Friday.

@mekarpeles (Contributor) commented

First off, I want to +1 @jjjake's proposal of having some sort of retry with exponential backoff.

However, today the existing behaviour is worse than the PR that @hornc offers, in that it throws exceptions upstream.

I would support merging @hornc's changes to unbreak code that relies on this functionality, and then also support an effort that adds retries + backoff and warnings.

@mekarpeles (Contributor) left a comment

Would prefer exponential backoff as well, though this would solve problems purl is seeing (we are currently swallowing the exceptions on our side).
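
For context, swallowing the exception on the caller's side looks roughly like the sketch below; search_items is the library's public entry point, while safe_identifiers and the rest are illustrative only:

from internetarchive import search_items

def safe_identifiers(query, **kwargs):
    """Yield result identifiers, swallowing the sporadic total: None crash.

    Illustrative only: results after the failing page are silently lost,
    which is why a library-level retry/backoff fix is preferable.
    """
    try:
        for result in search_items(query, **kwargs):
            yield result['identifier']
    except TypeError:
        # int(None) raised deep inside the pagination code (see the traceback above)
        return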

@hornc (Contributor, Author) commented May 16, 2023

This problem can still occur on v3.2.0. I had been using an older patched version with the fix in this PR daily (without exponential backoff, and without problems), but recently upgraded.

It's a sporadic issue, but I've seen it under v3.2.0. I've now upgraded to v3.5.0. If it occurs again I'll rebase this PR and see about adding a backoff.
