
Commit

Set openalex to retrieve 50 DOIs per batch
lwrubel committed Jul 5, 2024
1 parent 93d85f3 commit f4b7bef
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions rialto_airflow/harvest/openalex.py
@@ -78,11 +78,13 @@ def publications_csv(dois: list, csv_file: str) -> None:
writer.writerow(pub)


-def publications_from_dois(dois: list, batch_size=75):
+def publications_from_dois(dois: list):
     """
     Look up works by DOI in batches that fit within OpenAlex request size limits
     """
-    for doi_batch in batched(dois, batch_size):
+    for doi_batch in batched(dois, 50):
+        # Setting batch size to 50 to avoid 400 errors from OpenAlex API when GET query string is greater than 4096 characters
+        # Based on experimentation, 75 is too high. 50 is the default per_page size, so we could consider removing pagination in the future.
         # TODO: do we need this to stay within 100,000 requests / day API quota?
         time.sleep(1)
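
The comments in this change tie the batch size to the length of the GET query string. The following is a minimal sketch, not the repository's code, of how that relationship could be checked offline. It assumes the DOIs in a batch are joined with `|` into a single `filter=doi:...` parameter; the helper names `doi_filter_query` and `batches_within_limit` are hypothetical, and the local `batched` is a stand-in for whichever `batched` the module imports. The 4096-character limit is taken from the commit comment.

```python
from itertools import islice
from urllib.parse import urlencode

# Limit quoted in the commit comment; treated here as an assumption.
MAX_QUERY_CHARS = 4096


def batched(items, n):
    """Yield successive lists of at most n items (stand-in for the imported batched)."""
    it = iter(items)
    while batch := list(islice(it, n)):
        yield batch


def doi_filter_query(doi_batch):
    """Build a URL-encoded query string that ORs the DOIs together with '|'.

    Hypothetical helper: the real request may carry extra parameters,
    which would make the query string longer than this estimate.
    """
    return urlencode({"filter": "doi:" + "|".join(doi_batch)})


def batches_within_limit(dois, batch_size, limit=MAX_QUERY_CHARS):
    """Return True if every batch yields a query string no longer than limit."""
    return all(
        len(doi_filter_query(batch)) <= limit
        for batch in batched(dois, batch_size)
    )


if __name__ == "__main__":
    # Synthetic 68-character DOIs, chosen only to illustrate the effect.
    sample = [f"10.1234/{i:060d}" for i in range(150)]
    for size in (75, 50):
        print(size, batches_within_limit(sample, size))
```

Because URL encoding expands characters such as `/`, `:` and `|` to three characters each, the encoded length grows faster than the raw DOI length. The outcome depends entirely on how long the DOIs in a given batch are, so a check like this only complements the experimentation mentioned in the comment.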

