HTTP Error with Volume() #48
Comments
Are specific IDs always broken, or just sometimes? That one works fine for me in a Colab notebook.
Also, could you include the end of the error trace? It's hard to see what the HTTP error code is here.
Hmm, odd. I second Ben's question: when you say 'this doesn't happen with all HathiTrust IDs, only some of them', do the same IDs consistently succeed or consistently fail, or will the same ID sometimes fail and sometimes succeed? That uses rsync with a subprocess, which is why the error catching is so poor. I suspect the file is failing to download, but Python isn't catching the failure and is still trying to open the volume. By the way, if you're just loading metadata, there's also the
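To make the suspected failure mode concrete, here is a minimal sketch of a download wrapper that checks both the subprocess exit code and the presence of the target file before anything tries to open it. The name `run_download` and its arguments are hypothetical and are not part of htrc-feature-reader; the actual rsync command line is left to the caller.

```python
import subprocess
from pathlib import Path


def run_download(cmd, dest):
    """Run a download command and fail loudly if it did not produce `dest`.

    `cmd` is an argument list (for example an rsync call) and `dest` is the
    file that command is expected to create. Both names are hypothetical and
    used for illustration only.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the exit code and stderr instead of silently continuing.
        raise RuntimeError(
            f"download failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    dest = Path(dest)
    if not dest.is_file() or dest.stat().st_size == 0:
        # The command reported success but left nothing usable behind.
        raise FileNotFoundError(f"command succeeded but {dest} is missing or empty")
    return dest
```

With a check like this, a failed transfer would raise at download time rather than showing up later as an error while opening the volume.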
Oh yeah, that's right, this is kind of a painful way to get metadata. There is some data that Hathi only distributes through here and not through the Hathifiles (e.g. LC classification). @melaniewalsh, send me an e-mail if this is what you're looking for; I believe I have some stuff about parsing this sitting in my e-mail somewhere.
Thanks @bmschmidt @organisciak! It's good to know about the HathiTrust Bib API.

There are a few reasons that I'm trying to get metadata from the Hathi IDs. We specifically included Hathi IDs with all book data in the Post45 Data Collective (e.g. NYT bestsellers) to enable people to work with the full texts/bags of words in HathiTrust. But I recently realized that the Hathi IDs are basically also our only consistent unique identifier for books, so now I'm trying to retroactively add ISBN and OCLC numbers, so we can make the datasets interoperable with other data about the same books. Similarly, I want to add ISBN/OCLC numbers to some of the Hathi derived datasets, like the Geographic Locations data, to make them interoperable with data like the Seattle Public Library's collection or circulation data.

Anyway, that's a long-winded way of saying that the HathiTrust Bib API sounds like it might be better for my metadata needs. But I would still like to create some notebooks and resources that demonstrate how you can take the Post45 Data Collective data and connect it with HathiTrust text data.

I'm including the full error message below (it's long). I'm calling Error message 👇
For adding ISBN/OCLC/LCCN identifiers I would probably use the Hathifiles. You can just download the data and parse it in; the files have these columns. The Bib API can be slow, IIRC, but 5k isn't that much, so the Bib API is fine. I'd also just write to ht-help. I don't know if anyone there monitors this repo, but when I've had this kind of issue it tends to be because some of their servers are on the blink. I think there's some load balancing across several servers, or something like that.
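As a rough illustration of the "download and parse" route, the sketch below indexes tab-delimited Hathifile rows by HathiTrust ID and keeps only the identifier fields. The function name `index_hathifile` is hypothetical, and the column names (`htid`, `oclc_num`, `isbn`, `lccn`) are assumptions that should be checked against the field list HathiTrust publishes for the Hathifiles.

```python
import csv


def index_hathifile(lines, columns, wanted=("oclc_num", "isbn", "lccn")):
    """Map each htid to its identifier fields from tab-delimited rows.

    `lines` is any iterable of text lines (for example an open file),
    `columns` is the full ordered list of column names for the file, and
    `wanted` names the columns to keep. All column names are assumptions.
    """
    reader = csv.DictReader(
        lines,
        fieldnames=columns,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    return {row["htid"]: {name: row[name] for name in wanted} for row in reader}
```

For a gzipped download, the lines could come from something like `gzip.open(path, "rt", encoding="utf-8", newline="")`, and the resulting dictionary can then be joined to a spreadsheet on its HathiTrust ID column.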
Thanks @bmschmidt. That's a good call about reaching out to ht-help (edit: I'm not actually getting the same error with the Bib API; I'm getting a different error). But I will try out the Hathifiles. Thanks for the tip!
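For the Bib API route discussed in this thread, a lookup by HathiTrust ID might look roughly like the sketch below. The endpoint pattern and the response keys (`records`, `isbns`, `oclcs`, `lccns`) are assumptions based on the Bib API's brief JSON format and should be confirmed against HathiTrust's documentation; the function names are hypothetical, and IDs containing characters such as `/` may need extra handling that is not shown here.

```python
import json
from urllib.request import urlopen

# Assumed endpoint pattern for a brief record lookup by HathiTrust ID.
BIB_API = "https://catalog.hathitrust.org/api/volumes/brief/htid/{}.json"


def bib_api_url(htid):
    """Build the (assumed) Bib API URL for one HathiTrust ID."""
    return BIB_API.format(htid)


def identifiers_from_response(payload):
    """Collect ISBN, OCLC, and LCCN values from a parsed brief response."""
    found = {"isbns": [], "oclcs": [], "lccns": []}
    for record in payload.get("records", {}).values():
        for key in found:
            found[key].extend(record.get(key, []))
    return found


def fetch_identifiers(htid, timeout=30):
    """Fetch and parse identifiers for one ID; raises on HTTP errors."""
    with urlopen(bib_api_url(htid), timeout=timeout) as resp:
        return identifiers_from_response(json.load(resp))
```

Keeping the parsing separate from the network call makes it easier to see whether a failure comes from the request itself or from an unexpected response shape.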
I'm trying to fetch HathiTrust metadata for books in a spreadsheet via their HathiTrust IDs and Volume(). But I'm getting a lot of HTTP errors like so, even though this URL does exist and contains HathiTrust data:
ERROR:root:HTTP Error accessing http://data.analytics.hathitrust.org/features-2020.03/mdp/31532/mdp.39015054033520.json.bz2
This issue seems similar to issue #45, but I'm using a Mac, not a Windows computer. Also this doesn't happen with all HathiTrust IDs, only some of them.
Any thoughts about what might be going wrong?
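For anyone who wants to check a failing ID by hand, the sketch below rebuilds the URL from the ID. It relies on an observation from the error above: the directory `31532` is every third character of the volume part of `mdp.39015054033520`. The function names are hypothetical, the base URL is copied from the error message, and the character substitutions in `clean_id` are an assumption borrowed from the pairtree convention rather than something confirmed here.

```python
# Base URL copied from the error message in this issue.
BASE = "http://data.analytics.hathitrust.org/features-2020.03"


def clean_id(volume_part):
    """Assumed pairtree-style cleanup of characters that are unsafe in paths."""
    return volume_part.replace(":", "+").replace("/", "=").replace(".", ",")


def feature_url(htid, base=BASE):
    """Rebuild the Extracted Features URL for one HathiTrust ID."""
    lib, vol = htid.split(".", 1)  # e.g. "mdp" and "39015054033520"
    cleaned = clean_id(vol)
    stub = cleaned[::3]  # every third character of the volume part
    return f"{base}/{lib}/{stub}/{lib}.{cleaned}.json.bz2"
```

Calling `feature_url("mdp.39015054033520")` reproduces the URL in the error, which can then be opened in a browser or requested several times in a row to see whether the failure is consistent or intermittent.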