ISBN search imports DVDs from Amazon.com #9879
Labels
Lead: @scottbarnes
Issues overseen by Scott (Community Imports)
Module: Import
Issues related to the configuration or use of importbot and other bulk import systems. [managed]
Priority: 3
Issues that we can consider at our leisure. [managed]
Type: Bug
Something isn't working. [managed]
Problem
The ISBN search feature, which imports Amazon.com records into Open Library, will unfortunately also import DVDs that have ISBNs, even when Amazon.com clearly indicates they are not books.
This was confirmed using the example of https://www.amazon.com/gp/product/1621064298
Breakdown
The solution will likely involve some minor modifications to https://github.com/internetarchive/openlibrary/blob/master/openlibrary/core/vendors.py so that get_products() does not return values for DVDs.
For the above DVD/item, get_products() returns:
[{'url': 'https://www.amazon.com/dp/1621064298/?tag=internetarchi-20', 'source_records': ['amazon:1621064298'], 'isbn_10': ['1621064298'], 'isbn_13': ['9781621064299'], 'price': '$15.19', 'price_amt': 1519, 'title': 'Homeland Insecurity: Films by Bill Brown', 'cover': 'https://m.media-amazon.com/images/I/41FuCUj3kUL._SL500_.jpg', 'authors': [{'name': 'Brown, Bill'}], 'publishers': ['Microcosm Publishing'], 'number_of_pages': None, 'edition_num': None, 'publish_date': 'Aug 01, 2007', 'product_group': 'DVD', 'physical_format': 'dvd'}]
Of interest are product_group and physical_format. To complete this issue one would likely want to look at https://webservices.amazon.com/paapi5/documentation/ and determine whether we should use product_group, physical_format, both, either, or something else to determine that something is a DVD. Or maybe it's better to focus on what is allowed (e.g. books).
In any event, we'll likely want to modify serialize() or get_product() to filter out DVDs (or only allow whatever constitutes books, if the cases are clear).
Requirements Checklist
Determine whether to use product_group, physical_format, both, either, or something else to determine that something is a DVD.
Filter on the chosen field(s) (e.g. physical_format) such that get_product() or serialize() will no longer return metadata for DVDs (e.g. if modifying serialize(), it should return {}; serialize() may be the better option).
Related files
openlibrary/openlibrary/core/vendors.py, lines 127 to 167 in d3bb158
openlibrary/openlibrary/core/vendors.py, lines 123 to 125 in d3bb158
openlibrary/openlibrary/core/vendors.py, lines 170 to 294 in d3bb158
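As a starting point for the discussion in the Breakdown, here is a minimal sketch of a DVD check applied to records shaped like the get_products() output quoted above. The helper names is_dvd() and drop_dvds() are hypothetical and do not exist in vendors.py. The 'DVD'/'dvd' values come from the sample record in this issue. The 'Book'/'paperback' values in the second record, and the placeholder ISBN, are assumptions for illustration only. Which fields and values Amazon actually guarantees still needs to be checked against the PA-API documentation.

```python
from typing import Any

# Values observed in the sample record from this issue. Whether these are the
# only values Amazon uses for DVDs is an assumption to verify in the PA-API docs.
DVD_PRODUCT_GROUPS = {"dvd"}
DVD_PHYSICAL_FORMATS = {"dvd"}


def is_dvd(product: dict[str, Any]) -> bool:
    """Return True if either field marks the serialized product as a DVD."""
    group = (product.get("product_group") or "").lower()
    fmt = (product.get("physical_format") or "").lower()
    return group in DVD_PRODUCT_GROUPS or fmt in DVD_PHYSICAL_FORMATS


def drop_dvds(products: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical post-filter for a list shaped like get_products() output."""
    return [p for p in products if not is_dvd(p)]


# Trimmed copy of the record quoted in this issue.
dvd_record = {
    "isbn_10": ["1621064298"],
    "title": "Homeland Insecurity: Films by Bill Brown",
    "product_group": "DVD",
    "physical_format": "dvd",
}

# Illustrative book record; these field values are assumed, not taken from Amazon.
book_record = {
    "isbn_10": ["0000000000"],
    "title": "Some Book",
    "product_group": "Book",
    "physical_format": "paperback",
}

remaining = drop_dvds([dvd_record, book_record])
```

This sketch uses a deny-list (reject known DVD values). The alternative raised in the Breakdown, allowing only what counts as a book, would invert the check and requires knowing the full set of book-like values first.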