Parsing fails on PDF citations and empty results #116

Open
bifxcore opened this issue Dec 13, 2018 · 4 comments

Comments

@bifxcore

It looks like the underlying HTML changed and the script is throwing:
TypeError: slice indices must be integers or None or have an __index__ method

I think I managed to fix it by changing the code around line 570 from:

            if str(tag).lower().find('.pdf'):
                if tag.find('div', {'class': 'gs_ttss'}):
                    self._parse_links(tag.find('div', {'class': 'gs_ttss'}))

to:

            if str(tag).lower().find('.pdf'):
                if isinstance(tag, NavigableString):
                    continue
                if isinstance(tag, Tag):
                    if tag.find('div', {'class': 'gs_or_ggsm'}):
                        self._parse_links(tag.find('div', {'class': 'gs_or_ggsm'}))

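
A side note on the outer condition, offered as a sketch rather than as part of the fix: `str.find` returns an index, and -1 when the substring is absent, so `if str(tag).lower().find('.pdf'):` is true for almost every tag, including tags with no PDF link at all. A membership test states the intent more directly. The helper name `mentions_pdf` and the sample strings below are hypothetical and not part of the script.

```python
def mentions_pdf(markup):
    """Return True if the markup text contains '.pdf' (case-insensitive)."""
    return '.pdf' in str(markup).lower()


# str.find returns -1 (truthy) when the substring is missing,
# and 0 (falsy) when the string starts with it.
no_pdf = '<div class="gs_r">no link here</div>'
assert no_pdf.lower().find('.pdf') == -1
assert bool(no_pdf.lower().find('.pdf')) is True    # misleading as a condition
assert mentions_pdf(no_pdf) is False                # the intended answer

assert bool('.pdf at the start'.find('.pdf')) is False  # also misleading
assert mentions_pdf('.pdf at the start') is True
```

With the `find` version the `isinstance` checks still prevent the crash, but the `.pdf` test filters out almost nothing.
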
@GianniSalami

How is NavigableString defined? Thank you for the fix!

@bifxcore
Author

bifxcore commented Dec 20, 2018 via email

@bifxcore
Author

@peterzjx it still works for me (beautifulsoup4==4.3.2)

@SvennoNito

Thank you so much, @bifxcore! It works for me now. Apparently the bug still exists a year later. For everybody who is as new to Beautiful Soup as I am, the two classes need to be imported like this:

from bs4 import NavigableString, Tag
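
To answer the earlier question about what `NavigableString` is: both names are classes exported by `bs4`. A `Tag` is an element, while a `NavigableString` is a piece of text sitting between elements, so it has no child tags to search. `NavigableString` subclasses `str`, which is likely why calling `.find('div', {...})` on one ends up in `str.find` and raises the slice-indices `TypeError`. Here is a minimal sketch with made-up HTML; the markup and the helper name `first_link_block` are assumptions, and only the `gs_or_ggsm` class comes from the fix above.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Made-up markup: one stray text node followed by one result element.
html = (
    '<div id="results">stray text'
    '<div class="gs_r"><div class="gs_or_ggsm">'
    '<a href="paper.pdf">[PDF]</a></div></div></div>'
)
soup = BeautifulSoup(html, 'html.parser')
results = soup.find('div', {'id': 'results'})


def first_link_block(node):
    """Return the gs_or_ggsm div inside node, or None for non-Tag nodes."""
    if not isinstance(node, Tag):
        return None  # text nodes (NavigableString) have nothing to search
    return node.find('div', {'class': 'gs_or_ggsm'})


# Iterating over children yields both text nodes and elements.
kinds = [type(child).__name__ for child in results.children]
blocks = [first_link_block(child) for child in results.children]
```

Here `kinds` holds one `NavigableString` and one `Tag`, and only the `Tag` yields a link block. This is the same guard the `isinstance` checks in the fix provide.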
