Parsing fails on PDF citations and empty results #116

bifxcore · 2018-12-13T19:16:44Z

It looks like the underlying HTML changed, and the script now throws:
TypeError: slice indices must be integers or None or have an __index__ method
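
One plausible explanation for this particular message, inferred from the fix below and not confirmed in the thread: Beautiful Soup represents text nodes as `NavigableString`, which subclasses Python's `str`. When such a node reaches a call shaped like `tag.find('div', {'class': ...})`, Python dispatches to `str.find(sub, start)`, and the dict lands in the integer `start` slot. The sketch below reproduces the message without bs4. `FakeTextNode` and `reproduce_error` are hypothetical names used only for illustration.

```python
class FakeTextNode(str):
    """Hypothetical stand-in for bs4's NavigableString, which also subclasses str."""


def reproduce_error() -> str:
    node = FakeTextNode("\n")  # whitespace between tags usually becomes a text node
    try:
        # Same call shape as tag.find('div', {'class': 'gs_ttss'}); on a str
        # this is str.find(sub, start) with a dict passed as `start`.
        node.find('div', {'class': 'gs_ttss'})
    except TypeError as exc:
        return str(exc)
    return "no error"


print(reproduce_error())
```

If this reading is right, any code path that calls `.find()` on a text node will fail the same way, which would also fit the identical message reported in #118.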

I think I managed to fix it by changing the code around line 570 from:

            if str(tag).lower().find('.pdf'):
                if tag.find('div', {'class': 'gs_ttss'}):
                    self._parse_links(tag.find('div', {'class': 'gs_ttss'}))

to:

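            # Requires: from bs4 import NavigableString, Tag
            if str(tag).lower().find('.pdf'):
                # Text nodes subclass str, so .find() on them is str.find; skip them
                if isinstance(tag, NavigableString):
                    continue
                if isinstance(tag, Tag):
                    if tag.find('div', {'class': 'gs_or_ggsm'}):
                        self._parse_links(tag.find('div', {'class': 'gs_or_ggsm'}))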
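
Two properties this snippet relies on can be checked without bs4. First, `str.find` returns -1, which is truthy, when `'.pdf'` is absent, and 0, which is falsy, only when the string starts with it, so the outer `if` passes for almost every tag; a membership test (`'.pdf' in ...`) is the usual way to ask "does it contain". Second, an `isinstance` check drops text nodes before any `.find()` call can reach `str.find`. This is a hedged sketch: `TextNode`, `ElementNode`, `pdf_find_is_truthy`, and `element_nodes` are hypothetical stand-ins, not names from scholar.py or bs4.

```python
class TextNode(str):
    """Hypothetical stand-in for bs4.NavigableString, which subclasses str."""


class ElementNode:
    """Hypothetical stand-in for bs4.Tag, holding a tag name and its classes."""

    def __init__(self, name, classes=()):
        self.name = name
        self.classes = list(classes)


def pdf_find_is_truthy(markup: str) -> bool:
    # Mirrors `if str(tag).lower().find('.pdf'):` from the snippet.
    return bool(markup.lower().find('.pdf'))


def element_nodes(row):
    """Keep only element nodes, skipping text nodes as the fix does."""
    kept = []
    for node in row:
        if isinstance(node, TextNode):
            continue  # calling .find() on these would hit str.find
        if isinstance(node, ElementNode):
            kept.append(node)
    return kept


# str.find returns -1 (truthy) when '.pdf' is absent,
# and 0 (falsy) only when the markup starts with it.
print(pdf_find_is_truthy('<div>no link</div>'))  # True
print(pdf_find_is_truthy('.pdf at the start'))   # False
print('.pdf' in '<div>no link</div>')            # False

row = [TextNode('\n'), ElementNode('div', ['gs_or_ggsm']), TextNode(' ')]
print([n.name for n in element_nodes(row)])      # ['div']
```

Note that switching the outer condition to a membership test would change which rows reach `_parse_links`, so it would need checking against real result pages before being relied on.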


GianniSalami · 2018-12-18T18:26:05Z

How is NavigableString defined? Thank you for the fix!

bifxcore · 2018-12-20T02:32:42Z


It needs to be imported from bs4 (the Beautiful Soup library).


bifxcore · 2019-10-17T00:32:07Z

@peterzjx It still works for me (with beautifulsoup4==4.3.2).

SvennoNito · 2020-01-18T22:42:07Z

Thank you so much, @bifxcore! It works for me now. Apparently, one year later, the bug still exists. For anyone who is as new to Beautiful Soup as I am, the classes need to be imported like this:

    from bs4 import NavigableString, Tag

peterzjx mentioned this issue May 29, 2019

It doesn't work #121 (open)

SvennoNito mentioned this issue Jan 20, 2020

First Steps with Scholar.py #127 (open)

simplecomplex-tech mentioned this issue May 15, 2020

Parser fails on results with only citation: TypeError: slice indices must be integers or None or have an __index__ method #118 (open)

j3soon added a commit to j3soon/ckreibich-scholar.py that referenced this issue Dec 4, 2022

Fix issue ckreibich#116 and ckreibich#118

840b890

rafaelribeiro1510 added a commit to rafaelribeiro1510/scholar.py that referenced this issue Nov 13, 2023

Fix from ckreibich#116

c416063
