
fix: prevent infinite recursion in get_article_urls #17360

Merged
2 commits merged into main from massi/fix-infinite-recursion on Dec 24, 2024

Conversation

@masci (Member) commented on Dec 24, 2024

Description

Add a max_depth parameter to get_article_urls to prevent infinite recursion.
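
For illustration only, here is a minimal sketch of how a depth cap like this can break the recursion. Only the get_article_urls signature matches the diff in this PR; the Playwright-style browser calls and the link filtering below are assumptions, not the package's actual code.

```python
# Hypothetical sketch of a depth-capped recursive crawl. Only the
# get_article_urls signature matches this PR's diff; the Playwright-style
# browser calls and link filtering are illustrative assumptions.
from typing import Any, List


class ReaderSketch:
    def get_article_urls(
        self, browser: Any, root_url: str, current_url: str, max_depth: int = 100
    ) -> List[str]:
        # When the depth budget is exhausted, stop recursing instead of
        # following links forever (e.g. pages that link back to each other).
        if max_depth <= 0:
            return []

        page = browser.new_page()
        page.goto(current_url)
        hrefs = [a.get_attribute("href") for a in page.query_selector_all("a")]
        page.close()

        urls = [current_url]
        for href in hrefs:
            if href and href.startswith(root_url):
                # Pass a smaller budget down so every branch terminates.
                urls.extend(
                    self.get_article_urls(browser, root_url, href, max_depth - 1)
                )
        return urls
```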

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Package has no tests.

@dosubot added the size:S label (This PR changes 10-29 lines, ignoring generated files) on Dec 24, 2024
@masci requested a review from nerdai on December 24, 2024, 08:07
@dosubot added the lgtm label (This PR has been approved by a maintainer) on Dec 24, 2024
@logan-markewich merged commit 159ce48 into main on Dec 24, 2024
11 checks passed
@logan-markewich deleted the massi/fix-infinite-recursion branch on December 24, 2024, 16:13
@@ -125,9 +127,10 @@ def scrape_article(
         return {"title": title, "subtitle": subtitle, "body": body, "url": url}

     def get_article_urls(
-        self, browser: Any, root_url: str, current_url: str
+        self, browser: Any, root_url: str, current_url: str, max_depth: int = 100
@jzhao62 (Contributor) commented on Dec 24, 2024:

Should we track the URLs we have visited so we do not visit them again, instead of hardcoding a depth?

@masci (Member, Author) replied:

I needed a quick fix, but yeah, that would be a better solution. Feel free to open a PR!
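
For reference, a minimal sketch of what the visited-set approach suggested above could look like. The _extract_links helper and the browser calls are hypothetical stand-ins, not the package's code.

```python
# Illustrative sketch of cycle prevention via a visited set, as suggested in
# the review comment. The _extract_links helper is hypothetical and stands in
# for whatever page scraping the reader actually does.
from typing import Any, List, Optional, Set


class ReaderSketch:
    def get_article_urls(
        self,
        browser: Any,
        root_url: str,
        current_url: str,
        visited: Optional[Set[str]] = None,
    ) -> List[str]:
        visited = visited if visited is not None else set()
        if current_url in visited:
            # Already crawled this page: breaking the cycle here removes the
            # need for a hard depth cap.
            return []
        visited.add(current_url)

        urls = [current_url]
        for href in self._extract_links(browser, current_url):
            if href.startswith(root_url):
                urls.extend(self.get_article_urls(browser, root_url, href, visited))
        return urls

    def _extract_links(self, browser: Any, url: str) -> List[str]:
        # Hypothetical helper: return the hrefs found on the page at `url`.
        page = browser.new_page()
        page.goto(url)
        hrefs = [a.get_attribute("href") for a in page.query_selector_all("a")]
        page.close()
        return [h for h in hrefs if h]
```

The two approaches can also be combined: a visited set handles cycles, while a depth cap still bounds very deep (but acyclic) link chains.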
