Rewriter choke on a relative URL inside a querystring #380

Open
rgaudin opened this issue Aug 19, 2024 · 2 comments
Labels
bug Something isn't working
Comments

@rgaudin
Member

rgaudin commented Aug 19, 2024

From https://farm.zimit.kiwix.org/pipeline/899453d8-6002-46a5-8c36-cc2f1c4783ef/debug

Traceback (most recent call last):
  File "/usr/bin/zimit", line 8, in <module>
    sys.exit(zimit.zimit())
             ^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.12/site-packages/zimit/zimit.py", line 695, in zimit
    run(sys.argv[1:])
  File "/app/zimit/lib/python3.12/site-packages/zimit/zimit.py", line 616, in run
    return warc2zim(warc2zim_args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.12/site-packages/warc2zim/main.py", line 168, in main
    return converter.run()
           ^^^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.12/site-packages/warc2zim/converter.py", line 384, in run
    self.add_items_for_warc_record(record)
  File "/app/zimit/lib/python3.12/site-packages/warc2zim/converter.py", line 946, in add_items_for_warc_record
    payload_item = WARCPayloadItem(
                   ^^^^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.12/site-packages/warc2zim/items.py", line 56, in __init__
    ).rewrite(pre_head_template, post_head_template)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.12/site-packages/warc2zim/content_rewriting/generic.py", line 108, in rewrite
    return self.rewrite_html(pre_head_template, post_head_template)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.12/site-packages/warc2zim/content_rewriting/generic.py", line 225, in rewrite_html
    rel_static_prefix = self.url_rewriter.get_document_uri(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.12/site-packages/warc2zim/url_rewriting.py", line 354, in get_document_uri
    PurePosixPath(item_url).relative_to(
  File "/usr/lib/python3.12/pathlib.py", line 684, in relative_to
    raise ValueError(f"'..' segment in {str(other)!r} cannot be walked")
ValueError: '..' segment in 'portalvwco.catalog.srv.br/common/nessComponents/xtree.html?css=../../prg' cannot be walked

Not sure if that's something we just can't support (if so, why are we crashing?) or a legitimate bug.

@rgaudin added the bug and question labels Aug 19, 2024
@benoit74 removed the question label Sep 2, 2024
@benoit74 changed the title from "Rewriter choke on an URL inside a querystring" to "Rewriter choke on an relative URL inside a querystring" Sep 2, 2024
@benoit74
Collaborator

benoit74 commented Sep 2, 2024

Legitimate bug from my PoV, simply not something I imagined could happen ^^

@benoit74 added this to the 2.2.0 milestone Sep 2, 2024
@benoit74 changed the title from "Rewriter choke on an relative URL inside a querystring" to "Rewriter choke on a relative URL inside a querystring" Sep 2, 2024
@benoit74
Collaborator

benoit74 commented Sep 2, 2024

(to be fixed in Python static rewriting and in JS dynamic rewriting for consistency)
