Download attached PDF documents when opening page #108

Open
redtux opened this issue Jul 17, 2024 · 5 comments
Labels: bug, question

Comments

@redtux

redtux commented Jul 17, 2024

Hi, I stumbled over a dead link in the ZIM file.

The iFixit German archive has just been downloaded on a fresh install of Kiwix.

Now I wanted to verify that the page is really missing in the wiki, but "unfortunately" it's not. 🙂

Here's the link:
https://de.ifixit.com/Device/Lenovo_ThinkPad_T460p

Original link:
/Document/PDhLL6RFxYZE3Hre/t460p_hmm_en_sp40k04964_02.pdf

(The above was copied from the Kiwix error.)

Screenshot_2024-07-17-22-27-51-432_org.kiwix.kiwixmobile.jpg

@redtux
Author

redtux commented Jul 17, 2024

Okay, I should have checked before:

This issue occurs with many devices of various vendors. Does it make sense to make a list with working and broken pages?

@benoit74
Collaborator

This is not really a bug, it is a limitation ^^

As mentioned in the error message, the current scraper simply does not retrieve this kind of item into the ZIM. Supporting it would need additional effort.

It is however new to me that we now have "Documents" in iFixit guides; we will need to check that. Thank you for reporting, and there is no need to list the affected pages. Unless I missed something, I think that all "Documents" are just missing.

redtux changed the title from "[Bug] dead link de:Device/Lenovo_ThinkPad_T460p" to "Download attached PDF documents when opening page" on Jul 17, 2024
@redtux
Author

redtux commented Jul 17, 2024

Thank you for the quick reply, and for the clarification! I changed the title now, so this might be considered a feature request. 🙂

Would the respective wiki page be scraped if it contained no PDF attachment? There is also normal text missing; the PDF issue was just an additional piece of information.

If downloading the PDF in addition to the ZIM is out of scope for this project (as that seems to be a client task), maybe PDFs (or rather all kinds of attachments) could be treated as external links?

That way my system is responsible for downloading the PDF and opening it with my default PDF reader.
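A minimal sketch of that suggestion, for illustration only: it is not the scraper's actual code. The helper names `is_document_link` and `rewrite_link` are hypothetical, and it assumes that documents are always linked under a `/Document/` path (as in the link reported above) and that they are served from the same host as the page.

```python
from urllib.parse import urljoin, urlparse

# Assumption: document attachments are linked under this path prefix,
# as in the link reported in this issue.
DOCUMENT_PREFIX = "/Document/"


def is_document_link(href: str) -> bool:
    """Return True if href looks like a link to a document attachment."""
    return urlparse(href).path.startswith(DOCUMENT_PREFIX)


def rewrite_link(href: str, page_url: str) -> str:
    """Make document links absolute (external); leave other links unchanged."""
    if is_document_link(href):
        # Resolve the site-relative path against the online page URL.
        return urljoin(page_url, href)
    return href


page = "https://de.ifixit.com/Device/Lenovo_ThinkPad_T460p"
print(rewrite_link("/Document/PDhLL6RFxYZE3Hre/t460p_hmm_en_sp40k04964_02.pdf", page))
print(rewrite_link("/Device/Lenovo_ThinkPad_T460p", page))
```

With such a rewrite the reader application would hand the link to the system browser or PDF viewer, which only helps while the device is online.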

@benoit74
Collaborator

Do you have an example of normal text missing? This is not really expected.

The philosophy so far has been to focus on what is really important for an offline user (categories, guides, ...) and postpone to "later" what is less important: items (parts and tools), wikis, ...

I had a quick look, and documents seem to have become a very important part of iFixit now that some companies are providing them to iFixit. I think we should "urgently" add support for these. Most of our users are offline and won't be able to use the external link. I cannot provide an ETA however; hopefully in the coming months.

You mention other kinds of attachments; do you have an example? Is it still a document (i.e. in a Documents section, and served on a /Document URL)?

@redtux
Author

redtux commented Jul 18, 2024

Sorry for the confusion, I did not have any other file formats in mind. 🙂 Pictures seem to be fetched fine, and I saw no videos or any other attachments than PDFs.

Concerning the missing text, I was referring to my initial link:
https://de.ifixit.com/Device/Lenovo_ThinkPad_T460p

This page contains at least a summary, a TOC, and some categories. As already shown by the above screenshot, this information seems to be missing. I could test this with some more pages if needed. 👍

And yes, I perfectly understand that external links are not a real solution (not even a workaround) — but as a short-term "hack" (until this gets solved) it might be considered more intuitive for new users than the information currently displayed (which I obviously did not fully understand without your explanation in this issue 🫢). What do you think?

Thanks for the great work — and in case I could help with some more testing, just let me know pls.

kelson42 added the bug (Something isn't working) and question (Further information is requested) labels on Jul 18, 2024

3 participants