Search Inside - Content within PDFs Library on Archive.org #222

IngersolNorway opened this issue Jul 22, 2024 · 1 comment

@IngersolNorway

Project Request Plan: Searching Titles and Content within PDFs Library on Archive.org

I am writing to request assistance with a project aimed at searching for titles and content within a set of PDF files hosted on Archive.org using the "Search Inside" feature. Below, I have outlined the key objectives, steps, and requirements for this project:

Key Features

  • Library Creation: Compile a library from provided PDF URLs.
  • Simple Title Search: Implement a straightforward title search within the compiled library.
  • Content Search: Utilize the "Search Inside" feature to search for specific keywords within each PDF.
  • Result Documentation: Record and organize the search results, including page numbers and relevant context, in a structured format for easy reference. For example, if a user searches for the word "Tamil," the results should show the number of occurrences of "Tamil" in each PDF along with the page numbers. Opening a result should also highlight the matching words in the PDF, as Archive.org's "Search Inside" feature does.

Search Methods

  1. Simple Title Search: Conduct a basic search for titles within the PDF library.
  2. Content Search ("Search Inside"): Perform an in-depth search for specific keywords within the text of each PDF using the "Search Inside" feature.
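Assuming the text of each PDF is already available locally, the two search methods could be sketched in Python roughly as follows. This is a minimal illustration, not the project's implementation: `Book`, `title_search`, and `content_search` are hypothetical names, and the sketch counts occurrences per book but does not produce page numbers.

```python
from dataclasses import dataclass


@dataclass
class Book:
    identifier: str  # hypothetical: archive.org item identifier
    title: str
    text: str  # OCR text of the whole PDF, assumed already downloaded


def title_search(library, query):
    """Simple title search: case-insensitive substring match on titles."""
    q = query.casefold()
    return [book for book in library if q in book.title.casefold()]


def content_search(library, keyword):
    """Content search: count case-insensitive occurrences of keyword per book.

    Returns (title, count) pairs for books with at least one hit,
    most hits first. Page numbers are not available from plain text.
    """
    q = keyword.casefold()
    results = []
    for book in library:
        count = book.text.casefold().count(q)
        if count:
            results.append((book.title, count))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

A plain substring count is used instead of a regex with `\b` because Python's `re` module does not treat Tamil vowel signs and viramas as word characters, so word boundaries can fall inside Tamil words. The trade-off is that a search for "Tamil" also counts "Tamils".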

I kindly request your expertise in assisting with this project to ensure that the search process is thorough and the results are well-documented. Your prompt attention to this request would be greatly appreciated.

Please let me know if you need any additional information or if there are any specific details you would like to discuss further.

@tshrinivasan
Member

We cannot search inside archive.org's web book viewer from a third-party application.

All we can do is:

  1. Download all the text files from archive.org (we need the list of book URLs).
  2. Store them locally.
  3. Search within those files.
  4. Return the book name and the relevant archive.org URL.

We cannot get the page number within the PDF from a search over the local text files.
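The download, store, and search steps could be sketched in Python roughly as follows. This sketch rests on stated assumptions. It uses archive.org's public metadata endpoint (`https://archive.org/metadata/<identifier>`) to list an item's files. It assumes the OCR text lives in files whose names end in `_djvu.txt`, which not every item has. It stores one text file per identifier. `identifier_from_url`, `download_texts`, and `search_local` are hypothetical helper names. In line with the limitation above, each result is the book identifier, its archive.org URL, and an occurrence count, with no PDF page numbers.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

ARCHIVE = "https://archive.org"


def identifier_from_url(url):
    """Extract the item identifier from https://archive.org/details/<identifier>[/...]."""
    parts = urllib.parse.urlparse(url).path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "details":
        return parts[1]
    raise ValueError(f"not an archive.org details URL: {url}")


def download_texts(book_urls, out_dir):
    """Steps 1-2: fetch each item's *_djvu.txt files and store them as <identifier>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in book_urls:
        ident = identifier_from_url(url)
        # Assumption: the metadata JSON has a "files" list of {"name": ...} entries.
        with urllib.request.urlopen(f"{ARCHIVE}/metadata/{ident}") as resp:
            meta = json.load(resp)
        names = [f["name"] for f in meta.get("files", [])
                 if f.get("name", "").endswith("_djvu.txt")]
        chunks = []
        for name in names:
            file_url = f"{ARCHIVE}/download/{ident}/{urllib.parse.quote(name)}"
            with urllib.request.urlopen(file_url) as resp:
                chunks.append(resp.read().decode("utf-8", errors="replace"))
        (out / f"{ident}.txt").write_text("\n".join(chunks), encoding="utf-8")


def search_local(out_dir, keyword):
    """Steps 3-4: return (identifier, archive.org URL, count) for each file with a hit."""
    q = keyword.casefold()
    hits = []
    for path in sorted(Path(out_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        count = text.casefold().count(q)
        if count:
            ident = path.stem  # file name minus ".txt" is the identifier
            hits.append((ident, f"{ARCHIVE}/details/{ident}", count))
    return hits
```

Typical use would be `download_texts(list_of_book_urls, "texts")` followed by `search_local("texts", "Tamil")`. Here the identifier stands in for the book name. The human-readable title could instead be read from the `metadata.title` field of the same metadata response.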
