Create a huge collection of all tamil books metadata #228
Comments
Some of the current datasets can be found here: https://github.com/KaniyamFoundation/tamilbooks_metadata/tree/main/data |
I'm scraping text from https://www.projectmadurai.org/ Edit 1: Feel free to create GitHub issues regarding the code. |
I am scraping book details from https://www.panuval.com |
I'm scraping data from the Anna Centenary Library catalogue. GitHub link is here. |
I am scraping data from Panuval book store. |
Hello, Thanks. |
Hi, |
Got 15845 books from the Anna Centenary Library catalogue. |
Super. I visited the website and also learned from your code. It is neat and clean.
Thanks.
On Thu, Sep 12, 2024 at 7:15 AM amotbeli ***@***.***> wrote:
Got 15845 books from the Anna Centenary Library catalogue.
See here. <https://github.com/amotbeli/acl_data/blob/main/acl_data.json>
|
Thank you, rajkannan1978! |
We need a central place to store and display all the book metadata. I explored Omeka (https://omeka.org/) and Islandora (https://www.islandora.ca). Omeka lacks theme and appearance customization, custom fields are hard to set up, files cannot be stored on a remote server, translation is not easy, and it has fewer plugins. Islandora has more features: we can customise the display, and it has bilingual support. As it is a Drupal-based project, all of Drupal's capabilities come along, with tons of Drupal plugins. Here is the doc for lightweight Islandora installation. Thanks to @Natkeeran for the guidance on the setup. Hence, I am going with Islandora. I am watching https://www.youtube.com/watch?v=dfc7WUGAmow to learn the basics, and will explore how to add data. |
I’ve scraped 50,000+ book details from www.noolulagam.com |
Wonderful, thanks.
Please scrape the cover images as well.
|
I scraped 50,000+ books with their images. The images are stored in a separate folder, and the CSV file includes the paths to those images. Here’s the link to the images folder and the repository with the CSV: CSV: https://github.com/kamalaak/books_scraping/blob/main/with_books.csv Images: https://drive.google.com/file/d/19z8WCYSoNIxo1nOVzHJtEttFpKDHkeET/view?usp=drivesdk |
I scraped details of 20,000+ books, including images, from https://dialforbooks.in. However, some books are missing images. Here are the CSV and image links. https://github.com/kamalaak/books_scraping/blob/main/cleaned_file.csv |
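Since the CSV files above store paths to cover images and some books are reported as missing images, a small check like the following could list the affected rows. This is only a sketch: the column name `image_path` and the function name `find_missing_images` are assumptions for illustration, not taken from the linked repositories, so adjust them to the real CSV headers.

```python
import csv
from pathlib import Path


def find_missing_images(csv_path, images_root, column="image_path"):
    """Return (row_number, path) pairs whose image is empty or not on disk.

    `column` is a hypothetical header name; change it to match the real CSV.
    Row numbers assume a one-line header and no multi-line fields.
    """
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row_number, row in enumerate(csv.DictReader(f), start=2):
            rel_path = (row.get(column) or "").strip()
            # An empty cell or a path that is not a file both count as missing.
            if not rel_path or not (Path(images_root) / rel_path).is_file():
                missing.append((row_number, rel_path))
    return missing
```

Running it against the unzipped images folder would give a list of rows to re-scrape or to mark as having no cover.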
There are many websites with Tamil book details
etc.
List all the websites that have book details.
Scrape all the data from them and publish it.
Create a portal with all the books' metadata.
Once the data is collected, we can publish as a website with Omeka S https://omeka.org/s/ or Islandora https://www.islandora.ca/
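Because the same book will likely appear on several of these websites, the collected records probably need to be merged before publishing. Here is a minimal sketch of one way to do that, assuming each scraped record is a dict with a `title` key and an optional `author` key. These field names and the helper names are hypothetical, and matching on normalized title plus author is a simplification, not a method anyone in this thread has agreed on.

```python
import unicodedata


def normalize_text(value):
    """NFC-normalize, collapse whitespace, and casefold a string for matching."""
    text = unicodedata.normalize("NFC", value or "")
    return " ".join(text.split()).casefold()


def merge_sources(sources):
    """Merge records from several sources, one entry per (title, author).

    `sources` maps a source name to a list of dicts (hypothetical schema:
    "title" required, "author" optional). Each merged entry remembers which
    sources it was seen in.
    """
    merged = {}
    for source_name, records in sources.items():
        for record in records:
            key = (normalize_text(record.get("title")),
                   normalize_text(record.get("author")))
            if not key[0]:
                continue  # skip records without a usable title
            entry = merged.setdefault(key, {
                "title": record["title"].strip(),
                "author": (record.get("author") or "").strip(),
                "sources": [],
            })
            if source_name not in entry["sources"]:
                entry["sources"].append(source_name)
    return list(merged.values())
```

Unicode NFC normalization is used so that Tamil titles typed with different but equivalent code point sequences compare equal. Spelling variants and transliterations would still need a separate, fuzzier matching step.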