Create a huge collection of all tamil books metadata #228

Open
tshrinivasan opened this issue Aug 21, 2024 · 17 comments

Comments

@tshrinivasan
Member

tshrinivasan commented Aug 21, 2024

There are many websites with Tamil book details

etc

List all the websites that have book details.

Scrape all the data from them and publish it.

Create a portal with all the books' metadata.

Once the data is collected, we can publish as a website with Omeka S https://omeka.org/s/ or Islandora https://www.islandora.ca/
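Since each site will expose different fields, it may help to agree on a common record layout before scraping. The following is only a minimal sketch: the `BookRecord` class and every field name in it are assumptions for illustration, not a schema decided in this issue.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class BookRecord:
    # Hypothetical field names; the issue does not define a schema.
    title: str
    author: str = ""
    publisher: str = ""
    year: str = ""
    isbn: str = ""
    source_url: str = ""


def write_records(records, out):
    """Write BookRecord objects to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(BookRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


if __name__ == "__main__":
    buffer = io.StringIO()
    sample = BookRecord(
        title="பொன்னியின் செல்வன்",
        author="கல்கி",
        source_url="https://example.org/book/1",  # placeholder URL
    )
    write_records([sample], buffer)
    print(buffer.getvalue())
```

If every scraper writes this same set of columns, the per-site CSV files can later be combined without per-source column mapping.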

@Natkeeran
Collaborator

Some of the current datasets can be found here:

https://github.com/KaniyamFoundation/tamilbooks_metadata/tree/main/data

@HariharanUmapathi

HariharanUmapathi commented Aug 27, 2024

I'm scraping text from https://www.projectmadurai.org/
GitHub link coming soon.

Edit 1:
GitHub: https://github.com/HariharanUmapathi/programmerlife/tree/kaniyam-book-list-scraper/Python/book-list-scrapper

Feel free to create GitHub issues regarding the code.

@rajkannan1978

I am scraping book details from https://www.panuval.com
I will post a GitHub link soon.

@amotbeli

I'm scraping data from the Anna Centenary Library catalogue.

GitHub link is here.

@rajkannan1978

I am scraping data from the Panuval book store.
I have pushed my code to GitHub.
The GitHub link is https://github.com/rajkannan1978/web-scraping.git
Please let me know if you find any bugs.
I welcome any suggestions to improve the code.

@rajkannan1978

Hello,
Some of my Python projects are posted here.
Web Scraping https://github.com/rajkannan1978/web-scraping.git
Grocery https://github.com/rajkannan1978/grocery.git
Number Guess Game https://github.com/rajkannan1978/number_guess_game.git

Thanks.

@rajkannan1978

Hi,
Got 18708 books from panuval.com
Web Scraping https://github.com/rajkannan1978/web-scraping.git

@amotbeli

Got 15845 books from the Anna Centenary Library catalogue.

See here.

@rajkannan1978

rajkannan1978 commented Sep 12, 2024 via email

@amotbeli

Thank you, rajkannan1978!

@tshrinivasan
Member Author

We need a central place to store and display all the book metadata.

Explored Omeka https://omeka.org/ and Islandora https://www.islandora.ca
Installed both as https://omeka.kaniyam.cloudns.nz/ and https://islandora.kaniyam.cloudns.nz/

Omeka lacks features for customizing themes and appearance. Custom fields are hard to set up. It cannot store files on a remote server, it is not easy to translate, and it has fewer plugins.

Islandora has more features: we can customize the display, and it has bilingual support. As it is a Drupal-based project, all of Drupal's power comes along, with tons of Drupal plugins.

Here is the doc for lightweight islandora installation.
https://github.com/digitalutsc/islandora_lite_docs/wiki/7.-Installation

Thanks to @Natkeeran for the guidance on the setup.

Hence, going with Islandora. Watching this video to learn the basics: https://www.youtube.com/watch?v=dfc7WUGAmow

Will explore how to add data.
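Before loading anything into a central store, the per-site CSV files will probably need to be merged, since the same book can appear on several sites. Here is a rough sketch of one way to do that. It assumes each CSV has `title` and `author` columns, which is not guaranteed for the repositories linked in this thread, and the function names are made up for illustration.

```python
import csv
import unicodedata


def normalize(text):
    """NFC-normalize, case-fold and collapse whitespace so near-identical strings match."""
    text = unicodedata.normalize("NFC", text or "")
    return " ".join(text.casefold().split())


def merge_rows(sources):
    """Merge rows from several sources, keeping the first row seen per (title, author).

    `sources` maps a source name to an iterable of dict rows.
    The `title` and `author` column names are assumptions.
    """
    merged = {}
    for source_name, rows in sources.items():
        for row in rows:
            key = (normalize(row.get("title")), normalize(row.get("author")))
            if not key[0]:
                continue  # skip rows that have no title
            # setdefault keeps the earliest row and ignores later duplicates
            merged.setdefault(key, {**row, "source": source_name})
    return list(merged.values())


def read_csv(path):
    """Read a UTF-8 CSV file into a list of dict rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Usage would look like `merge_rows({"panuval": read_csv("panuval.csv"), "noolulagam": read_csv("noolulagam.csv")})`, where both file names are placeholders. Matching on title and author alone is crude; an ISBN, where available, would be a more reliable key.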

@kamalaak

kamalaak commented Nov 8, 2024

I’ve scraped 50,000+ book details from www.noolulagam.com
Here’s the repository: https://github.com/kamalaak/noolulagam_books_scraping.git

@tshrinivasan
Member Author

tshrinivasan commented Nov 8, 2024 via email

@kamalaak

I scraped 50,000+ books with their images. The images are stored in a separate folder, and the CSV file includes the paths to those images.

Here’s the link to the images folder and the repository with the CSV:

CSV: https://github.com/kamalaak/books_scraping/blob/main/with_books.csv

Images: https://drive.google.com/file/d/19z8WCYSoNIxo1nOVzHJtEttFpKDHkeET/view?usp=drivesdk

@kamalaak

kamalaak commented Nov 12, 2024

I scraped details of 20,000+ books, including images, from https://dialforbooks.in. However, some books are missing images. Here are the CSV and image links.

https://github.com/kamalaak/books_scraping/blob/main/cleaned_file.csv
https://drive.google.com/drive/folders/17wB8tsKXU6tYPmEebn5sXwXnsbco_PaO?usp=drive_link
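To see which rows are missing images, a small check along these lines might help. It is only a sketch: the `image_path` column name and the folder layout are assumptions, and the actual CSV may use different names.

```python
import csv
import tempfile
from pathlib import Path


def rows_missing_images(rows, image_dir, column="image_path"):
    """Return rows whose image column is empty or points to a file that does not exist.

    `column` is a hypothetical column name; adjust it to match the real CSV header.
    """
    missing = []
    for row in rows:
        value = (row.get(column) or "").strip()
        if not value or not (Path(image_dir) / value).is_file():
            missing.append(row)
    return missing


def load_rows(csv_path):
    """Read a UTF-8 CSV file into a list of dict rows."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Tiny self-contained demo using a temporary folder instead of real data.
    with tempfile.TemporaryDirectory() as folder:
        (Path(folder) / "a.jpg").write_bytes(b"")
        demo = [
            {"title": "Has image", "image_path": "a.jpg"},
            {"title": "Empty path", "image_path": ""},
            {"title": "Broken path", "image_path": "b.jpg"},
        ]
        for row in rows_missing_images(demo, folder):
            print(row["title"])
```

The resulting list could be written back out as a separate CSV so the missing covers can be fetched later.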

7 participants