[EN] Armenian Folklore from USC Digital Folklore Archives #27

Open
vvbabayan opened this issue Jun 20, 2023 · 2 comments
Labels
extraction (Tasks that require data extraction (scraping) skills), simple, topic-culture (Tasks dedicated to Armenian culture, language, and history)

Comments

@vvbabayan
Collaborator

Goal

The goal is to create a dataset of all Armenian-related entries in the USC Digital Folklore Archives.

Tasks

Collect entries from the http://folklore.usc.edu website together with their author, date, and tags, preferably with the categories indicated in some way (for example, as subheadings). Save the collected data in a machine-readable format such as JSON or CSV. Upload the files to any temporary public storage and provide a link so they can be transferred to permanent storage.
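A minimal Python sketch of the output step, assuming each scraped entry is held in a hypothetical `FolkloreEntry` record whose field names (`title`, `url`, `author`, `date`, `tags`, `categories`) are our own choice rather than anything the archive defines. It only covers writing JSON and CSV with the standard library; fetching and parsing the pages is not shown.

```python
import csv
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class FolkloreEntry:
    # Hypothetical record layout; field names are assumptions, not the site's schema.
    title: str
    url: str
    author: str = ""
    date: str = ""  # kept as the date string shown on the page
    tags: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)


def save_json(entries: list[FolkloreEntry], path: Path) -> None:
    """Write all entries to a UTF-8 JSON array."""
    with path.open("w", encoding="utf-8") as f:
        json.dump([asdict(e) for e in entries], f, ensure_ascii=False, indent=2)


def save_csv(entries: list[FolkloreEntry], path: Path) -> None:
    """Write entries to CSV; list fields are joined with ';'."""
    fieldnames = ["title", "url", "author", "date", "tags", "categories"]
    with path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for e in entries:
            row = asdict(e)
            row["tags"] = ";".join(e.tags)
            row["categories"] = ";".join(e.categories)
            writer.writerow(row)
```

CSV has no native list type, so this sketch joins tags and categories with `;`, while the JSON output keeps them as arrays.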

Context

USC Digital Folklore Archives is a database of folklore performances. Armenian-related topics can be found at http://folklore.usc.edu/search_gcse/?q=armenian.

Requirements

  • Create a public GitHub repository to store the code and data under a free and open license, such as a Creative Commons license or the MIT License.

Wishes

Please write reusable code that someone else can run later, since we may need to update this dataset in the future.
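A hedged sketch of one way to keep the script rerunnable: expose the search URL, output path, and output format as command-line options via `argparse`. The option names and the default output filename are hypothetical; only the search URL comes from this issue, and the actual scraping is left as a placeholder.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence

# The search URL is the one given in this issue; everything else is an assumption.
DEFAULT_SEARCH_URL = "http://folklore.usc.edu/search_gcse/?q=armenian"


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse options so a later run can change inputs without editing code."""
    parser = argparse.ArgumentParser(
        description="Collect Armenian-related entries from the USC Digital Folklore Archives."
    )
    parser.add_argument("--search-url", default=DEFAULT_SEARCH_URL,
                        help="search page to start from")
    parser.add_argument("--output", type=Path, default=Path("armenian_folklore.json"),
                        help="file to write the dataset to")
    parser.add_argument("--format", choices=["json", "csv"], default="json",
                        help="output format")
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = parse_args(argv)
    # Placeholder: fetching, parsing, and saving would go here.
    print(f"Would fetch {args.search_url} and write {args.format} to {args.output}")
```

In a real script, `main()` would be called from an `if __name__ == "__main__":` block so the same file works both as a command and as an importable module.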

Resources

Prepared by

This task was prepared by the Open Data Armenia team.

@vvbabayan added the extraction, topic-culture, and simple labels Jun 20, 2023
@MunGell
MunGell commented Jul 4, 2023

A word of warning: not all articles on this page are related to Armenia (example).

The website relies on Google search for its content, so the results sometimes include unrelated articles because of how those pages were rendered to Googlebot.
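When validating the results, one option is to post-filter search hits on their own scraped content. The sketch below assumes each hit is stored as a dict with hypothetical `title`, `tags`, and `text` keys and keeps it only if one of them mentions "Armenia" (which also matches "Armenian", "Armenians", and so on). This is a simple keyword heuristic, so it may also drop relevant entries that never use the word; those still need manual review.

```python
import re

# Matches "Armenia", "Armenian", "Armenians", etc., case-insensitively.
ARMENIAN_PATTERN = re.compile(r"\barmenia", re.IGNORECASE)


def is_armenian_related(entry: dict) -> bool:
    """Return True if the title, tags, or text of an entry mention Armenia.

    The keys 'title', 'tags', and 'text' are assumptions about how a
    scraped entry is stored, not the archive's own field names.
    """
    haystack = [entry.get("title", ""), entry.get("text", "")]
    haystack.extend(entry.get("tags", []))
    return any(ARMENIAN_PATTERN.search(s) for s in haystack)


def filter_entries(entries: list[dict]) -> list[dict]:
    """Drop search hits that never mention Armenia in their scraped content."""
    return [e for e in entries if is_armenian_related(e)]
```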

@vvbabayan
Collaborator Author

@MunGell Thanks for chiming in! We do validate the generated data, but issues like this are always worth keeping in mind.
