BERT has recently emerged as a very effective language representation model that is conceptually simple and empirically powerful. In this paper, we gather a novel dataset from Greek biomedical websites and produce word embeddings with a fine-tuned BERT model for keyword extraction in the specific domain of Greek biomedical texts. Experiments and evaluation comparing our approach with existing unsupervised keyword extraction methods show that BERT can learn from Greek biomedical texts. Code is publicly available at: https://github.com/CoGian/keyword-extraction-with-greekBERT and our fine-tuned model is available at: https://drive.google.com/drive/folders/1xjzB9e7e-sZT7Qy3BnRyACqRgo7YXR8g?usp=sharing.
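As a rough sketch of how BERT embeddings can drive keyword extraction, one can embed a document and its candidate phrases and rank the candidates by cosine similarity to the document. This is an illustrative example, not necessarily the exact pipeline in keyword_extraction.ipynb; the model name (the public Greek BERT) and the helper functions below are assumptions, and the fine-tuned checkpoint from the Drive link could be loaded the same way by pointing MODEL_NAME at its local directory.

```python
# Illustrative sketch: rank candidate keywords by cosine similarity between
# their BERT embeddings and the document embedding.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "nlpaueb/bert-base-greek-uncased-v1"  # assumption: base Greek BERT
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(text):
    """Mean-pool the last hidden states into a single vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

def rank_keywords(document, candidates):
    """Return candidate phrases sorted by similarity to the document."""
    doc_vec = embed(document)
    scored = [(p, torch.cosine_similarity(doc_vec, embed(p), dim=0).item())
              for p in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```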
To scrape the biomedical sites (a minimal spider sketch follows the commands below):
cd bioscrape
scrapy crawl name_of_the_spider -o name.json
e.g., for mednet:
scrapy crawl mednet -o mednet.json
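For reference, a spider inside the bioscrape project might look roughly like the following. The class name, start URL, and CSS selectors here are hypothetical and may differ from the actual spiders in the repository.

```python
# Hypothetical sketch of a bioscrape spider; selectors and URLs are assumptions.
import scrapy

class MednetSpider(scrapy.Spider):
    name = "mednet"  # used as: scrapy crawl mednet -o mednet.json
    start_urls = ["http://www.mednet.gr/"]  # assumption: example start URL

    def parse(self, response):
        # Follow links found on the listing page and parse each article.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse_article)

    def parse_article(self, response):
        # Emit one JSON item per article page.
        yield {
            "title": response.css("h1::text").get(),
            "text": " ".join(response.css("p::text").getall()),
        }
```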
Useful info can be found in keyword_extraction.ipynb