Create dataloader for PMC-Patients Task 1: Patient Note Recognition (PNR) #729

holylovenia · 2022-07-08T09:01:23Z

Hello, I noticed that this repo has pmc_patients for PMC-Patients Task 2: Patient-Patient Similarity (PPS), but no dataloader for PMC-Patients Task 1: Patient Note Recognition (PNR), so I created this pull request to add one. I am not sure whether it would be better to merge this into the existing dataloader (pmc_patients), so for now I have made it a separate dataloader.

Regarding the dataloader schema: since PMC-Patients PNR does not fit any of the schemas provided here, I followed @galtay's recommendation (via @SamuelCahyawijaya; thanks for relaying the info to me) to implement the source schema only and leave _SUPPORTED_TASKS empty.

Please let me know if there's anything I can help with.

Name: PMC-Patients PNR
Description: The PMC-Patients dataset consists of 4 tasks. One of the tasks is Patient Note Recognition (PNR). The PMC-Patients PNR dataset is modeled as a paragraph-level sequence labeling task, similar to named entity recognition (NER). For each article, the input is a sequence of paragraphs p1, p2, ..., pn, where n is the number of paragraphs, and the output is a sequence of BIO tags t1, t2, ..., tn.
Paper: PMC-Patients: A Large-scale Dataset of Patient Notes and Relations Extracted from Case Reports in PubMed Central
Data: Google Drive
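To make the paragraph-level BIO formulation in the description concrete, the sketch below converts a tag sequence into paragraph ranges, one per patient note. It assumes the labels are the plain strings "B", "I" and "O"; the actual label encoding in the dataset files may differ, and `bio_to_spans` is a hypothetical helper, not part of this PR.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) paragraph index ranges, end exclusive, one per note."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new note begins; close any note that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Assumption: an "I" with no preceding "B" opens a note on its own.
            if start is None:
                start = i
        elif tag == "O":
            # A paragraph outside any note closes the open note, if there is one.
            if start is not None:
                spans.append((start, i))
                start = None
        else:
            raise ValueError(f"unexpected tag {tag!r} at paragraph {i}")
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Five paragraphs: a two-paragraph note, then a one-paragraph note.
print(bio_to_spans(["O", "B", "I", "O", "B"]))  # [(1, 3), (4, 5)]
```

Each returned range can be used to slice the article's paragraph list and recover the text of a single patient note.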

Checkbox

Confirm that this PR is linked to the dataset issue.
Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
Implement _info(), _split_generators() and _generate_examples() in dataloader script.
Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
Confirm dataloader script works with datasets.load_dataset function.
Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.

phlobo · 2024-07-25T14:22:25Z

Hi @holylovenia! We are currently in the process of going through open issues and PRs and merging them whenever possible. Would it be doable for you to update your PR according to the latest contribution guide, i.e., making it compatible with the HF Hub?

If you don't have bandwidth for this, the other maintainers and I would try to amend your PR by making the required changes ourselves, ideally still giving due credit to your contribution.

Please let me know what you think!

holylovenia added 3 commits July 8, 2022 16:42

Create dataloader for PMC-Patients Task 1: Patient Note Recognition (…

b4ff71e

…PNR)

Format code

366487d

Remove support tasks

f352073

holylovenia requested review from hakunanatasha, jason-fries, sunnnymskang, ruisi-su, galtay, leonweber, sg-wbi and debajyotidatta as code owners July 8, 2022 09:01

phlobo self-assigned this Jul 25, 2024

phlobo removed their assignment Oct 21, 2024
