Create dataloader for PMC-Patients Task 1: Patient Note Recognition (PNR) #729
Hello, I noticed that this repo has `pmc_patients` for PMC-Patients Task 2: Patient-Patient Similarity (PPS), but no dataloader for PMC-Patients Task 1: Patient Note Recognition (PNR), so I created this pull request to add one. I wasn't sure whether it would be better to merge this addition into the existing dataloader (`pmc_patients`), so for now I have made it a separate dataloader.

Regarding the dataloader schema: since PMC-Patients PNR does not fit any of the schemas provided here, I followed @galtay's recommendation (via @SamuelCahyawijaya; thanks for relaying the info to me) to implement the source schema only and leave `_SUPPORTED_TASKS` empty.

Please let me know if there's anything I can help with.
Checkbox
- `biodatasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset naming).
- `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_BIGBIO_VERSION` variables.
- `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- `BUILDER_CONFIGS` class attribute is a list with at least one `BigBioConfig` for the source schema and one for a bigbio schema.
- `datasets.load_dataset` function.
- `python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py`.
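As a rough illustration of the `_generate_examples()` item in the checklist: in `datasets` loading scripts this method yields `(key, example)` pairs. The sketch below shows that shape as a standalone function using only the standard library. The record layout (`tokens` and `tags` lists of equal length) and the sample record are assumptions made up for this example; they are not taken from the PNR files or from the script in this PR.

```python
import json
from typing import Dict, Iterator, List, Tuple


def generate_examples(raw_json: str) -> Iterator[Tuple[int, Dict[str, List[str]]]]:
    """Yield (key, example) pairs from a JSON list of token/tag records."""
    records = json.loads(raw_json)
    for key, record in enumerate(records):
        tokens, tags = record["tokens"], record["tags"]
        # A token-level task needs exactly one tag per token.
        if len(tokens) != len(tags):
            raise ValueError(
                f"record {key}: {len(tokens)} tokens but {len(tags)} tags"
            )
        yield key, {"tokens": tokens, "tags": tags}


# Made-up sample record, for illustration only.
sample = json.dumps(
    [{"tokens": ["A", "58-year-old", "man"], "tags": ["B", "I", "I"]}]
)
examples = list(generate_examples(sample))
```

Validating the token and tag lengths inside the generator is a design choice of this sketch; it surfaces malformed records at load time rather than later during training.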