Releases: Coleridge-Initiative/rclc
release v1.0.8
- 1491 publications
- resolved more of the metadata quality issues within the graph
- entities in the graph now include datasets, providers, journals, and publications
- entities have URL links to persistent identifiers, where possible; see https://github.com/Coleridge-Initiative/rclc/wiki/Corpus-Description#entity-definitions
- improved downloads for open access PDFs
- using Ray to parallelize downloads, etc.
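
The notes above mention Ray for parallelizing downloads but do not show how it is used. As a rough illustration of the same fan-out pattern, the sketch below swaps in Python's standard-library thread pool rather than Ray. The names `fetch_bytes` and `download_all` and the `max_workers` default are hypothetical and are not taken from the rclc code base.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_bytes,
    max_workers: int = 8,
) -> Dict[str, Optional[bytes]]:
    """Fetch URLs concurrently; map each URL to its bytes, or None on failure."""

    def safe_fetch(url: str) -> Optional[bytes]:
        # One failed download should not abort the whole batch.
        try:
            return fetch(url)
        except Exception:
            return None

    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(url_list, pool.map(safe_fetch, url_list)))
```

Passing the fetch function as a parameter keeps the batching logic testable without network access. With Ray, the per-URL function would instead be submitted as a remote task, but the overall shape (submit many, collect results, tolerate individual failures) stays the same.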
v0.1.7 prelim
- adds a new journal entity into the graph
v0.1.6 prelim
- preliminary release, testing the metadata from richcontext.scholapi federated discovery APIs; see https://pypi.org/project/richcontext-scholapi/
- ~1700 publications, ~500 datasets
- we've filtered out many publications due to data quality issues upstream which are being fixed; will include those for a full release of the corpus
fixes and additions
The corpus now has 480+ publications with open access PDFs.
Changes include:
- bug fixes for 3 incorrect open access URLs
This release is intended for evaluation, prior to launching the Rich Context leaderboard competition.
minor update, v0.1.4
The corpus now has 350+ publications with open access PDFs.
This release is intended for evaluation, prior to launching the Rich Context leaderboard competition.
minor update, v0.1.3
The corpus now has 290+ publications with open access PDFs.
Changes include:
- URN no longer has the doi prefix
This release is intended for evaluation, prior to launching the Rich Context leaderboard competition.
minor fixes
The corpus now has 250 publications with open access PDFs.
Changes include:
- fixed bug that caused some dataset references to be duplicated
- added more publications (mostly related to NHANES)
This release is intended for evaluation, prior to launching the Rich Context leaderboard competition.
minor update, v0.1.1
Changes include:
- replacing all SSRN links with other open access links
- added datasets: NHANES, ATLAS, JHU PRC, etc.
This release is intended for evaluation, prior to launching the Rich Context leaderboard competition.
initial corpus, post-RCC
This release of the RCC corpus is the first release after the initial competition.
Changes include:
- now available in both TTL and JSON-LD format
- some data quality issues have been fixed / are being fixed
- a broader range of publication cases
- all publications have URLs for open access PDFs
This release is intended for evaluation, prior to launching the Rich Context leaderboard competition.
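
For readers who want to inspect the JSON-LD form of the corpus, the following is a minimal sketch of grouping graph nodes by type using only the standard library. It assumes the file exposes a top-level `@graph` array whose nodes carry the standard JSON-LD `@id` and `@type` keywords. The sample document, its type names, and its identifiers are invented for illustration and do not reflect the actual corpus vocabulary.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical sample; the real corpus file and its vocabulary may differ.
SAMPLE = """
{
  "@context": {"dct": "http://purl.org/dc/terms/"},
  "@graph": [
    {"@id": "urn:example:publication-1", "@type": "ResearchPublication",
     "dct:title": "A hypothetical paper"},
    {"@id": "urn:example:publication-2", "@type": "ResearchPublication",
     "dct:title": "Another hypothetical paper"},
    {"@id": "urn:example:dataset-1", "@type": "Dataset",
     "dct:title": "A hypothetical dataset"}
  ]
}
"""


def group_ids_by_type(doc: dict) -> Dict[str, List[str]]:
    """Map each @type value to the list of node @id values that carry it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in doc.get("@graph", []):
        types = node.get("@type", [])
        # JSON-LD allows @type to be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        for type_name in types:
            groups[type_name].append(node.get("@id"))
    return dict(groups)


groups = group_ids_by_type(json.loads(SAMPLE))
counts = {type_name: len(ids) for type_name, ids in groups.items()}
```

For the TTL serialization, or for anything beyond simple counting, an RDF library is a better fit than plain JSON parsing, since it resolves the `@context` and handles both formats uniformly.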