updated handle for OTA (20.500.12024 -> 20.500.14106) for all corpora
kreetrapper committed Mar 26, 2024
1 parent f1dc2da commit d11fe98
Showing 11 changed files with 36 additions and 36 deletions.
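The change amounts to a prefix substitution on the Oxford Text Archive (OTA) handle URLs. The following is a minimal Python sketch of how such a rewrite could be scripted. It is not necessarily how this commit was produced. It assumes the data files are the semicolon-delimited CSVs shown below and that every affected URL starts with exactly `http://hdl.handle.net/20.500.12024/`. The function names `migrate_ota_handles` and `migrate_directory` are hypothetical.

```python
from pathlib import Path

# Assumption: every OTA handle URL in the data uses exactly this prefix
# (http, not https), and only the prefix changes; the item suffix is kept.
OLD_PREFIX = "http://hdl.handle.net/20.500.12024/"
NEW_PREFIX = "http://hdl.handle.net/20.500.14106/"


def migrate_ota_handles(text: str) -> tuple[str, int]:
    """Return (rewritten text, number of handle URLs replaced).

    A plain substring replacement is used instead of a CSV round-trip, so
    quoting and field layout stay byte-for-byte identical and only the
    affected lines change. It also catches URLs inside "#SEP"-joined lists
    within a single cell.
    """
    count = text.count(OLD_PREFIX)
    return text.replace(OLD_PREFIX, NEW_PREFIX), count


def migrate_directory(root: Path) -> int:
    """Rewrite every *.csv file under `root` in place; return total replacements."""
    total = 0
    for path in sorted(root.rglob("*.csv")):
        # Work on raw bytes so the file's original line endings are preserved.
        original = path.read_bytes().decode("utf-8")
        updated, count = migrate_ota_handles(original)
        if count:
            path.write_bytes(updated.encode("utf-8"))
            total += count
    return total
```

For example, `migrate_directory(Path("data"))` would rewrite every CSV under a hypothetical `data/` folder and return the number of URLs changed. A simple text replacement is used here because reparsing and rewriting the files with a CSV writer could normalise the quoting and touch unrelated lines.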
@@ -7,7 +7,7 @@ UH's English E-thesis corpus;http://urn.fi/urn:nbn:fi:lb-2016102401;English;200
The Royal Society Corpus;http://hdl.handle.net/21.11119/0000-0001-7E8B-6;English (late and early modern);32 million tokens;PoS-tagged, lemmatised, normalised, author and document metadata;CC BY;;"This corpus contains journal articles published in <a href=""http://rstl.royalsocietypublishing.org/"">Philosophical Transactions of the Royal Society of London</a> between 1665 and 1869.#SEPThe corpus is available for online querying through CQPweb and for download from the CLARIN-D repository of the University of Saarland.";Academic corpora;Concordancer#SEPDownload;http://fedora.clarin-d.uni-saarland.de/rsc_v4/access.html#cqpweb#SEPhttp://fedora.clarin-d.uni-saarland.de/rsc_v4/access.html#download;Kermes et al. 2016;http://www.lrec-conf.org/proceedings/lrec2016/summaries/792.html;
Corpus of Estonian scientific texts;http://hdl.handle.net/11297/1-00-0000-0000-0000-0002-4;Estonian;5 million words;;CLARIN ACA-NC;;This corpus contains scientific articles and PhD theses. The corpus data are in the P5 format.;Academic corpora;Download;http://hdl.handle.net/11297/1-00-0000-0000-0000-0002-4;;;
UH's Finnish E-thesis corpus;http://urn.fi/urn:nbn:fi:lb-2016090601;Finnish;12.5 million tokens;PoS-tagged, lemmatised;CC BY;;This corpus contains MA and PhD theses published between 1999 and 2016.#SEPThe corpus is available for online querying through the concordancer Korp (FIN-CLARIN distribution).;Academic corpora;Concordancer;http://urn.fi/urn:nbn:fi:lb-2016101801;;;
-Chambers-Le Baron Corpus of Research Articles;http://hdl.handle.net/20.500.12024/2527;French;1 million words;No annotation;Oxford Text Archive licence (academic use);;This corpus contains research papers in the following disciplines:#SEP<ul><li>media/culture,</li><li>literature,</li><li>linguistics and language learning,</li><li>social anthropology,</li><li>law, economics,</li><li>sociology and social sciences,</li><li>philosophy,</li><li>history, and</li><li>communication.</li></ul>#SEPThe research papers were published between 1998 and 2006. This is a plain text corpus.#SEPThe corpus is available for download from the Oxford Text Archive.;Academic corpora;Download;http://hdl.handle.net/20.500.12024/2527;;;
+Chambers-Le Baron Corpus of Research Articles;http://hdl.handle.net/20.500.14106/2527;French;1 million words;No annotation;Oxford Text Archive licence (academic use);;This corpus contains research papers in the following disciplines:#SEP<ul><li>media/culture,</li><li>literature,</li><li>linguistics and language learning,</li><li>social anthropology,</li><li>law, economics,</li><li>sociology and social sciences,</li><li>philosophy,</li><li>history, and</li><li>communication.</li></ul>#SEPThe research papers were published between 1998 and 2006. This is a plain text corpus.#SEPThe corpus is available for download from the Oxford Text Archive.;Academic corpora;Download;http://hdl.handle.net/20.500.14106/2527;;;
UH's French E-thesis corpus;http://urn.fi/urn:nbn:fi:lb-2016102806;French;580,000 tokens;;CC BY;;This corpus contains MA and PhD theses published between 1999 and 2016.#SEPThe corpus is available for online querying through the concordancer Korp (FIN-CLARIN distribution).;Academic corpora;Concordancer;http://urn.fi/urn:nbn:fi:lb-2016102803;;;
UH's German E-thesis corpus;http://urn.fi/urn:nbn:fi:lb-2016102807;German;560,000 tokens;No annotation;CC BY;;This corpus contains MA and PhD theses published between 1999 and 2016.#SEPThe corpus is available for online querying through the concordancer Korp (FIN-CLARIN distribution).;Academic corpora;Concordancer;http://urn.fi/urn:nbn:fi:lb-2016102802;;;
Modern Greek Dialects: scientific papers;http://hdl.grnet.gr/11500/KEG-0000-0000-2502-4;Greek;113,000 words;;CC-BY-SA;;This corpus contains scientific texts in linguistics and dialectology. This is a plain text corpus.#SEPThe corpus is available for download from the CLARIN:EL repository.;Academic corpora;Download;http://hdl.grnet.gr/11500/KEG-0000-0000-2502-4;;;

@@ -1,11 +1,11 @@
Corpus;Corpus_URL;Language;Size;Annotation;Licence;Licence_URL;Description;Family;Buttons;Buttons_URL;Publication;Publication_URL;Note
"""PolDiLemma"" Middle Polish Diachrone Lemmatised Corpus";http://hdl.handle.net/11858/00-246C-0000-0023-8C44-B;Czech, German, Latin, Polish;7 million tokens;tokenised, lemmatised;CC BY-NC-SA 4.0;;This corpus contains political, religious and scientific texts from the 16th to the 18th century.#SEPThe corpus is available for download from the CLARIN-D repository.;Historical corpora;Download;http://hdl.handle.net/11858/00-246C-0000-0023-8C44-B;;;
Medieval Charter Sections Corpus;http://hdl.handle.net/11234/1-1952;Czech, Latin;57 chapters;manually-tagged, named entities;CC-BY-NC-SA 4.0;;This corpus contains Latin charters created in the era of John the Blind, King of Bohemia.#SEPThe corpus is available for download from LINDAT.;Historical corpora;Download;http://hdl.handle.net/11234/1-1952;Galuščáková and Neužilová (2018).;https://www.clarin.eu/resource-families/historical-corpora#Galu%C5%A1%C4%8D%C3%A1kov%C3%A1%20and%C2%A0Neu%C5%BEilov%C3%A1%202018;
-Anthology of Middle English texts / Santiago Gonzalez y Fernandez-Corugedo;http://hdl.handle.net/20.500.12024/1398;English (Middle), Hebrew;4000 words;no linguistic annotation;Oxford Text Archive licence;;This corpus contains literary texts from 1100 to 1400.#SEPThe corpus is available for download from the Oxford Text Archive.;Historical corpora;Download;http://hdl.handle.net/20.500.12024/1398;;;
-Dictionary of Old English Corpus in Electronic Form (DOEC);http://hdl.handle.net/20.500.12024/2488;English (Old), Latin;;no linguistic annotation;Oxford Text Archive licence;;This corpus contains 3037 texts from 600 to 1150.#SEPThe corpus is available for download from the Oxford Text Archive.;Historical corpora;Download;http://hdl.handle.net/20.500.12024/2488;;;
-The York-Toronto-Helsinki Parsed Corpus of Old English prose (YCOE);http://hdl.handle.net/20.500.12024/2462;English (Old), Latin;1.5 million words;syntactically-parsed;Oxford Text Archive licence;;This corpus contains fictional texts from 600 to 1150.#SEPThe corpus is available for download from the Oxford Text Archive.;Historical corpora;Download;http://hdl.handle.net/20.500.12024/2462;;;
+Anthology of Middle English texts / Santiago Gonzalez y Fernandez-Corugedo;http://hdl.handle.net/20.500.14106/1398;English (Middle), Hebrew;4000 words;no linguistic annotation;Oxford Text Archive licence;;This corpus contains literary texts from 1100 to 1400.#SEPThe corpus is available for download from the Oxford Text Archive.;Historical corpora;Download;http://hdl.handle.net/20.500.14106/1398;;;
+Dictionary of Old English Corpus in Electronic Form (DOEC);http://hdl.handle.net/20.500.14106/2488;English (Old), Latin;;no linguistic annotation;Oxford Text Archive licence;;This corpus contains 3037 texts from 600 to 1150.#SEPThe corpus is available for download from the Oxford Text Archive.;Historical corpora;Download;http://hdl.handle.net/20.500.14106/2488;;;
+The York-Toronto-Helsinki Parsed Corpus of Old English prose (YCOE);http://hdl.handle.net/20.500.14106/2462;English (Old), Latin;1.5 million words;syntactically-parsed;Oxford Text Archive licence;;This corpus contains fictional texts from 600 to 1150.#SEPThe corpus is available for download from the Oxford Text Archive.;Historical corpora;Download;http://hdl.handle.net/20.500.14106/2462;;;
Hamburg Corpus of Old Swedish with Syntactic Annotations (HaCOSSA);http://hdl.handle.net/11022/0000-0000-9D16-7;English, German, Latin, Old Norse, Swedish;128,000 words;MSD-tagged, syntactically parsed;CLARIN RES;;This corpus contains texts written in the Late Old Swedish period (from 1375 to 1550).#SEPThe corpus is available for download from the repository of the University of Hamburg.;Historical corpora;Download;http://hdl.handle.net/11022/0000-0000-9D16-7;;;
-The Electronic Text Corpus of Sumerian Literature. Revised edition;http://hdl.handle.net/20.500.12024/2518;English, Sumerian;5,151,373 words;Each word form in the composite transliterations has been assigned to a lexeme which is specified by a citation form, word class information and basic English translation.;CC-BY-NC-SA 3.0;;This corpus contains transliterations and English translations of 394 Sumerian compositions from approximately 2100 to 1700 BCE.#SEPThe corpus is available for download from the Oxford Text Archive.;Historical corpora;Download;http://hdl.handle.net/20.500.12024/2518;;;
+The Electronic Text Corpus of Sumerian Literature. Revised edition;http://hdl.handle.net/20.500.14106/2518;English, Sumerian;5,151,373 words;Each word form in the composite transliterations has been assigned to a lexeme which is specified by a citation form, word class information and basic English translation.;CC-BY-NC-SA 3.0;;This corpus contains transliterations and English translations of 394 Sumerian compositions from approximately 2100 to 1700 BCE.#SEPThe corpus is available for download from the Oxford Text Archive.;Historical corpora;Download;http://hdl.handle.net/20.500.14106/2518;;;
Finnish Folk Poetry;http://urn.fi/urn:nbn:fi:lb-2014052712;Finnish, Karelian, Ludian, Latin, Swedish, Olonets, Izhorian, Votic;7.1 million words;normalised (added diacritics);CC-BY-NC;;This corpus contains poems from 1564 to 1939.#SEPThe corpus is available through the concordancer Korp.;Historical corpora;Concordancer;http://urn.fi/urn:nbn:fi:lb-2014052711;;;
Corpus of Early Modern Finnish, Kielipankki Version;http://urn.fi/urn:nbn:fi:lb-20140730147;Finnish, Russian, German, Latin;8.6 million words;no linguistic annotation;EUPL v.1.1 SA;;This corpus contains texts from 1809 to 1899.#SEPThe corpus is available through the concordancer Korp.;Historical corpora;Concordancer;http://urn.fi/urn:nbn:fi:lb-2016081203;;;
Aleksis Kivi Corpus (SKS);http://urn.fi/urn:nbn:fi:lb-201405274;Finnish, Swedish;413,700 words;MSD-tagged, syntactically parsed;CC-BY-NC;;This corpus contains the works by Finnish author Aleksis Kivi from 1855 to 1871.#SEPThe corpus is available through the concordancer Korp.;Historical corpora;Concordancer;http://urn.fi/urn:nbn:fi:lb-201405273;;;
@@ -1,11 +1,11 @@
"Corpus";"Corpus_URL";"Language";"Size";"Annotation";"Licence";"Licence_URL";"Description";"Family";"Buttons";"Buttons_URL";"Publication";"Publication_URL";"Note"
"CzeSL – Czech as a Second Language";"http://hdl.handle.net/11234/1-162";"Czech";"0.9 million words";"tokenised, PoS-tagged, lemmatised, error labels";"CC-BY";;"This corpus contains essays written in 2013 by learners from 54 L1 backgrounds.#SEPThe corpus is available for download from LINDAT.";"L2 learner corpora";"Download";"http://hdl.handle.net/11234/1-162";"Rosen (2016).";"https://www.clarin.eu/resource-families/L2-corpora#Rosen%202016";
-"British Academic Written English Corpus";"http://hdl.handle.net/20.500.12024/2539";"English";"2761 texts";;"CC-BY";;"This is primarily an L1 corpus, although it also contains L2 texts.#SEPThe corpus is available for download from the University of Oxford Text Archive.";"L2 learner corpora";"Download";"http://hdl.handle.net/20.500.12024/2539";;;
+"British Academic Written English Corpus";"http://hdl.handle.net/20.500.14106/2539";"English";"2761 texts";;"CC-BY";;"This is primarily an L1 corpus, although it also contains L2 texts.#SEPThe corpus is available for download from the University of Oxford Text Archive.";"L2 learner corpora";"Download";"http://hdl.handle.net/20.500.14106/2539";;;
"CORYL (Corpus of Young Learner Language)";"http://hdl.handle.net/11495/D985-A8CA-A02D-4";"English";"191,568 tokens";"tokenised, anonymised, error labels, linked to CEFR levels";"CC-BY";;"This corpus contains English texts written by Norwegian primary school pupils (7th, 10th, and 11th grade).#SEPThe corpus is available through the Browse Corpuscle provided by CLARINO.";"L2 learner corpora";"Browse";"http://clarino.uib.no/korpuskel/landing-page?identifier=coryl&view=short";;;
"ETS Corpus of Non-Native Written English";"https://catalog.ldc.upenn.edu/LDC2014T06";"English";"12,100 essays (1100 / language)";;"restricted";;"This corpus contains texts written by learners from 11 L1 backgrounds as part of an international test of academic English proficiency. Prompts as well as proficiency level are part of the metadata.#SEPThe corpus is available for download from the LDC catalogue.";"L2 learner corpora";"Download";"http://catalog.ldc.upenn.edu/LDC2014T06";;;
"ICLE International Corpus of Learner English";"http://hdl.handle.net/11372/LRT-859";"English";"3 million words";;;;"This corpus contains texts written by learners of English from 14 L1 backgrounds.#SEPThe corpus can be <a href=""https://www.i6doc.com/en/collections/cdicle/"">purchased on CD-ROM</a> and a new version (ICLE v.3) is in development.";"L2 learner corpora";;;;;
"The Hanken Corpus of Academic Writing";"http://urn.fi/urn:nbn:fi:lb-2016081504";"English";"500,000 words";;"CC-BY";;"This corpus contains academic texts written by Finnish and Swedish native speakers.#SEPThe corpus is still under development.";"L2 learner corpora";;;;;
-"The Uppsala Student English corpus";"http://hdl.handle.net/20.500.12024/2457";"English";"1.2 million tokens";"tokenised";"CC-BY";;"This corpus contains essays written during the first three semesters of English studies at Uppsala University; most of the essays were written during the first semester. The corpus contains text files, each with a student ID and text ID including the course level, and information about the different prompts is available.#SEPThe corpus is available for download from the University of Oxford Text Archive.";"L2 learner corpora";"Download";"http://hdl.handle.net/20.500.12024/2457";;;
+"The Uppsala Student English corpus";"http://hdl.handle.net/20.500.14106/2457";"English";"1.2 million tokens";"tokenised";"CC-BY";;"This corpus contains essays written during the first three semesters of English studies at Uppsala University; most of the essays were written during the first semester. The corpus contains text files, each with a student ID and text ID including the course level, and information about the different prompts is available.#SEPThe corpus is available for download from the University of Oxford Text Archive.";"L2 learner corpora";"Download";"http://hdl.handle.net/20.500.14106/2457";;;
"International Corpus of Learner Finnish (ICLFI) Corpus";"http://urn.fi/urn:nbn:fi:lb-20140730163";"Finnish";"1 million words";"MSD-tagged";"CLARIN RES";;"This corpus contains fictional (e.g., letters, narratives) and non-fictional (e.g., essays) texts.#SEPThe corpus provides information on a large number of variables concerning the linguistic background of the learner, the learning task, the learning context, etc. It is available through the Browse Korp.";"L2 learner corpora";"Browse";"https://korp.csc.fi/#?cqp=[]&corpus=iclfi";"Jantunen (2011).";"https://www.clarin.eu/resource-families/L2-corpora#Jantunen%202011";
"Testipiste Corpus";"http://urn.fi/urn:nbn:fi:lb-2017020701";"Finnish";"840,000 tokens";"tokenised";"CLARIN RES";;"This corpus contains essays written by adult migrants from various L1 backgrounds.#SEPThe corpus will be made available through the Browse Korp.";"L2 learner corpora";;;;;
"The Advanced Finnish Learners’ Corpus";"http://urn.fi/urn:nbn:fi:lb-201407167";"Finnish";"288,000 tokens";"tokenised, MSD-tagged, lemmatised";"CLARIN RES";;"This corpus contains academic texts written by MA students and collected in 2009.#SEPThe corpus consists of two subcorpora: <a href=""http://urn.fi/urn:nbn:fi:lb-2016041401"">The Exam Essays Subcorpus</a> and the <a href=""http://urn.fi/urn:nbn:fi:lb-2016041402"">Course Papers Subcorpus</a>, both of which are also available through Korp.";"L2 learner corpora";"Browse#SEPDownload#SEPDownload";"https://korp.csc.fi/#?cqp=%5B%5D&corpus=las2_tentit,las2_esseet&prequery_within=sentence#SEPhttp://urn.fi/urn:nbn:fi:lb-2016041401#SEPhttp://urn.fi/urn:nbn:fi:lb-2016041402";;;