diff --git a/_config.yml b/_config.yml index 8f01685..7266033 100644 --- a/_config.yml +++ b/_config.yml @@ -173,7 +173,7 @@ authors: web: https://amor.cms.hu-berlin.de/~schwabmi/ orcid: 0000-0001-5569-6568 twitter: - description: Research fellow at Humboldt University of Berlin. + description: Research fellow at Humboldt-Universität zu Berlin. cmil: name: Carsten Milling display_name: Carsten @@ -222,7 +222,7 @@ authors: github: orcid: 0000-0002-0417-4054 twitter: - description: Research fellow at Humboldt University of Berlin. + description: Research fellow at Humboldt-Universität zu Berlin. paginate: 10 diff --git a/_drafts/2023-11-28-Key-Passages.markdown b/_drafts/2023-11-28-Key-Passages.markdown index 447b094..71c583f 100644 --- a/_drafts/2023-11-28-Key-Passages.markdown +++ b/_drafts/2023-11-28-Key-Passages.markdown @@ -1,22 +1,54 @@ --- title: "Working Title: Key Passages in Literary Works" layout: post -author: [robert, frederik] +author: [frederik, robert] comments: true date: 2023-11-28 --- # Context -In the DFG project [What matters? Key Passages in Literary Works](https://www.projekte.hu-berlin.de/en/schluesselstellen/what-matters-key-passages-in-literary-works) which started with its first phase in 2020 and has been in the second phase, titled [Is Expert Knowledge Key? Scholarly Interpretations as Resource for the Analysis of Literary Texts in Computational Literary Studies](https://www.projekte.hu-berlin.de/en/schluesselstellen/index.html), since August 2023, we (who is we?) set out to identify and characterize key passages in literary works. We understand key passages as passages that are particularly important to expert readers when interpreting texts. In a mixed-methods approach, we primarily want to investigate empirically which textual characteristics of literary genres can be revealed through patterns of citation and quotation. +In the +[DFG-funded](https://gepris.dfg.de/gepris/projekt/424207720?language=en) +project [What matters?
Key Passages in Literary +Works](https://www.projekte.hu-berlin.de/en/schluesselstellen/what-matters-key-passages-in-literary-works) +(which is part of the Priority Programme [Computational +Literary Studies](https://dfg-spp-cls.github.io/)), we set out to +identify and characterize key passages in literary works. + +We understand key passages as passages that are particularly important +to expert readers when interpreting texts. In a mixed-methods +approach, we investigate empirically which textual characteristics of +literary genres can be revealed through patterns of citation and +quotation. # Corpus -Our main corpus in the first project phase consisted of two literary works _Die Judenbuche_ by Annette von Droste-Hülshoff and _Michael Kohlhaas_ by Heinrich von Kleist with 44 and 49 scholarly articles, respectively. Fortunately, we could build on the previous work of the [ArguLIT project](https://gepris.dfg.de/gepris/projekt/372804438?language=en) and their annotations of all direct quotations. +Our main corpus consists of two literary works, [Die +Judenbuche](https://en.wikipedia.org/wiki/Die_Judenbuche) by Annette +von Droste-Hülshoff and [Michael +Kohlhaas](https://en.wikipedia.org/wiki/Michael_Kohlhaas) by Heinrich +von Kleist, along with 44 and 49 scholarly articles about them, +respectively. Fortunately, we could build on the previous work of the +[ArguLIT +project](https://gepris.dfg.de/gepris/projekt/372804438?language=en) +and their annotation of all direct quotations. # Automatic Identification of Quotations -Scholarly texts contain a number of different types of quotations. For example, verbatim quotes from short lengths of single words to longer quotations spanning multiple sentences, and indirect quotations in the form of summarizations or re-narrations. In the first phase of the project, we focused on the automatic identification of linking of direct quotations starting with quotations of a length of five or more words.
In [Lotte and Annette: A Framework for Finding and Exploring Key Passages in Literary Works](https://aclanthology.org/2021.nlp4dh-1.7.pdf)[^1], we outline the current landscape for text reuse detection and the development of our tool [Quid](https://hu.berlin/quid). Although there are a number of existing tools, we found that all had limitations for our specific use case. We evaluated Quid and compared it to the existing tools. +Scholarly texts contain different types of quotations. They range from +verbatim quotes of single words to longer quotations spanning multiple +sentences, and include indirect quotations in the form of summaries or +re-narrations. In the first phase of the project, we focused on the +automatic identification and linking of direct quotations, starting +with quotations of five or more words. In [Lotte and +Annette: A Framework for Finding and Exploring Key Passages in +Literary Works](https://aclanthology.org/2021.nlp4dh-1.7.pdf)[^1], we +outline the current landscape of text reuse detection and the +development of our tool [Quid](https://hu.berlin/quid). Although there +are a number of existing tools, we found that all had limitations for +our specific use case. We evaluated Quid and compared it to the +existing tools. [^1]: Lotte and Annette have since been renamed to Quid and QuidEx, respectively. @@ -85,13 +117,28 @@ Scholarly texts contain a number of different types of quotations. For example, -Considerably more difficult to identify are quotations which are shorter than 5 words. In _A Novel Approach for Identification and Linking of Short Quotations in Scholarly Texts and Literary Works_[^2], we develop and compare two approaches to tackle this challenge, _ProQuo_ and _ProQuoLM_. +**briefly explain this table** + +Considerably more difficult to identify are quotations that are +shorter than five words.
In _A Novel Approach for Identification and +Linking of Short Quotations in Scholarly Texts and Literary +Works_[^2], we develop and compare two approaches to tackle this +challenge, _ProQuo_ and _ProQuoLM_. [^2]: Accepted at JCLS 2023 and soon to be published. -For ProQuo, we use the (page) references for long quotations as examples to tell apart (page) references for short quotations from other text in parenthesis. This includes references like those to the Bible or other literary works. We then relate short quotes to their source in the literary work by figuring out the relationships between the quotes and references. We also use the positions of long quotes as guides to link short quotations to the correct passage of the literary work. +For ProQuo, we use the (page) references for long quotations as +examples to tell apart (page) references for short quotations from +other text in parentheses, for example references to the +Bible or to other literary works. We then relate short quotes to their +source in the literary work by determining the relationships between +the quotes and references. We also use the positions of long quotes as +guides to link short quotations to the correct passage of the literary +work. -For our second approach, ProQuoLM, we fine-tune a German BERT for classification. First, we identify potential short quotes, and then use the fine-tuned model to filter them. +For our second approach, ProQuoLM, we fine-tune a German BERT +**add link** for classification. First, we identify potential short +quotes and then use the fine-tuned model to filter them.