From 55942ef11d5274a669d39e790c8142fdf17f1ee7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Robert=20J=C3=A4schke?= Date: Thu, 14 Dec 2023 08:28:34 +0100 Subject: [PATCH] worked on the text, added suggestions --- _config.yml | 4 +- _drafts/2023-11-28-Key-Passages.markdown | 103 ++++++++++++++++++++--- 2 files changed, 93 insertions(+), 14 deletions(-) diff --git a/_config.yml b/_config.yml index 8f01685..7266033 100644 --- a/_config.yml +++ b/_config.yml @@ -173,7 +173,7 @@ authors: web: https://amor.cms.hu-berlin.de/~schwabmi/ orcid: 0000-0001-5569-6568 twitter: - description: Research fellow at Humboldt University of Berlin. + description: Research fellow at Humboldt-Universität zu Berlin. cmil: name: Carsten Milling display_name: Carsten @@ -222,7 +222,7 @@ authors: github: orcid: 0000-0002-0417-4054 twitter: - description: Research fellow at Humboldt University of Berlin. + description: Research fellow at Humboldt-Universität zu Berlin. paginate: 10 diff --git a/_drafts/2023-11-28-Key-Passages.markdown b/_drafts/2023-11-28-Key-Passages.markdown index 447b094..71c583f 100644 --- a/_drafts/2023-11-28-Key-Passages.markdown +++ b/_drafts/2023-11-28-Key-Passages.markdown @@ -1,22 +1,54 @@ --- title: "Working Title: Key Passages in Literary Works" layout: post -author: [robert, frederik] +author: [frederik, robert] comments: true date: 2023-11-28 --- # Context -In the DFG project [What matters? Key Passages in Literary Works](https://www.projekte.hu-berlin.de/en/schluesselstellen/what-matters-key-passages-in-literary-works) which started with its first phase in 2020 and has been in the second phase, titled [Is Expert Knowledge Key? Scholarly Interpretations as Resource for the Analysis of Literary Texts in Computational Literary Studies](https://www.projekte.hu-berlin.de/en/schluesselstellen/index.html), since August 2023, we (who is we?) set out to identify and characterize key passages in literary works.
We understand key passages as passages that are particularly important to expert readers when interpreting texts. In a mixed-methods approach, we primarily want to investigate empirically which textual characteristics of literary genres can be revealed through patterns of citation and quotation. +In the +[DFG-funded](https://gepris.dfg.de/gepris/projekt/424207720?language=en) +project [What matters? Key Passages in Literary +Works](https://www.projekte.hu-berlin.de/en/schluesselstellen/what-matters-key-passages-in-literary-works) +(which is part of the special priority programme [Computational +Literary Studies](https://dfg-spp-cls.github.io/)) we set out to +identify and characterize key passages in literary works. + +We understand key passages as passages that are particularly important +to expert readers when interpreting texts. In a mixed-methods +approach, we investigate empirically which textual characteristics of +literary genres can be revealed through patterns of citation and +quotation. # Corpus -Our main corpus in the first project phase consisted of two literary works _Die Judenbuche_ by Annette von Droste-Hülshoff and _Michael Kohlhaas_ by Heinrich von Kleist with 44 and 49 scholarly articles, respectively. Fortunately, we could build on the previous work of the [ArguLIT project](https://gepris.dfg.de/gepris/projekt/372804438?language=en) and their annotations of all direct quotations. +Our main corpus consists of two literary works [Die +Judenbuche](https://en.wikipedia.org/wiki/Die_Judenbuche) by Annette +von Droste-Hülshoff and [Michael +Kohlhaas](https://en.wikipedia.org/wiki/Michael_Kohlhaas) by Heinrich +von Kleist with 44 and 49 scholarly articles, +respectively. Fortunately, we could build on the previous work of the +[ArguLIT +project](https://gepris.dfg.de/gepris/projekt/372804438?language=en) +and their annotation of all direct quotations. 
# Automatic Identification of Quotations -Scholarly texts contain a number of different types of quotations. For example, verbatim quotes from short lengths of single words to longer quotations spanning multiple sentences, and indirect quotations in the form of summarizations or re-narrations. In the first phase of the project, we focused on the automatic identification of linking of direct quotations starting with quotations of a length of five or more words. In [Lotte and Annette: A Framework for Finding and Exploring Key Passages in Literary Works](https://aclanthology.org/2021.nlp4dh-1.7.pdf)[^1], we outline the current landscape for text reuse detection and the development of our tool [Quid](https://hu.berlin/quid). Although there are a number of existing tools, we found that all had limitations for our specific use case. We evaluated Quid and compared it to the existing tools. +Scholarly texts contain different types of quotations. For example, +verbatim quotes of single words to longer quotations spanning multiple +sentences, and indirect quotations in the form of summarizations or +re-narrations. In the first phase of the project, we focused on the +automatic identification and linking of direct quotations starting +with quotations of a length of five or more words. In [Lotte and +Annette: A Framework for Finding and Exploring Key Passages in +Literary Works](https://aclanthology.org/2021.nlp4dh-1.7.pdf)[^1], we +outline the current landscape for text reuse detection and the +development of our tool [Quid](https://hu.berlin/quid). Although there +are a number of existing tools, we found that all had limitations for +our specific use case. We evaluated Quid and compared it to the +existing tools. [^1]: Lotte and Annette have since been renamed to Quid and QuidEx, respectively. @@ -85,13 +117,28 @@ Scholarly texts contain a number of different types of quotations. 
For example, -Considerably more difficult to identify are quotations which are shorter than 5 words. In _A Novel Approach for Identification and Linking of Short Quotations in Scholarly Texts and Literary Works_[^2], we develop and compare two approaches to tackle this challenge, _ProQuo_ and _ProQuoLM_. +**briefly explain this table** + +Considerably more difficult to identify are quotations which are +shorter than 5 words. In _A Novel Approach for Identification and +Linking of Short Quotations in Scholarly Texts and Literary +Works_[^2], we develop and compare two approaches to tackle this +challenge, _ProQuo_ and _ProQuoLM_. [^2]: Accepted at JCLS 2023 and soon to be published. -For ProQuo, we use the (page) references for long quotations as examples to tell apart (page) references for short quotations from other text in parenthesis. This includes references like those to the Bible or other literary works. We then relate short quotes to their source in the literary work by figuring out the relationships between the quotes and references. We also use the positions of long quotes as guides to link short quotations to the correct passage of the literary work. +For ProQuo, we use the (page) references for long quotations as +examples to tell apart (page) references for short quotations from +other text in parentheses. This includes references like those to the +Bible or other literary works. We then relate short quotes to their +source in the literary work by figuring out the relationships between +the quotes and references. We also use the positions of long quotes as +guides to link short quotations to the correct passage of the literary +work. -For our second approach, ProQuoLM, we fine-tune a German BERT for classification. First, we identify potential short quotes, and then use the fine-tuned model to filter them. +For our second approach, ProQuoLM, we fine-tune a German BERT +**add link** for classification. 
First, we identify potential short +quotes, and then use the fine-tuned model to filter them. @@ -140,16 +187,48 @@ For our second approach, ProQuoLM, we fine-tune a German BERT for classification
-# QuidEx - Visualization and Exploration +**explain the table** -To allow for exploration of the results, we created [QuidEx](https://hu.berlin/quidex), a visualization and exploration website. +# QuidEx – Visualization and Exploration -On the left, there's a heatmap that displays the distribution of quoted passages in the entire literary text. The darker the area, the more frequently it has been quoted, suggesting its significance. Right beside the heatmap is the literary work itself. The grayscale indicates how many scholarly works quote any part of a crucial passage. This means the color remains constant for the entire key passage. The font size is adjusted based on how often a minimal segment is quoted. At the bottom, alongside the literary text, there's a list of all scholarly works. +We created [QuidEx](https://hu.berlin/quidex), a website for +visualization and exploration of the results, which is shown in this +screenshot:
Key passages, website
-In summary, we developed the tools to identify, link, visualize and explore direct quotations of all lengths. We are currently working on identifying and linking indirect quotations, that is summarizations and re-narrations. +On the left, there's a heatmap that displays the distribution of +quoted passages in the entire literary work. The darker the area, the +more frequently it has been quoted, suggesting its significance. Right +beside the heatmap is the literary work itself. The grayscale +indicates how many scholarly works quote any part of a crucial +passage. This means the level of gray remains constant for the entire +key passage. The font size is adjusted based on how often a minimal +segment is quoted. At the bottom, alongside the literary text, there's +a list of all scholarly works. + +**briefly show/link other use cases here? Uni Stuttgart?** + +**link the source code repo?!** + +**link the SPP-CLS repo?** + +**photo of the banner (Würzburg? HU main building? DOR26? or a three-photo collage right away?)** + +# Summary and Outlook + +In the first phase of the project, we developed tools to identify, +link, visualize, and explore direct quotations of all lengths. In +August 2023, the project went into its second phase, titled [Is Expert +Knowledge Key? Scholarly Interpretations as Resource for the Analysis +of Literary Texts in Computational Literary +Studies](https://www.projekte.hu-berlin.de/en/schluesselstellen/). +One important task we are currently working on is the identification +and linking of indirect quotations, that is, summarizations and +re-narrations. -For daily key passages, follow us on [Bluesky](https://bsky.app/profile/fredr0id.bsky.social) or try Quid online with our [web interface](https://hu.berlin/quidweb). \ No newline at end of file +For daily key passages, follow us on +[Bluesky](https://bsky.app/profile/fredr0id.bsky.social) or try Quid +online with our [web interface](https://hu.berlin/quidweb).