Commit

worked on the text, added suggestions
rjoberon committed Dec 14, 2023
1 parent 7c7bf33 commit 55942ef
Showing 2 changed files with 93 additions and 14 deletions.
4 changes: 2 additions & 2 deletions _config.yml
@@ -173,7 +173,7 @@ authors:
web: https://amor.cms.hu-berlin.de/~schwabmi/
orcid: 0000-0001-5569-6568
twitter:
description: Research fellow at Humboldt University of Berlin.
description: Research fellow at Humboldt-Universität zu Berlin.
cmil:
name: Carsten Milling
display_name: Carsten
@@ -222,7 +222,7 @@ authors:
github:
orcid: 0000-0002-0417-4054
twitter:
description: Research fellow at Humboldt University of Berlin.
description: Research fellow at Humboldt-Universität zu Berlin.

paginate: 10

103 changes: 91 additions & 12 deletions _drafts/2023-11-28-Key-Passages.markdown
@@ -1,22 +1,54 @@
---
title: "Working Title: Key Passages in Literary Works"
layout: post
author: [robert, frederik]
author: [frederik, robert]
comments: true
date: 2023-11-28
---

# Context

In the DFG project [What matters? Key Passages in Literary Works](https://www.projekte.hu-berlin.de/en/schluesselstellen/what-matters-key-passages-in-literary-works) which started with its first phase in 2020 and has been in the second phase, titled [Is Expert Knowledge Key? Scholarly Interpretations as Resource for the Analysis of Literary Texts in Computational Literary Studies](https://www.projekte.hu-berlin.de/en/schluesselstellen/index.html), since August 2023, we (who is we?) set out to identify and characterize key passages in literary works. We understand key passages as passages that are particularly important to expert readers when interpreting texts. In a mixed-methods approach, we primarily want to investigate empirically which textual characteristics of literary genres can be revealed through patterns of citation and quotation.
In the
[DFG-funded](https://gepris.dfg.de/gepris/projekt/424207720?language=en)
project [What matters? Key Passages in Literary
Works](https://www.projekte.hu-berlin.de/en/schluesselstellen/what-matters-key-passages-in-literary-works)
(which is part of the Priority Programme [Computational
Literary Studies](https://dfg-spp-cls.github.io/)), we set out to
identify and characterize key passages in literary works.

We understand key passages as passages that are particularly important
to expert readers when interpreting texts. In a mixed-methods
approach, we investigate empirically which textual characteristics of
literary genres can be revealed through patterns of citation and
quotation.

# Corpus

Our main corpus in the first project phase consisted of two literary works _Die Judenbuche_ by Annette von Droste-Hülshoff and _Michael Kohlhaas_ by Heinrich von Kleist with 44 and 49 scholarly articles, respectively. Fortunately, we could build on the previous work of the [ArguLIT project](https://gepris.dfg.de/gepris/projekt/372804438?language=en) and their annotations of all direct quotations.
Our main corpus consists of two literary works, [Die
Judenbuche](https://en.wikipedia.org/wiki/Die_Judenbuche) by Annette
von Droste-Hülshoff and [Michael
Kohlhaas](https://en.wikipedia.org/wiki/Michael_Kohlhaas) by Heinrich
von Kleist, with 44 and 49 scholarly articles,
respectively. Fortunately, we could build on the previous work of the
[ArguLIT
project](https://gepris.dfg.de/gepris/projekt/372804438?language=en)
and their annotation of all direct quotations.

# Automatic Identification of Quotations

Scholarly texts contain a number of different types of quotations. For example, verbatim quotes from short lengths of single words to longer quotations spanning multiple sentences, and indirect quotations in the form of summarizations or re-narrations. In the first phase of the project, we focused on the automatic identification of linking of direct quotations starting with quotations of a length of five or more words. In [Lotte and Annette: A Framework for Finding and Exploring Key Passages in Literary Works](https://aclanthology.org/2021.nlp4dh-1.7.pdf)[^1], we outline the current landscape for text reuse detection and the development of our tool [Quid](https://hu.berlin/quid). Although there are a number of existing tools, we found that all had limitations for our specific use case. We evaluated Quid and compared it to the existing tools.
Scholarly texts contain different types of quotations: verbatim
quotes, ranging from single words to longer quotations spanning
multiple sentences, and indirect quotations in the form of summaries
or re-narrations. In the first phase of the project, we focused on the
automatic identification and linking of direct quotations, starting
with quotations of five or more words. In [Lotte and
Annette: A Framework for Finding and Exploring Key Passages in
Literary Works](https://aclanthology.org/2021.nlp4dh-1.7.pdf)[^1], we
outline the current landscape of text reuse detection and the
development of our tool [Quid](https://hu.berlin/quid). Although there
are a number of existing tools, we found that all of them had
limitations for our specific use case. We evaluated Quid and compared
it to the existing tools.

[^1]: Lotte and Annette have since been renamed to Quid and QuidEx, respectively.
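
To make the task concrete, the following is a minimal sketch of
finding verbatim overlaps of five or more words between a literary
text and a scholarly text. It is not Quid's algorithm: the function
names, the simple word tokenization, and the exact-match requirement
are assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def find_shared_passages(source, target, min_words=5):
    """Return maximal runs of identical tokens shared by both texts.

    Each result is (source_start, target_start, length), measured in
    token positions. Only runs of at least `min_words` tokens are kept.
    """
    src = tokenize(source)
    tgt = tokenize(target)
    matches = []
    # prev[j] holds the length of the common run ending at the previous
    # source token and at target token j - 1.
    prev = [0] * (len(tgt) + 1)
    for i in range(1, len(src) + 1):
        curr = [0] * (len(tgt) + 1)
        for j in range(1, len(tgt) + 1):
            if src[i - 1] != tgt[j - 1]:
                continue
            curr[j] = prev[j - 1] + 1
            # The run is maximal if it cannot be extended to the right.
            at_end = i == len(src) or j == len(tgt)
            if curr[j] >= min_words and (at_end or src[i] != tgt[j]):
                matches.append((i - curr[j], j - curr[j], curr[j]))
        prev = curr
    return matches
```

The sketch compares every token pair and is therefore quadratic in the
text lengths. It also misses quotations that were altered, for example
by ellipses or orthographic changes.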

Expand Down Expand Up @@ -85,13 +117,28 @@ Scholarly texts contain a number of different types of quotations. For example,
</tbody>
</table>

Considerably more difficult to identify are quotations which are shorter than 5 words. In _A Novel Approach for Identification and Linking of Short Quotations in Scholarly Texts and Literary Works_[^2], we develop and compare two approaches to tackle this challenge, _ProQuo_ and _ProQuoLM_.
**briefly explain this table**

Quotations shorter than five words are considerably more difficult
to identify. In _A Novel Approach for Identification and
Linking of Short Quotations in Scholarly Texts and Literary
Works_[^2], we develop and compare two approaches to tackle this
challenge, _ProQuo_ and _ProQuoLM_.

[^2]: Accepted at JCLS 2023 and soon to be published.

For ProQuo, we use the (page) references for long quotations as examples to tell apart (page) references for short quotations from other text in parenthesis. This includes references like those to the Bible or other literary works. We then relate short quotes to their source in the literary work by figuring out the relationships between the quotes and references. We also use the positions of long quotes as guides to link short quotations to the correct passage of the literary work.
For ProQuo, we use the (page) references for long quotations as
examples to tell apart (page) references for short quotations from
other text in parentheses. Such other text includes references to the
Bible or to other literary works. We then relate short quotes to their
source in the literary work by determining the relationships between
the quotes and the references. We also use the positions of long
quotes as guides to link short quotations to the correct passage of
the literary work.
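
The idea of linking short quotes through page references can be
illustrated with a small sketch. It is not the ProQuo implementation:
the fixed regular expressions stand in for the patterns that ProQuo
derives from long quotations, and the function name, the straight
double quotes, and the `pages` mapping (page number to page text) are
assumptions for illustration.

```python
import re

# A quoted span directly followed by text in parentheses (assumed pattern).
QUOTE_WITH_REFERENCE = re.compile(r'"([^"]+)"\s*\(([^)]*)\)')
# A parenthetical counts as a page reference if it is a bare number,
# optionally preceded by "p." or "S." (assumed pattern).
PAGE_REFERENCE = re.compile(r"(?:p\.|S\.)?\s*(\d+)")


def link_short_quotes(scholarly_text, pages, max_words=4):
    """Link short quotes to a page of the literary work.

    Returns a list of (quote, page, offset), where offset is the
    character position of the quote on that page.
    """
    links = []
    for match in QUOTE_WITH_REFERENCE.finditer(scholarly_text):
        quote, reference = match.group(1), match.group(2)
        if len(quote.split()) > max_words:
            continue  # long quotations are handled separately
        page_match = PAGE_REFERENCE.fullmatch(reference.strip())
        if page_match is None:
            continue  # e.g. a Bible reference, not a page number
        page = int(page_match.group(1))
        offset = pages.get(page, "").find(quote)
        if offset != -1:
            links.append((quote, page, offset))
    return links
```

With made-up example data, a quote followed by `(12)` is searched only
on page 12, while a quote followed by `(Gen 4:15)` is skipped because
the parenthetical is not a page reference.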

For our second approach, ProQuoLM, we fine-tune a German BERT for classification. First, we identify potential short quotes, and then use the fine-tuned model to filter them.
For our second approach, ProQuoLM, we fine-tune a German BERT model
**add link** for classification. First, we identify potential short
quotes, and then use the fine-tuned model to filter them.

<table>
<thead>
</tbody>
</table>

# QuidEx - Visualization and Exploration
**explain table**

To allow for exploration of the results, we created [QuidEx](https://hu.berlin/quidex), a visualization and exploration website.
# QuidEx – Visualization and Exploration

On the left, there's a heatmap that displays the distribution of quoted passages in the entire literary text. The darker the area, the more frequently it has been quoted, suggesting its significance. Right beside the heatmap is the literary work itself. The grayscale indicates how many scholarly works quote any part of a crucial passage. This means the color remains constant for the entire key passage. The font size is adjusted based on how often a minimal segment is quoted. At the bottom, alongside the literary text, there's a list of all scholarly works.
We created [QuidEx](https://hu.berlin/quidex), a website for
visualization and exploration of the results, which is shown in this
screenshot:

<figure style="text-align:center;">
<img src="/images/key-passages-website.jpg" alt="Key passages, website" style="width:900px; border: 1px solid transparent; border-color: black;" />
</figure>

In summary, we developed the tools to identify, link, visualize and explore direct quotations of all lengths. We are currently working on identifying and linking indirect quotations, that is summarizations and re-narrations.
On the left, there's a heatmap that displays the distribution of
quoted passages in the entire literary work. The darker the area, the
more frequently it has been quoted, suggesting its significance. Right
beside the heatmap is the literary work itself. The grayscale
indicates how many scholarly works quote any part of a key
passage, so the level of gray remains constant for the entire
key passage. The font size is adjusted based on how often a minimal
segment is quoted. At the bottom, alongside the literary text, there's
a list of all scholarly works.
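
The counting behind such a heatmap can be sketched as follows. This is
not the QuidEx code: the function names, the `(work_id, start, end)`
span format, and the linear mapping to gray values are assumptions for
illustration.

```python
def quote_counts(text_length, spans):
    """Count, per character, how many distinct scholarly works quote it.

    `spans` is an iterable of (work_id, start, end) with character
    offsets into the literary text; `end` is exclusive.
    """
    quoting_works = [set() for _ in range(text_length)]
    for work_id, start, end in spans:
        for position in range(max(start, 0), min(end, text_length)):
            quoting_works[position].add(work_id)
    return [len(works) for works in quoting_works]


def gray_level(count, max_count):
    """Map a count to a gray value from 255 (white) to 0 (black)."""
    if max_count == 0:
        return 255
    return round(255 * (1 - count / max_count))
```

Using sets means that a scholarly work quoting the same position twice
is counted only once, so the darkness reflects the number of quoting
works rather than the number of quotations.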

**briefly show/link other use cases here? University of Stuttgart?**

**link source code repo?!**

**link SPP-CLS repo?**

**photo of the banner (Würzburg? HU main building? DOR26? or a collage of all three?)**

# Summary and Outlook

In the first phase of the project, we developed tools to identify,
link, visualize, and explore direct quotations of all lengths. In
August 2023, the project went into its second phase, titled [Is Expert
Knowledge Key? Scholarly Interpretations as Resource for the Analysis
of Literary Texts in Computational Literary
Studies](https://www.projekte.hu-berlin.de/en/schluesselstellen/).
One important task we are currently working on is the identification
and linking of indirect quotations, that is, summaries and
re-narrations.

For daily key passages, follow us on [Bluesky](https://bsky.app/profile/fredr0id.bsky.social) or try Quid online with our [web interface](https://hu.berlin/quidweb).
For daily key passages, follow us on
[Bluesky](https://bsky.app/profile/fredr0id.bsky.social) or try Quid
online with our [web interface](https://hu.berlin/quidweb).
