Merge pull request #190 from amir-zeldes/dev
V10.1.0
amir-zeldes authored May 16, 2024
2 parents e23a7c3 + 354b486 commit 9df08e9
Showing 1,017 changed files with 15,156 additions and 14,713 deletions.
16 changes: 15 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -37,7 +37,7 @@ Two documents from each completed genre are reserved for testing and devlopment,

## Citing

To cite this corpus in general, please refer to the following article, or see different citations for specific aspects below:
The best paper to cite depends on the data you are using. To cite the corpus in general, please refer to the following article (but note that the corpus has changed and grown a lot in the time since); otherwise see different citations for specific aspects below:

Zeldes, Amir (2017) "The GUM Corpus: Creating Multilayer Resources in the Classroom". Language Resources and Evaluation 51(3), 581–612.

@@ -68,6 +68,20 @@ If you are using the **Reddit** subset of GUM in particular, please use this cit
}
```

For papers focusing on the discourse relations, discourse markers or other discourse signal annotations, please cite [the eRST paper](https://arxiv.org/abs/2403.13560):

```
@misc{ZeldesEtAl2024,
title={{eRST}: A Signaled Graph Theory of Discourse Relations and Organization},
author={Amir Zeldes and Tatsuya Aoyama and Yang Janet Liu and Siyao Peng and Debopam Das and Luke Gessler},
year={2024},
eprint={2403.13560},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2403.13560}
}
```

If you are using the OntoNotes schema version of the coreference annotations (a.k.a. OntoGUM data in `coref/ontogum/`), please cite this paper instead:

```
5 changes: 3 additions & 2 deletions _build/KM_parser.py
@@ -6,10 +6,11 @@


use_cuda = torch.cuda.is_available()
#use_cuda = False
if use_cuda:
torch_t = torch.cuda
def from_numpy(ndarray):
if float(sys.version[:3]) <= 3.6:
if float(sys.version[:3]) <= 3.6 and float(sys.version[:3]) > 3.1:
return eval('torch.from_numpy(ndarray).pin_memory().cuda(async=True)')
else:
return torch.from_numpy(ndarray).pin_memory().cuda(non_blocking=True)
@@ -32,7 +33,7 @@ def from_numpy(ndarray):
Sub_Head = "<H>"
No_Head = "<N>"

DTYPE = torch.uint8 if float(sys.version[:3]) < 3.7 else torch.bool
DTYPE = torch.uint8 if float(sys.version[:3]) < 3.7 and float(sys.version[:3]) > 3.1 else torch.bool

TAG_UNK = "UNK"
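
The version checks in this file parse `sys.version[:3]` as a float. On Python 3.10 and later that slice is `"3.1"`, which is presumably why this change adds the `> 3.1` guard. A minimal sketch of an alternative, assuming only the standard library: compare `sys.version_info` as a tuple. The helper names `is_legacy_python` and `legacy_float_check` are hypothetical and do not appear in the repository.

```python
import sys


def legacy_float_check(version_string):
    """Reproduce the float-based test from the hunk, for comparison only."""
    return float(version_string[:3]) <= 3.6


def is_legacy_python(version_info=None):
    """Return True for Python 3.6 and older, using a tuple comparison.

    Tuples compare element by element, so (3, 10) sorts after (3, 6),
    unlike the string slice "3.1" parsed as a float.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) <= (3, 6)


if __name__ == "__main__":
    # The float check misclassifies a 3.10 version string as legacy.
    print(legacy_float_check("3.10.4 (main)"))
    print(is_legacy_python((3, 10, 4)))
```

With a tuple comparison, no second bound is needed to exclude 3.10 and later releases.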

2 changes: 1 addition & 1 deletion _build/build_gum.py
@@ -317,7 +317,7 @@ def check_diff(xml, ptb, docname):
os.makedirs(gum_target + "coref" + os.sep + "conll" + os.sep)

try:
pepper_params = io.open("utils" + os.sep + "pepper" + os.sep + "merge_gum.pepperparams", encoding="utf8").read().replace("\r","")
pepper_params = io.open(pepper_home + "merge_gum.pepperparams", encoding="utf8").read().replace("\r","")
except:
sys.__stdout__.write("x Can't find pepper template at: "+"utils" + os.sep + "pepper" + os.sep + "merge_gum.pepperparams"+"\n Aborting...")
sys.exit()
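
After this change the template is read from `pepper_home`, while the error message still prints the old `utils/pepper/` path. A minimal sketch of one way to keep the two in sync, assuming `pepper_home` is a directory path: build the path once and reuse it in the message. The function name `read_pepper_params` is hypothetical, and writing to `sys.stderr` with a non-zero exit code is a choice made for this sketch, not the script's behavior.

```python
import io
import os
import sys


def read_pepper_params(pepper_home, filename="merge_gum.pepperparams"):
    """Read the Pepper template, reporting the path actually tried on failure."""
    # os.path.join works whether or not pepper_home ends with a separator.
    path = os.path.join(pepper_home, filename)
    try:
        with io.open(path, encoding="utf8") as handle:
            return handle.read().replace("\r", "")
    except IOError:
        # Only I/O failures are caught, so unrelated errors still surface.
        sys.stderr.write("x Can't find pepper template at: " + path + "\n Aborting...\n")
        sys.exit(1)
```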
2 changes: 1 addition & 1 deletion _build/src/const/GUM_conversation_grounded.ptb
@@ -362,7 +362,7 @@
(ROOT (INTJ (UH Oh) (. .)))

(ROOT
(S (NP (PRP$ Her) (NN mom)) (VP (VB call) (NP (PRP you))) (, ,)))
(S (NP (PRP$ Her) (NN mom)) (VP (VB call) (NP (PRP you))) (: —)))

(ROOT (FRAG (JJ Right) (, ,) (JJ right) (. .)))

2 changes: 1 addition & 1 deletion _build/src/const/GUM_court_fire.ptb
@@ -998,7 +998,7 @@
(VP
(VBP are)
(VP
(VBN mixed)
(JJ mixed)
(NP
(NP
(CD two)
7 changes: 3 additions & 4 deletions _build/src/const/GUM_letter_arendt.ptb
@@ -218,7 +218,7 @@
(VP
(VBZ is)
(VP
(VBN married)
(JJ married)
(PP
(IN to)
(NP
@@ -240,7 +240,7 @@
(VBZ is)
(ADJP
(RB quite)
(VBN wasted)
(JJ wasted)
(PP (IN on) (NP (DT this) (NN royalist))))))))))))
(. .)))

@@ -519,8 +519,7 @@
(VP
(VBP are)
(ADJP
(VBN
crowed))))))))))))))))))))))))))))
(JJ crowed))))))))))))))))))))))))))))
(. .)))

(ROOT
2 changes: 1 addition & 1 deletion _build/src/const/GUM_letter_gorbachev.ptb
@@ -430,7 +430,7 @@
(NN fact)
(SBAR
(IN that)
(S (NP (NN land)) (VP (VBZ is) (ADJP (VBN limited))))))))))
(S (NP (NN land)) (VP (VBZ is) (ADJP (JJ limited))))))))))
(. .)))

(ROOT
13 changes: 6 additions & 7 deletions _build/src/const/GUM_letter_mandela.ptb
@@ -406,7 +406,7 @@
(ROOT
(S
(ADVP
(IN Outside)
(RB Outside)
(PP (IN of) (NP (DT the) (NNP Nationalist) (NNP Party))))
(, ,)
(NP
@@ -1111,12 +1111,11 @@
(S
(NP (PRP We))
(VP
(VP
(VBP urge)
(NP (PRP you))
(ADVP (RB strongly))
(S (VP (TO to) (VP (VB speak) (PRT (RP out))))))
(RB now))(. .)))
(VBP urge)
(NP (PRP you))
(ADVP (RB strongly))
(S (VP (TO to) (VP (VB speak) (PRT (RP out)) (ADVP (RB now))))))
(. .)))

(ROOT
(S
2 changes: 1 addition & 1 deletion _build/src/const/GUM_news_election.ptb
@@ -133,7 +133,7 @@
(JJ anti-establishment)
(CC and)
(JJ pro-Beijing)
(NN campus)))))))
(NNS campus)))))))
(. .)))

(ROOT (NP (NN Election) (NNS Results)))
16 changes: 7 additions & 9 deletions _build/src/const/GUM_podcast_addiction.ptb
@@ -99,7 +99,7 @@
(CC but)
(S
(NP (PRP they))
(VP (VBP are) (ADVP (RB still)) (ADJP (VBN addicted))))
(VP (VBP are) (ADVP (RB still)) (ADJP (JJ addicted))))
(. .)))

(ROOT (INTJ (JJ Right) (. .)))
@@ -241,7 +241,7 @@
(NP (PRP I))
(VP
(VBP 'm)
(ADJP (VBN addicted) (PP (IN to) (NP (NN sugar))))))
(ADJP (JJ addicted) (PP (IN to) (NP (NN sugar))))))
(CC or)
(S
(NP (PRP I))
@@ -954,7 +954,7 @@
(NP (PRP you))
(VP
(MD can)
(VP (VB get) (VP (VBN addicted) (PP (IN to))))))))
(VP (VB get) (VP (JJ addicted) (PP (IN to))))))))
(CC or)
(NP
(NP (NN anything))
@@ -965,7 +965,7 @@
(NP (PRP you))
(VP
(MD can)
(VP (VB get) (VP (VBN addicted) (PP (IN to)))))))))
(VP (VB get) (VP (JJ addicted) (PP (IN to)))))))))
(. ?)))

(ROOT (NP (DT A) (JJ great) (NN amount) (. .)))
@@ -1025,7 +1025,7 @@
(VP
(VBP are)
(PRN (, ,) (INTJ (NN quote)) (, ,))
(ADJP (VBN addicted) (PP (IN to))))))))))))
(ADJP (JJ addicted) (PP (IN to))))))))))))
(. .)))

(ROOT
@@ -1056,7 +1056,7 @@
(VP
(VBZ is)
(ADJP
(VBN addicted)
(JJ addicted)
(PP
(IN to)
(S (VP (VBG licking) (NP (PRP$ her) (NN cat))))))))))
@@ -1285,9 +1285,7 @@
(SQ
(VBZ is)
(NP (DT this) (NN person))
(ADJP
(VBN addicted)
(PP (IN to) (NP (NN hand) (NN washing))))))
(ADJP (JJ addicted) (PP (IN to) (NP (NN hand) (NN washing))))))
(. ?)))

(ROOT
2 changes: 1 addition & 1 deletion _build/src/const/GUM_podcast_collaboration.ptb
@@ -816,7 +816,7 @@
(NP (PRP I))
(VP
(VBD was)
(ADJP (VBN done) (PP (IN with) (NP (PRP it)))))))))
(ADJP (JJ done) (PP (IN with) (NP (PRP it)))))))))
(, ,)
(NP (EX there))
(VP
2 changes: 1 addition & 1 deletion _build/src/const/GUM_podcast_wrestling.ptb
@@ -733,7 +733,7 @@
(NP (DT the) (NN rose) (VBN colored) (NNS glasses))
(VP (VBP are) (ADVP (RB off))))
(, ,)
(S (NP (DT the) (NN nostalgia)) (VP (VBZ is) (VP (VBN gone))))
(S (NP (DT the) (NN nostalgia)) (VP (VBZ is) (ADJP (JJ gone))))
(. .)))

(ROOT
2 changes: 1 addition & 1 deletion _build/src/const/GUM_vlog_covid.ptb
@@ -602,7 +602,7 @@
(VP
(VBD was)
(PP
(QP (ADVP (RB around) (UH like)) (CD 6:00))
(QP (ADVP (IN around) (UH like)) (CD 6:00))
(NN pm)))))))))
(. .)))

4 changes: 2 additions & 2 deletions _build/src/const/GUM_voyage_york.ptb
@@ -160,7 +160,7 @@
(ROOT
(S
(NP
(NP (NNP Constantine) (DT the) (JJ Great))
(NP (NNP Constantine) (DT the) (NNP Great))
(: -)
(ADJP
(ADJP
@@ -184,7 +184,7 @@
(ADVP (RB first))
(VP
(VBN proclaimed)
(S (NP (NN Emperor)))
(S (NP (NNP Emperor)))
(PP (IN in) (NP (DT the) (NN city)))))
(. .)))

2 changes: 1 addition & 1 deletion _build/src/dep/GUM_academic_exposure.conllu
@@ -633,7 +633,7 @@
29 directly _ _ _ _ 30 advmod _ _
30 measured _ _ _ _ 25 acl _ _
31 , _ _ _ _ 35 punct _ _
32 except _ _ _ _ 35 mark _ _
32 except _ _ _ _ 35 case _ _
33 by _ _ _ _ 35 case _ _
34 [ _ _ _ _ 35 punct _ _
35 24 _ _ _ _ 30 obl:agent _ _
2 changes: 1 addition & 1 deletion _build/src/dep/GUM_academic_lighting.conllu
@@ -656,7 +656,7 @@
1 In _ _ _ _ 2 case _ _
2 addition _ _ _ _ 0 root _ _
3 to _ _ _ _ 6 case _ _
4 the _ _ _ _ 5 det _ _
4 the _ _ _ _ 6 det _ _
5 harmful _ _ _ _ 6 amod _ _
6 effects _ _ _ _ 2 nmod _ _
7 of _ _ _ _ 8 case _ _
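
Several of the dependency fixes in this commit move a `det` or `case` token off an adjectival modifier and onto the noun it modifies, as with `the` above. A minimal heuristic sketch for spotting that pattern, assuming the ten whitespace-separated columns shown on this page (real CoNLL-U files are tab-separated): flag any `det` or `case` token whose head is itself labeled `amod`. The function name `find_function_words_on_modifiers` is hypothetical, and the check can produce false positives, so treat hits as candidates for review rather than confirmed errors.

```python
def find_function_words_on_modifiers(rows):
    """Return (id, form, head) for det/case tokens attached to an amod token."""
    tokens = {}
    for line in rows:
        cols = line.split()
        # Skip blank lines, comments, and multiword or empty-node ids.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        # Column 2 is the form, column 7 the head id, column 8 the relation.
        tokens[int(cols[0])] = (cols[1], int(cols[6]), cols[7])

    suspicious = []
    for tok_id, (form, head, deprel) in sorted(tokens.items()):
        if deprel in ("det", "case") and head in tokens and tokens[head][2] == "amod":
            suspicious.append((tok_id, form, head))
    return suspicious


if __name__ == "__main__":
    # The sentence fragment from this hunk, before the fix.
    before = [
        "3 to _ _ _ _ 6 case _ _",
        "4 the _ _ _ _ 5 det _ _",
        "5 harmful _ _ _ _ 6 amod _ _",
        "6 effects _ _ _ _ 2 nmod _ _",
    ]
    print(find_function_words_on_modifiers(before))
```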
4 changes: 2 additions & 2 deletions _build/src/dep/GUM_bio_chao.conllu
@@ -376,7 +376,7 @@
33 Chinese _ _ _ _ 31 obl _ _
34 Bernhard _ _ _ _ 38 nmod:poss _ _
35 Karlgren _ _ _ _ 34 flat _ _
36 's _ _ _ _ 35 case _ _
36 's _ _ _ _ 34 case _ _
37 monumental _ _ _ _ 38 amod _ _
38 Etudes _ _ _ _ 29 obj _ _
39 sur _ _ _ _ 38 flat _ _
@@ -486,7 +486,7 @@
21 became _ _ _ _ 10 acl:relcl _ _
22 Agassiz _ _ _ _ 23 compound _ _
23 Professor _ _ _ _ 21 xcomp _ _
24 of _ _ _ _ 25 case _ _
24 of _ _ _ _ 26 case _ _
25 Oriental _ _ _ _ 26 amod _ _
26 Languages _ _ _ _ 23 nmod _ _
27 . _ _ _ _ 7 punct _ _
4 changes: 2 additions & 2 deletions _build/src/dep/GUM_bio_enfant.conllu
@@ -565,7 +565,7 @@
10 the _ _ _ _ 12 det _ _
11 City _ _ _ _ 12 compound _ _
12 Hall _ _ _ _ 9 obj _ _
13 in _ _ _ _ 14 case _ _
13 in _ _ _ _ 15 case _ _
14 New _ _ _ _ 15 amod _ _
15 York _ _ _ _ 12 nmod _ _
16 for _ _ _ _ 19 case _ _
@@ -674,7 +674,7 @@
20 the _ _ _ _ 22 det _ _
21 Grand _ _ _ _ 22 amod _ _
22 Lodge _ _ _ _ 28 nsubj _ _
23 of _ _ _ _ 24 case _ _
23 of _ _ _ _ 25 case _ _
24 New _ _ _ _ 25 amod _ _
25 York _ _ _ _ 22 nmod _ _
26 F&AM _ _ _ _ 22 appos _ _
4 changes: 2 additions & 2 deletions _build/src/dep/GUM_bio_galois.conllu
@@ -762,8 +762,8 @@
7 newspaper _ _ _ _ 8 compound _ _
8 clippings _ _ _ _ 17 nsubj _ _
9 from _ _ _ _ 13 case _ _
10 only _ _ _ _ 12 advmod _ _
11 a _ _ _ _ 12 det _ _
10 only _ _ _ _ 13 advmod _ _
11 a _ _ _ _ 13 det _ _
12 few _ _ _ _ 13 amod _ _
13 days _ _ _ _ 8 nmod _ _
14 after _ _ _ _ 16 case _ _
2 changes: 1 addition & 1 deletion _build/src/dep/GUM_bio_jespersen.conllu
@@ -972,7 +972,7 @@
5 from _ _ _ _ 7 case _ _
6 Columbia _ _ _ _ 7 compound _ _
7 University _ _ _ _ 2 obl _ _
8 in _ _ _ _ 9 case _ _
8 in _ _ _ _ 10 case _ _
9 New _ _ _ _ 10 amod _ _
10 York _ _ _ _ 7 nmod _ _
11 ( _ _ _ _ 12 punct _ _
2 changes: 1 addition & 1 deletion _build/src/dep/GUM_bio_moreau.conllu
@@ -456,7 +456,7 @@
14 spending _ _ _ _ 9 advcl _ _
15 vacations _ _ _ _ 14 obj _ _
16 at _ _ _ _ 20 case _ _
17 the _ _ _ _ 18 det _ _
17 the _ _ _ _ 20 det _ _
18 paternal _ _ _ _ 20 amod _ _
19 ancestral _ _ _ _ 20 amod _ _
20 village _ _ _ _ 14 obl _ _
4 changes: 2 additions & 2 deletions _build/src/dep/GUM_conversation_grounded.conllu
@@ -337,7 +337,7 @@
2 mom _ _ _ _ 3 nsubj _ _
3 call _ _ _ _ 0 root _ _
4 you _ _ _ _ 3 obj _ _
5 , _ _ _ _ 3 punct _ _
5 — _ _ _ _ 3 punct _ _

1 Right _ _ _ _ 0 root _ _
2 , _ _ _ _ 3 punct _ _
@@ -531,7 +531,7 @@
8 do _ _ _ _ 10 aux _ _
9 n't _ _ _ _ 10 advmod _ _
10 know _ _ _ _ 2 parataxis _ _
11 how _ _ _ _ 12 mark _ _
11 how _ _ _ _ 12 advmod _ _
12 many _ _ _ _ 13 amod _ _
13 times _ _ _ _ 10 obj _ _
14 I _ _ _ _ 15 nsubj _ _
8 changes: 4 additions & 4 deletions _build/src/dep/GUM_conversation_zero.conllu
@@ -1076,11 +1076,11 @@
3 okay _ _ _ _ 1 discourse _ _
4 . _ _ _ _ 1 punct _ _

1 Where _ _ _ _ 4 advmod _ _
2 's _ _ _ _ 4 cop _ _
1 Where _ _ _ _ 0 root _ _
2 's _ _ _ _ 1 cop _ _
3 the _ _ _ _ 4 det _ _
4 test _ _ _ _ 0 root _ _
5 ? _ _ _ _ 4 punct _ _
4 test _ _ _ _ 1 nsubj _ _
5 ? _ _ _ _ 1 punct _ _

1 There _ _ _ _ 2 expl _ _
2 ai _ _ _ _ 0 root _ _
4 changes: 2 additions & 2 deletions _build/src/dep/GUM_essay_distraction.conllu
@@ -498,11 +498,11 @@
30 day _ _ _ _ 21 obl _ _
31 , _ _ _ _ 33 punct _ _
32 no _ _ _ _ 33 det _ _
33 matter _ _ _ _ 21 obl:tmod _ _
33 matter _ _ _ _ 21 advcl _ _
34 what _ _ _ _ 37 nsubj _ _
35 else _ _ _ _ 34 advmod _ _
36 is _ _ _ _ 37 aux _ _
37 going _ _ _ _ 33 acl _ _
37 going _ _ _ _ 33 csubj _ _
38 on _ _ _ _ 37 compound:prt _ _
39 . _ _ _ _ 6 punct _ _
