From d2b3c3cf3a71ce39a8f186852d21729011ad8670 Mon Sep 17 00:00:00 2001
From: Nathan Schneider
Date: Mon, 18 Nov 2024 22:37:45 -0500
Subject: [PATCH] INDEX.md: include category/lexical info in notes

---
 INDEX.md              | 190 +++++++++++++++++++++---------------------
 scripts/make_index.py |   6 +-
 2 files changed, 98 insertions(+), 98 deletions(-)

diff --git a/INDEX.md b/INDEX.md
index b91200d..8328d57 100644
--- a/INDEX.md
+++ b/INDEX.md
@@ -455,101 +455,101 @@ See also: [STATS.md](STATS.md)
 
 # Node Notes
 
-- ''clitic'' (`Tree IMeanYeahOK-0`)
-- 'Wrong answers only version' (https://twitter.com/DailySyntaxTree/status/1351714293071409157) (`Tree WomanRuledDead2-0`)
-- 'a full two seconds'; cf. 'mere' ([lunged](datasets/oneoff/pdf/lunged.pdf))
-- 'all over': treating 'all' as modifier (see CGEL p. 645) (`newsgroup-groups.google.com_Meditation20052_06390a5f75b2e1f2_ENG_20050316_091700-0036`)
-- 'all the while' idiom with a relative clause (not serving as modifier in a clause) ([mutantfleas](datasets/oneoff/pdf/mutantfleas.pdf))
-- 'all' as modifier in PP: p. 645 ([insectspecies](datasets/oneoff/pdf/insectspecies.pdf))
-- 'enough' as post-head modifier: p. 397 (`reviews-124163-0001`)
-- 'only' phrase triggering subj-aux inversion (pp. 95-96) ([vaporization](datasets/oneoff/pdf/vaporization.pdf))
-- Adj-of-BodyPart cxn ([uniform](datasets/oneoff/pdf/uniform.pdf))
-- I_x dreamt I_x/y was Beyonce_y and I_x kissed me_y (`Tree IdreamtIwasBeyoncé-0`)
-- Looks like a case where either PP or fused relative analysis is possible (p. 1078); go with PP for simplicity (`reviews-299169-0003`)
-- PP as subject (pp. 646-647) (`reviews-122564-0002`)
-- PP in lieu of DP, p. 433 (`newsgroup-groups.google.com_INTPunderground_b2c62e87877e4a22_ENG_20050906_165900-0085`)
-- TIME DURATION + earlier/later: pp. 632, 698 ([swingingbed](datasets/oneoff/pdf/swingingbed.pdf))
-- TODO: belongs under NP? (`email-enronsent39_01-0060`)
-- TODO: flat? (`email-enronsent13_01-0092`)
-- TODO: stative depictive: Comp or PredComp? ([uniform](datasets/oneoff/pdf/uniform.pdf))
-- TODO: xpos ([usc-rule2002a](datasets/oneoff/pdf/usc-rule2002a.pdf))
-- TODO: xpos ([usc-rule2002a](datasets/oneoff/pdf/usc-rule2002a.pdf))
-- This is the combining form -like, which is fairly productive (p. 1711). An argument could be made for :Mod as Nom not NP as this resembles attributive modification in the N-N compound construction: 'anchovy(-like) pizza' > 'anchovies(-like) pizza' (`answers-20111108104724AAuBUR7_ans-0087`)
-- X, not Y (pp. 1313-1314) (`answers-20111107164802AAq8nhF_ans-0007`)
-- absolute ([newlife](datasets/oneoff/pdf/newlife.pdf))
-- adverb (p. 566) (`answers-20111108103333AA3eSCk_ans-0019`)
-- ambiguous attachment 'of the department...' ([usc34-1](datasets/oneoff/pdf/usc34-1.pdf) `Title 34-1`)
-- awkward treatment of subject and copula ellipsis in diary style: (I am) now outside (ideally 'now' would be a Mod, but without a verb there is no VP to host it) (`Tree NowOutsideInZero-0`)
-- bare age (`weblog-blogspot.com_aggressivevoicedaily_20060811122000_ENG_20060811_122000-0033`)
-- can't decide whether transitive or intransitive PP ([atonement](datasets/oneoff/pdf/atonement.pdf))
-- central adjunct following aux; cf. 'the question *really* is' (`Tree ImLegitWritingIt-0`)
-- central adjunct preceding aux, p. 780; clause-oriented adjunct pp. 575-578; unclear whether it should be regarded as inside the VP or not (`Tree AThirdWaveIsPreventable-0`)
-- cf. p. 331 [10b] ([insectspecies](datasets/oneoff/pdf/insectspecies.pdf))
-- cf. until later, p. 640 ([swingingbed](datasets/oneoff/pdf/swingingbed.pdf))
-- city-state construction (`answers-20111108084416AAoPgBv_ans-0004`)
-- comparative (`Tree IdidntRealize-0`)
-- conditional inversion (p. 96) ([bakhmut](datasets/oneoff/pdf/bakhmut.pdf))
-- depictive supplement ([lunged](datasets/oneoff/pdf/lunged.pdf))
-- depictive supplement cf. p. 1265 [4iia]? ([insectspecies](datasets/oneoff/pdf/insectspecies.pdf))
-- directional preposition modifier (p. 645) (`answers-20111108091921AAaLK4e_ans-0070`)
-- directional preposition modifier in PP (p. 645) (`email-enronsent35_01-0010`)
-- dislocation (`answers-20111108082304AAEbrNs_ans-0016`)
-- double preposition stranding ([kindoffriend](datasets/oneoff/pdf/kindoffriend.pdf))
-- double-complement PP; cf. 'from Boston to Providence', p. 641. Also a spatial PP taking the place of an object in clause structure ([waistup](datasets/oneoff/pdf/waistup.pdf))
-- elementary property NP as determiner (p. 357) ([whatcolorsocks](datasets/oneoff/pdf/whatcolorsocks.pdf))
-- ellipsis of auxiliary (does)? (`Tree WhoDidYouSee-0`)
-- enumerated proper name ([usc-acreage](datasets/oneoff/pdf/usc-acreage.pdf) `Title 34-2`)
-- enumerated proper name ([usc-rule2002a](datasets/oneoff/pdf/usc-rule2002a.pdf))
-- enumerated proper name ([usc-rule2002a](datasets/oneoff/pdf/usc-rule2002a.pdf))
-- enumerated proper name (`reviews-020992-0003`)
-- exclamative ([bedtime](datasets/oneoff/pdf/bedtime.pdf))
-- exclamative ([newlife](datasets/oneoff/pdf/newlife.pdf))
-- exclamative ([schumer](datasets/oneoff/pdf/schumer.pdf))
-- exclamative (`Tree IdidntRealize-0`)
-- exclamatory-interrogative ([howstupid](datasets/oneoff/pdf/howstupid.pdf))
-- format-italics-emphasis ([vaporization](datasets/oneoff/pdf/vaporization.pdf))
-- format-italics-publication ([mutantfleas](datasets/oneoff/pdf/mutantfleas.pdf))
-- fronted partitive PP: p. 903 discusses 'which' + partitives but not with the partitive fronted (`answers-20111107155815AA6LXXJ_ans-0001`)
-- fully-gapped-ok ([howstupid](datasets/oneoff/pdf/howstupid.pdf))
-- hollow to-infinitival as indirect complement licensed by adjective: see p. 1249 ([bakhmut](datasets/oneoff/pdf/bakhmut.pdf))
-- imperative (`twitter-etc-trial-0007`)
-- impersonal construction (p. 960) (`Tree ItIsntThat-0`)
-- implicit partitive fused-head quantificational adjunct, pp. 413, 428 (`Tree WeReAllFriends-0`)
-- implied 'is' (headlinese) ([xkcd-garden-path](datasets/oneoff/pdf/xkcd-garden-path.pdf))
-- interpreted as compound pronoun (p. 427) (`Tree I-mMutingMyself-0`)
-- interrogative ([bedtime](datasets/oneoff/pdf/bedtime.pdf))
-- interrogative: cf. p. 1077 29[iii] ([dinner](datasets/oneoff/pdf/dinner.pdf))
-- it-cleft (`twitter-etc-trial-0008`)
-- it-cleft as a question (`twitter-etc-trial-0009`)
-- medial adverb (`Tree ThisMustNowBe-0`)
-- medial adverb modifier within nominal. see Payne/Huddleston/Pullum (2010), 'The distribution and category status of adjectives and adverbs' (`Tree TheArrivalRecentlyOf-0`)
-- metalinguistic mention ([uniform](datasets/oneoff/pdf/uniform.pdf))
-- metalinguistic mention ([uniform](datasets/oneoff/pdf/uniform.pdf))
-- more than (number): p. 432 (`twitter-etc-trial-0002`)
-- multi-gaps-ok ([leisure](datasets/oneoff/pdf/leisure.pdf))
-- multi-gaps-ok: across-the-board extraction from coordinated subject-relative and object-relative (`twitter-etc-trial-0005`)
-- multi-gaps-ok: wh-extraction from an it-cleft (`reviews-206303-0004`)
-- nonreferential distributive indefinite (cf. '50 miles an hour'; pp. 408, 446) ([insectspecies](datasets/oneoff/pdf/insectspecies.pdf))
-- not only: pp. 1314-1315 (`reviews-008585-0004`)
-- p. 1660 says this construction produces compound adjectives that 'readily convert to nouns', though arguably '6 year' could be treated like an attributive Nom in a compound (`Tree UpdatingAnApp-0`)
-- p. 385: degree modifier in clause structure, AmE (cf. %That wouldn't help us any.) ([appetite](datasets/oneoff/pdf/appetite.pdf))
-- p. 445 (`answers-20111108103333AA3eSCk_ans-0019`)
-- passive ([leisure](datasets/oneoff/pdf/leisure.pdf))
-- post-head 'each' meaning 'apiece': akin to post-head internal modifier 'one day more' (p. 445) (`newsgroup-groups.google.com_hiddennook_e21e429b3ad58235_ENG_20050830_214700-0010`)
-- post-head modifier of compound determinative, p. 423 (`answers-20111108084416AAoPgBv_ans-0004`)
-- predeterminer ([mutantfleas](datasets/oneoff/pdf/mutantfleas.pdf))
-- predeterminer (`Tree AllYourBase-0`)
-- predeterminer (`Tree AllYourTreeDiagrams-0`)
-- predeterminer (`Tree ThatIsSuchA-0`)
-- prepositional passive: SIEG2 p. 367 (`twitter-etc-trial-0007`)
-- pseudo-cleft with content clause; cf. p. 1421 [26i] ([mutantfleas](datasets/oneoff/pdf/mutantfleas.pdf))
-- punishment sense of 'for' - an argument could be made for Comp (`Tree DeltaHasBanned-0`)
-- resumptive pronoun (`Tree Here-sThePaper-0`)
-- structure of 'along with' is unclear; treating 'along' as head (`newsgroup-groups.google.com_eHolistic_2dd76f31ceb6bfe8_ENG_20050513_224200-0022`)
-- surprising coordinate (a relative clause would be expected) (`twitter-etc-trial-0010`)
-- temperature expression (measurement) (`answers-20111108104957AAsMzvU_ans-0006`)
-- tough-adjective with hollow to-infinitival as indirect complement (pp. 1248-1249) (`answers-20111106015552AAj6rCu_ans-0001`)
-- with-absolute construction; possibly should be considered a verbless clause? (`email-enronsent07_01-0061`)
+- `Adj` _full_ 'a full two seconds'; cf. 'mere' ([lunged](datasets/oneoff/pdf/lunged.pdf))
+- `Adj` _good_ tough-adjective with hollow to-infinitival as indirect complement (pp. 1248-1249) (`answers-20111106015552AAj6rCu_ans-0001`)
+- `Adj` _like_ This is the combining form -like, which is fairly productive (p. 1711). An argument could be made for :Mod as Nom not NP as this resembles attributive modification in the N-N compound construction: 'anchovy(-like) pizza' > 'anchovies(-like) pizza' (`answers-20111108104724AAuBUR7_ans-0087`)
+- `AdjP` Adj-of-BodyPart cxn ([uniform](datasets/oneoff/pdf/uniform.pdf))
+- `AdjP` p. 445 (`answers-20111108103333AA3eSCk_ans-0019`)
+- `AdjP` predeterminer (`Tree ThatIsSuchA-0`)
+- `Adv` _alone_ cf. p. 331 [10b] ([insectspecies](datasets/oneoff/pdf/insectspecies.pdf))
+- `Adv` _earlier_ TIME DURATION + earlier/later: pp. 632, 698 ([swingingbed](datasets/oneoff/pdf/swingingbed.pdf))
+- `Adv` _legit_ central adjunct following aux; cf. 'the question *really* is' (`Tree ImLegitWritingIt-0`)
+- `Adv` _really_ central adjunct preceding aux, p. 780; clause-oriented adjunct pp. 575-578; unclear whether it should be regarded as inside the VP or not (`Tree AThirdWaveIsPreventable-0`)
+- `Adv` _recently_ medial adverb modifier within nominal. see Payne/Huddleston/Pullum (2010), 'The distribution and category status of adjectives and adverbs' (`Tree TheArrivalRecentlyOf-0`)
+- `Adv` _sometimes_ adverb (p. 566) (`answers-20111108103333AA3eSCk_ans-0019`)
+- `AdvP` medial adverb (`Tree ThisMustNowBe-0`)
+- `AdvP` not only: pp. 1314-1315 (`reviews-008585-0004`)
+- `Clause` awkward treatment of subject and copula ellipsis in diary style: (I am) now outside (ideally 'now' would be a Mod, but without a verb there is no VP to host it) (`Tree NowOutsideInZero-0`)
+- `Clause` comparative (`Tree IdidntRealize-0`)
+- `Clause` conditional inversion (p. 96) ([bakhmut](datasets/oneoff/pdf/bakhmut.pdf))
+- `Clause` depictive supplement ([lunged](datasets/oneoff/pdf/lunged.pdf))
+- `Clause` depictive supplement cf. p. 1265 [4iia]? ([insectspecies](datasets/oneoff/pdf/insectspecies.pdf))
+- `Clause` ellipsis of auxiliary (does)? (`Tree WhoDidYouSee-0`)
+- `Clause` exclamative ([bedtime](datasets/oneoff/pdf/bedtime.pdf))
+- `Clause` exclamative ([newlife](datasets/oneoff/pdf/newlife.pdf))
+- `Clause` exclamative ([schumer](datasets/oneoff/pdf/schumer.pdf))
+- `Clause` exclamative (`Tree IdidntRealize-0`)
+- `Clause` exclamatory-interrogative ([howstupid](datasets/oneoff/pdf/howstupid.pdf))
+- `Clause` hollow to-infinitival as indirect complement licensed by adjective: see p. 1249 ([bakhmut](datasets/oneoff/pdf/bakhmut.pdf))
+- `Clause` imperative (`twitter-etc-trial-0007`)
+- `Clause` impersonal construction (p. 960) (`Tree ItIsntThat-0`)
+- `Clause` interrogative ([bedtime](datasets/oneoff/pdf/bedtime.pdf))
+- `Clause` interrogative: cf. p. 1077 29[iii] ([dinner](datasets/oneoff/pdf/dinner.pdf))
+- `Clause` it-cleft (`twitter-etc-trial-0008`)
+- `Clause` it-cleft as a question (`twitter-etc-trial-0009`)
+- `Clause` multi-gaps-ok: wh-extraction from an it-cleft (`reviews-206303-0004`)
+- `Clause` passive ([leisure](datasets/oneoff/pdf/leisure.pdf))
+- `Clause` prepositional passive: SIEG2 p. 367 (`twitter-etc-trial-0007`)
+- `Clause` surprising coordinate (a relative clause would be expected) (`twitter-etc-trial-0010`)
+- `Clause_rel` multi-gaps-ok ([leisure](datasets/oneoff/pdf/leisure.pdf))
+- `Clause_rel` multi-gaps-ok: across-the-board extraction from coordinated subject-relative and object-relative (`twitter-etc-trial-0005`)
+- `Coordination` I_x dreamt I_x/y was Beyonce_y and I_x kissed me_y (`Tree IdreamtIwasBeyoncé-0`)
+- `Coordination` X, not Y (pp. 1313-1314) (`answers-20111107164802AAq8nhF_ans-0007`)
+- `Coordination` ambiguous attachment 'of the department...' ([usc34-1](datasets/oneoff/pdf/usc34-1.pdf) `Title 34-1`)
+- `D` _a_ nonreferential distributive indefinite (cf. '50 miles an hour'; pp. 408, 446) ([insectspecies](datasets/oneoff/pdf/insectspecies.pdf))
+- `D` _all_ 'all' as modifier in PP: p. 645 ([insectspecies](datasets/oneoff/pdf/insectspecies.pdf))
+- `D` _all_ format-italics-emphasis ([vaporization](datasets/oneoff/pdf/vaporization.pdf))
+- `D` _all_ implicit partitive fused-head quantificational adjunct, pp. 413, 428 (`Tree WeReAllFriends-0`)
+- `D` _all_ predeterminer ([mutantfleas](datasets/oneoff/pdf/mutantfleas.pdf))
+- `D` _any_ p. 385: degree modifier in clause structure, AmE (cf. %That wouldn't help us any.) ([appetite](datasets/oneoff/pdf/appetite.pdf))
+- `D` _enough_ 'enough' as post-head modifier: p. 397 (`reviews-124163-0001`)
+- `DP` more than (number): p. 432 (`twitter-etc-trial-0002`)
+- `DP` post-head 'each' meaning 'apiece': akin to post-head internal modifier 'one day more' (p. 445) (`newsgroup-groups.google.com_hiddennook_e21e429b3ad58235_ENG_20050830_214700-0010`)
+- `DP` predeterminer (`Tree AllYourBase-0`)
+- `DP` predeterminer (`Tree AllYourTreeDiagrams-0`)
+- `N` _2002(a)_ TODO: xpos ([usc-rule2002a](datasets/oneoff/pdf/usc-rule2002a.pdf))
+- `N` _6-year-old_ p. 1660 says this construction produces compound adjectives that 'readily convert to nouns', though arguably '6 year' could be treated like an attributive Nom in a compound (`Tree UpdatingAnApp-0`)
+- `N` _Indians_ metalinguistic mention ([uniform](datasets/oneoff/pdf/uniform.pdf))
+- `N` _NDIAN_ metalinguistic mention ([uniform](datasets/oneoff/pdf/uniform.pdf))
+- `N` _Sun_ format-italics-publication ([mutantfleas](datasets/oneoff/pdf/mutantfleas.pdf))
+- `N` _clitic_ ''clitic'' (`Tree IMeanYeahOK-0`)
+- `N` _while_ 'all the while' idiom with a relative clause (not serving as modifier in a clause) ([mutantfleas](datasets/oneoff/pdf/mutantfleas.pdf))
+- `N` _§707(a)(3)_ TODO: xpos ([usc-rule2002a](datasets/oneoff/pdf/usc-rule2002a.pdf))
+- `NP` bare age (`weblog-blogspot.com_aggressivevoicedaily_20060811122000_ENG_20060811_122000-0033`)
+- `NP` city-state construction (`answers-20111108084416AAoPgBv_ans-0004`)
+- `NP` dislocation (`answers-20111108082304AAEbrNs_ans-0016`)
+- `NP` elementary property NP as determiner (p. 357) ([whatcolorsocks](datasets/oneoff/pdf/whatcolorsocks.pdf))
+- `NP` pseudo-cleft with content clause; cf. p. 1421 [26i] ([mutantfleas](datasets/oneoff/pdf/mutantfleas.pdf))
+- `NP+Clause` absolute ([newlife](datasets/oneoff/pdf/newlife.pdf))
+- `N_pro` _it_ resumptive pronoun (`Tree Here-sThePaper-0`)
+- `N_pro` _you all_ interpreted as compound pronoun (p. 427) (`Tree I-mMutingMyself-0`)
+- `Nom` 'Wrong answers only version' (https://twitter.com/DailySyntaxTree/status/1351714293071409157) (`Tree WomanRuledDead2-0`)
+- `Nom` TODO: flat? (`email-enronsent13_01-0092`)
+- `Nom` enumerated proper name ([usc-acreage](datasets/oneoff/pdf/usc-acreage.pdf) `Title 34-2`)
+- `Nom` enumerated proper name ([usc-rule2002a](datasets/oneoff/pdf/usc-rule2002a.pdf))
+- `Nom` enumerated proper name ([usc-rule2002a](datasets/oneoff/pdf/usc-rule2002a.pdf))
+- `Nom` enumerated proper name (`reviews-020992-0003`)
+- `Nom` temperature expression (measurement) (`answers-20111108104957AAsMzvU_ans-0006`)
+- `P` _along_ structure of 'along with' is unclear; treating 'along' as head (`newsgroup-groups.google.com_eHolistic_2dd76f31ceb6bfe8_ENG_20050513_224200-0022`)
+- `P` _back_ directional preposition modifier (p. 645) (`answers-20111108091921AAaLK4e_ans-0070`)
+- `P` _over_ can't decide whether transitive or intransitive PP ([atonement](datasets/oneoff/pdf/atonement.pdf))
+- `P` _over_ directional preposition modifier in PP (p. 645) (`email-enronsent35_01-0010`)
+- `P` _until_ cf. until later, p. 640 ([swingingbed](datasets/oneoff/pdf/swingingbed.pdf))
+- `PP` 'all over': treating 'all' as modifier (see CGEL p. 645) (`newsgroup-groups.google.com_Meditation20052_06390a5f75b2e1f2_ENG_20050316_091700-0036`)
+- `PP` 'only' phrase triggering subj-aux inversion (pp. 95-96) ([vaporization](datasets/oneoff/pdf/vaporization.pdf))
+- `PP` Looks like a case where either PP or fused relative analysis is possible (p. 1078); go with PP for simplicity (`reviews-299169-0003`)
+- `PP` PP as subject (pp. 646-647) (`reviews-122564-0002`)
+- `PP` PP in lieu of DP, p. 433 (`newsgroup-groups.google.com_INTPunderground_b2c62e87877e4a22_ENG_20050906_165900-0085`)
+- `PP` TODO: belongs under NP? (`email-enronsent39_01-0060`)
+- `PP` TODO: stative depictive: Comp or PredComp? ([uniform](datasets/oneoff/pdf/uniform.pdf))
+- `PP` double-complement PP; cf. 'from Boston to Providence', p. 641. Also a spatial PP taking the place of an object in clause structure ([waistup](datasets/oneoff/pdf/waistup.pdf))
+- `PP` fronted partitive PP: p. 903 discusses 'which' + partitives but not with the partitive fronted (`answers-20111107155815AA6LXXJ_ans-0001`)
+- `PP` post-head modifier of compound determinative, p. 423 (`answers-20111108084416AAoPgBv_ans-0004`)
+- `PP` punishment sense of 'for' - an argument could be made for Comp (`Tree DeltaHasBanned-0`)
+- `PP` with-absolute construction; possibly should be considered a verbless clause? (`email-enronsent07_01-0061`)
+- `VP` double preposition stranding ([kindoffriend](datasets/oneoff/pdf/kindoffriend.pdf))
+- `VP` fully-gapped-ok ([howstupid](datasets/oneoff/pdf/howstupid.pdf))
+- `VP` implied 'is' (headlinese) ([xkcd-garden-path](datasets/oneoff/pdf/xkcd-garden-path.pdf))
 
 # Infrequent Categories
 
diff --git a/scripts/make_index.py b/scripts/make_index.py
index 436ff9f..6a1e3e8 100644
--- a/scripts/make_index.py
+++ b/scripts/make_index.py
@@ -52,7 +52,7 @@ def main(cgelpaths):
 meta[m].add(sentId)
 for node in tree.tokens.values():
 if node.note:
- notes.append((node.note, sentId))
+ notes.append((node.constituent, node.lexeme, node.note, sentId))
 cats[node.constituent].add(sentId)
 fxns[node.deprel].add(sentId)
 print()
@@ -63,8 +63,8 @@ def main(cgelpaths):
 print(f'- `{m}` ({count}/{nTrees})' + (' ' + ', '.join(sorted(meta[m])) + '' if count({sentId})')
+ for nodecat,nodelex,note,sentId in sorted(notes):
+ print(f'- `{nodecat}`' + (f' _{nodelex}_ ' if nodelex else '') + f' {note} ({sentId})')
 print()
 print('# Infrequent Categories\n')
 print(f'Of {nTrees} trees, the following occurred in fewer than 5% ({thresh}):\n')