some glosses are repeated #33

Open
arademaker opened this issue Nov 25, 2022 · 4 comments

@arademaker
Member

arademaker commented Nov 25, 2022

Some glosses are repeated across synsets in both PWN 3.0 and PWN 3.1:

  • 376 distinct glosses occur more than once in PWN 3.0; 289 of them occur exactly twice.
  • 363 distinct glosses occur more than once in PWN 3.1; 275 of them occur exactly twice.

One example:

  1. http://wn.mybluemix.net/synset?id=01156302-a (mellow)
  2. http://wn.mybluemix.net/synset?id=01492061-a (mellowed • mellow)

For PWN 3.0

ar@tenis glosstag-kg % cat ../WordNet-3.0/dict/data.* | awk -F "|" '$0 ~ /^[0-9]/ {print $2}' | sort | uniq -c | sort -nr | head
  23  a variety of aster
  18  a branch of the Tai languages
  13  one species
  13  one of the British colonies that formed the United States
  11  a genus of bacteria
   9  an artificial language
   9  a radioactive transuranic element
   9  a genus of Mustelidae
   9  a Chadic language spoken south of Lake Chad
   8  a genus of Psittacidae

% cat ../WordNet-3.0/dict/data.* | awk -F "|" '$0 ~ /^[0-9]/ {print $2}' | sort | uniq -c | sort -nr | awk '$1 > 1 {print $1}' | sort | uniq -c
   1 11
   2 13
   1 18
 289 2
   1 23
  40 3
  18 4
   9 5
   5 6
   5 7
   1 8
   4 9
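The pipelines above show which glosses repeat, but not which synsets share them, and that is what an annotator needs in order to reconcile the copies (as in the mellow example). Below is a minimal awk sketch that also collects the synset IDs. It assumes the standard WordNet `data.*` line layout (offset, lex_filenum, ss_type, ..., then `|` and the gloss). It also assumes that IDs should be written as offset-pos with satellites (`s`) reported as `a`, to match the wn.mybluemix.net URLs. `shared_glosses` is a hypothetical helper name, not part of this project.

```shell
#!/bin/sh
# shared_glosses (hypothetical helper): read WordNet data.* lines on stdin and,
# for every gloss found in more than one synset, print
#   <count> TAB <gloss> TAB <space-separated synset IDs>
shared_glosses() {
  awk -F '|' '
    /^[0-9]/ {                          # synset lines only; license header lines start with blanks
      gloss = $2
      sub(/^ +/, "", gloss)             # drop the blank after the "|"
      sub(/ +$/, "", gloss)             # drop the trailing blanks at end of line
      split($1, f, " ")                 # f[1] = synset offset, f[3] = ss_type
      pos = (f[3] == "s") ? "a" : f[3]  # assumption: report satellites as "a"
      ids[gloss] = ids[gloss] " " f[1] "-" pos
      n[gloss]++
    }
    END {
      for (g in n)
        if (n[g] > 1) print n[g] "\t" g "\t" substr(ids[g], 2)
    }'
}

# Usage (the order of the awk for-in loop is unspecified, hence the sort):
#   cat ../WordNet-3.0/dict/data.* | shared_glosses | sort -nr | head
```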

For PWN 3.1

% cat ../WordNet-3.1-dict/data.* | awk -F "|" '$0 ~ /^[0-9]/ {print $2}' | sort | uniq -c | sort -nr | head
  23  a variety of aster
  18  a branch of the Tai languages
  13  one species
  13  one of the British colonies that formed the United States
  11  a genus of bacteria
   9  an artificial language
   9  a radioactive transuranic element
   9  a genus of Mustelidae
   9  a Chadic language spoken south of Lake Chad
   8  a genus of Psittacidae

% cat ../WordNet-3.1-dict/data.* | awk -F "|" '$0 ~ /^[0-9]/ {print $2}' | sort | uniq -c | sort -nr | awk '$1 > 1 {print $1}' | sort | uniq -c
   1 11
   2 13
   1 18
 275 2
   1 23
  40 3
  18 4
   9 5
   6 6
   5 7
   1 8
   4 9
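The summary figures at the top of this comment follow from these count-of-counts tables. The number of distinct glosses that occur more than once is the sum of the first column: 1+2+1+289+1+40+18+9+5+5+1+4 = 376 for PWN 3.0, and the same sum gives 363 for PWN 3.1. The row whose second column is 2 gives how many glosses occur exactly twice (289 and 275). The awk sketch below computes both numbers in one pass. `repeat_summary` is a hypothetical helper name. Like the pipelines above, it keys on the raw text after `|`, so it should reproduce those figures.

```shell
#!/bin/sh
# repeat_summary (hypothetical helper): count the distinct glosses that occur
# more than once, and how many of those occur exactly twice. Like the
# pipelines above, it keys on the raw text after "|" on each synset line.
repeat_summary() {
  awk -F '|' '
    /^[0-9]/ { n[$2]++ }                # synset lines only
    END {
      repeated = 0; twice = 0
      for (g in n) {
        if (n[g] > 1) repeated++
        if (n[g] == 2) twice++
      }
      printf "%d glosses occur more than once; %d of them exactly twice\n", repeated, twice
    }'
}

# Usage:
#   cat ../WordNet-3.0/dict/data.* | repeat_summary
```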
@arademaker
Member Author

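We have two problems with the repetitions:

  1. extra annotation effort;
  2. possible inconsistencies in the analyses of the same sentence.

Each entry below shows a repeated gloss, followed by two differing annotations of the same token in its copies: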
CL-USER> (main-1)
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((pos . VBN) (senses attain%2:38:01::) (tag . man))
 ((pos . VBN) (tag . un))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((pos . NN) (senses gentleness%1:07:00::) (tag . man))
 ((pos . NN) (tag . un))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((pos . NN) (senses age%1:28:01::) (tag . man))
 ((pos . NN) (tag . un))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses mellow%5:00:00:mature:02) (tag . auto))
 ((senses mellow%5:00:00:soft:02) (tag . auto))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses mellow%5:00:00:mature:02) (tag . auto))
 ((senses mellow%5:00:00:soft:02) (tag . auto))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses age%1:28:00::) (sep . ) (tag . man))
 ((sep . ) (tag . un))
D shaped like a sausage
 ((pos . VBN) (senses shape%2:36:00::) (tag . man))
 ((pos . VBN) (senses shape%2:30:00::) (tag . man))
D greatly desired
 ((pos . JJ) (senses desire%2:37:00:: desired%5:00:00:wanted:00) (sep . )
  (tag . man))
 ((pos . JJ) (senses desired%5:00:00:wanted:00) (sep . ) (tag . man))
D copperheads
 ((pos . NN) (senses copperhead%1:05:01::) (sep . ) (tag . man))
 ((pos . NN) (senses copperhead%1:05:02::) (sep . ) (tag . man))
D pearl oysters
 ((senses pearl_oyster%1:05:00::) (tag . man))
 ((senses pearl_oyster%1:05:00::) (tag . auto))
D fur seals
 ((senses fur_seal%1:05:02::) (tag . man))
 ((senses fur_seal%1:05:01::) (tag . man))
D fungus gnats
 ((senses fungus_gnat%1:05:02::) (tag . man))
 ((senses fungus_gnat%1:05:01::) (tag . man))
D moths whose larvae are armyworms
 ((pos . NN) (senses armyworm%1:05:02:: armyworm%1:05:01:: armyworm%1:05:03::)
  (sep . ) (tag . man))
 ((pos . NN) (senses armyworm%1:05:02:: armyworm%1:05:01::) (sep . )
  (tag . man))
D mole rats
 ((senses mole_rat%1:05:01::) (tag . man))
 ((senses mole_rat%1:05:03::) (tag . man))
D ribbonfishes
 ((pos . NN) (senses ribbonfish%1:05:01::) (sep . ) (tag . man))
 ((pos . NN) (senses ribbonfish%1:05:02::) (sep . ) (tag . man))
D snappers
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses snapper%1:05:01::) (sep . ) (tag . man))
D a white crystalline compound used as an analgesic and also as an antipyretic
 ((pos . JJ) (senses analgesic%1:06:00::) (tag . man))
 ((pos . JJ) (senses analgesic%5:00:00:moderating:00) (tag . man))
D the quality of being inaccurate and having errors
 ((pos . NNS) (senses error%1:07:00::) (sep . ) (tag . man))
 ((pos . NNS) (senses error%1:04:02:: error%1:10:00::) (sep . ) (tag . man))
D a dark purplish-red color
 ((pos . JJ) (tag . un))
 ((pos . JJ) (senses dark%3:00:02::) (tag . man))
D (physics) one of the six flavors of quark
 ((senses physics%1:09:00::) (sep . ) (tag . man))
 ((sep . ) (tag . un))
D a conventional expression of greeting or farewell
 ((pos . NN) (senses expression%1:10:00:: expression%1:10:04::) (tag . man))
 ((pos . NN) (senses expression%1:10:04:: expression%1:10:00::) (tag . man))
D a Chadic language spoken south of Lake Chad
 ((pos . VBN) (senses spoken%3:00:00::) (tag . man))
 ((pos . VBN) (senses speak%2:32:02::) (tag . man))
D milk from which some of the cream has been removed
 ((pos . NN) (senses milk%1:13:01::) (tag . man))
 ((pos . NN) (tag . un))
D milk from which some of the cream has been removed
 ((pos . NN) (senses cream%1:13:00::) (tag . man))
 ((pos . NN) (tag . un))
D a state in northern Mexico; mostly high plateau
 ((pos . NN) (tag . un))
 ((pos . NN) (senses state%1:15:01::) (tag . man))
D a Mid-Atlantic state; one of the original 13 colonies
 ((pos . NN) (senses state%1:15:01::) (sep . ) (tag . man))
 ((pos . NN) (sep . ) (tag . un))
D a Mid-Atlantic state; one of the original 13 colonies
 ((pos . NN) (senses state%1:15:01::) (sep . ) (tag . man))
 ((pos . NN) (sep . ) (tag . un))
D an African river; flows into the Indian Ocean
 ((pos . VBZ) (senses flow%2:38:00::) (tag . man))
 ((pos . VBZ) (tag . un))
D a member of a group of Siouan people who constituted a division of the Teton Sioux
 ((pos . VB) (senses constitute%2:42:00:: constitute%2:42:03::) (tag . man))
 ((pos . VB) (senses constitute%2:42:03:: constitute%2:42:00::) (tag . man))
D a member of a group of Siouan people who constituted a division of the Teton Sioux
 ((pos . VB) (senses constitute%2:42:00:: constitute%2:42:03::) (tag . man))
 ((pos . VB) (senses constitute%2:42:03:: constitute%2:42:00::) (tag . man))
D oxeye
 ((pos . NN) (senses oxeye%1:20:02::) (sep . ) (tag . man))
 ((pos . NN) (senses oxeye%1:20:01::) (sep . ) (tag . man))
D a grain of barley
 ((pos . NN) (senses grain%1:20:00::) (tag . man))
 ((pos . NN) (senses grain%1:13:00::) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D having more than one husband at a time
 ((pos . RBR) (tag . ignore))
 ((pos . JJR) (tag . ignore))
D having more than one husband at a time
 ((pos . NN) (senses time%1:28:05:: time%1:11:00::) (sep . ) (tag . man))
 ((pos . NN) (senses time%1:11:00:: time%1:28:05::) (sep . ) (tag . man))
D having more than one wife at a time
 ((pos . RBR) (tag . ignore))
 ((pos . JJR) (tag . ignore))
D hardened clay
 ((pos . NN) (senses clay%1:27:00::) (sep . ) (tag . man))
 ((pos . NN) (senses clay%1:27:02::) (sep . ) (tag . man))
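Producing a list like the one above amounts to grouping the annotated copies of each repeated gloss and flagging the token positions whose annotations disagree. The sketch below assumes a hypothetical tab-separated export with one line per annotated token: synset ID, gloss text, token position, and annotation string. This is not the project's actual format, and `inconsistent_tokens` is a hypothetical name.

```shell
#!/bin/sh
# inconsistent_tokens (hypothetical helper). Assumed input, one line per
# annotated token, tab-separated:
#   synset-id TAB gloss TAB token-position TAB annotation
# Prints "gloss TAB token-position" for every token whose annotation is not
# the same in all copies of the gloss.
inconsistent_tokens() {
  awk -F '\t' '
    {
      key = $2 SUBSEP $3
      if (!(key in first)) first[key] = $4   # annotation of the first copy seen
      else if ($4 != first[key]) bad[key] = 1
    }
    END {
      for (k in bad) {
        split(k, p, SUBSEP)
        print p[1] "\t" p[2]
      }
    }'
}
```

A disagreement is not always an error. In the mellow case, each copy of the examples is legitimately tagged with the sense of its own synset. For that reason, the output is best treated as a list for manual review rather than as input to an automatic merge.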

arademaker changed the title from "duplicated sentences" to "some glosses are repeated" on Nov 25, 2022
arademaker added a commit that referenced this issue Feb 6, 2023
@arademaker
Member Author

arademaker commented Feb 7, 2023

We are considering possible approaches to removing duplicated sentences. Before merging equal strings, we need to know whether there is any situation where the same string (definition or example) could be sense-tagged differently depending on the synset that uses it, that is, on the context of its usage.

For example, these two synsets both have the gloss "a grain of barley":

  1. http://wn.mybluemix.net/synset?id=12123648-n (noun plant)
  2. http://wn.mybluemix.net/synset?id=07803093-n (noun food)

How should "barley" be annotated in this gloss? Its candidate senses are:

  1. http://wn.mybluemix.net/synset?id=12123244-n (noun plant)
  2. http://wn.mybluemix.net/synset?id=07803093-n (noun food, the same synset as above)

arademaker added a commit that referenced this issue Feb 7, 2023
1. two cases from #34
2. removed inconsistencies of two cases
@arademaker
Member Author

arademaker commented Feb 7, 2023

To answer the question above, I manually removed all spurious inconsistencies in the sense tagging of the duplicated sentences. The list is now reduced to:

  1. 01156302-a mellow | doce
    (having attained to kindliness or gentleness through age and experience; "mellow wisdom"; "the peace of mellow age")
  2. 01492061-a mellowed, mellow
    (having attained to kindliness or gentleness through age and experience; "mellow wisdom"; "the peace of mellow age")

The two senses of mellow have the same definition and examples. In each copy, the examples are annotated with the respective sense of mellow:

D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses mellow%5:00:00:mature:02) (tag . auto))
 ((senses mellow%5:00:00:soft:02) (tag . auto))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses mellow%5:00:00:mature:02) (tag . auto))
 ((senses mellow%5:00:00:soft:02) (tag . auto))

Some harder cases, which suggest possibly problematic relations or duplicated synsets:

D copperheads
 ((pos . NN) (senses copperhead%1:05:01::) (sep . ) (tag . man))
 ((pos . NN) (senses copperhead%1:05:02::) (sep . ) (tag . man))
D pearl oysters
 ((senses pearl_oyster%1:05:00::) (tag . man))
 ((senses pearl_oyster%1:05:00::) (tag . auto))
D fur seals
 ((senses fur_seal%1:05:02::) (tag . man))
 ((senses fur_seal%1:05:01::) (tag . man))
D fungus gnats
 ((senses fungus_gnat%1:05:02::) (tag . man))
 ((senses fungus_gnat%1:05:01::) (tag . man))
D moths whose larvae are armyworms
 ((pos . NN) (senses armyworm%1:05:02:: armyworm%1:05:01:: armyworm%1:05:03::)
  (sep . ) (tag . man))
 ((pos . NN) (senses armyworm%1:05:02:: armyworm%1:05:01::) (sep . )
  (tag . man))
D mole rats
 ((senses mole_rat%1:05:01::) (tag . man))
 ((senses mole_rat%1:05:03::) (tag . man))
D ribbonfishes
 ((pos . NN) (senses ribbonfish%1:05:01::) (sep . ) (tag . man))
 ((pos . NN) (senses ribbonfish%1:05:02::) (sep . ) (tag . man))
D snappers
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses snapper%1:05:01::) (sep . ) (tag . man))
D oxeye
 ((pos . NN) (senses oxeye%1:20:02::) (sep . ) (tag . man))
 ((pos . NN) (senses oxeye%1:20:01::) (sep . ) (tag . man))
D a grain of barley
 ((pos . NN) (senses grain%1:20:00::) (tag . man))
 ((pos . NN) (senses grain%1:13:00::) (tag . man))
D hardened clay
 ((pos . NN) (senses clay%1:27:00::) (sep . ) (tag . man))
 ((pos . NN) (senses clay%1:27:02::) (sep . ) (tag . man))

Cases that differ only in the POS tag:

D having more than one husband at a time
 ((pos . RBR) (tag . ignore))
 ((pos . JJR) (tag . ignore))
D having more than one wife at a time
 ((pos . RBR) (tag . ignore))
 ((pos . JJR) (tag . ignore))

arademaker added a commit that referenced this issue Feb 7, 2023
this commit fixed all spurious inconsistencies in the sense annotation
of the duplicated sentences.

the remaining cases need some discussion.
@fcbond
Collaborator

fcbond commented Feb 7, 2023 via email
