add category/function matrix to STATS.md
nschneid committed Nov 23, 2024
1 parent d2b3c3c commit 32bafef
Showing 2 changed files with 49 additions and 0 deletions.
32 changes: 32 additions & 0 deletions STATS.md
@@ -211,3 +211,35 @@ Analyzing 42 files:
| (VP :Head V :Particle PP :Comp Clause) | 1 |
| (PP :Head P :Obj NP :Comp PP) | 1 |
| (VP :Head V :Comp PP_strand :Comp PP_strand) | 1 |

## Nonlexical Categories by Function (excluding nonce categories)

| | Nom | NP | VP | Clause | PP | DP | AdjP | AdvP | GAP | Clause_rel | Coordination | PP_strand | IntP |
|:------------------|------:|-----:|-----:|---------:|-----:|-----:|-------:|-------:|------:|-------------:|---------------:|------------:|-------:|
| Head | 1847 | 72 | 1406 | 227 | 23 | 1 | 40 | 4 | 32 | 84 | 56 | | |
| Mod | 205 | 28 | 21 | 36 | 277 | 25 | 207 | 219 | 35 | 91 | 20 | | |
| Comp | | 2 | | 478 | 289 | | 4 | 1 | 18 | | 14 | 10 | |
| Obj | | 682 | | | | | | | 57 | | 30 | | |
| Det | | 108 | | | 4 | 465 | | | | | | | |
| Subj | | 494 | | 4 | 1 | | | | 51 | | 4 | | |
| Coordinate | 46 | 99 | 59 | 110 | 4 | 2 | 37 | 6 | | 3 | 4 | | |
| (root) | 1 | 21 | 1 | 227 | | | 2 | | | | 41 | | |
| PredComp | | 64 | | 9 | 13 | | 72 | | 20 | | 8 | | |
| Supplement | | 43 | 2 | 24 | 46 | | | 17 | | 17 | 4 | | 8 |
| Det-Head | | 2 | | | | 87 | | | | | | | |
| Prenucleus | | 38 | 1 | 2 | 14 | | 10 | 10 | | | | | |
| Postnucleus | 1 | 10 | | 5 | 5 | | | | | 3 | 1 | | |
| Head-Prenucleus | | 14 | | | 1 | | 1 | 2 | | | | | |
| Comp_ind | | | | 9 | 7 | | | | | | | | |
| Particle | | | | | 15 | | | | | | | | |
| DisplacedSubj | | 14 | | | | | | | | | 1 | | |
| Obj_ind | | 12 | | | | | | | 1 | | | | |
| Obj_dir | | 12 | | | | | | | 1 | | | | |
| ExtraposedSubj | | | | 12 | | | | | | | | | |
| Mod-Head | | | 1 | | 1 | | 6 | | | | | | |
| Obj+Mod | | | | | | | | | | | 4 | | |
| Vocative | | 4 | | | | | | | | | | | |
| Compounding | | | 3 | | 1 | | | | | | | | |
| Marker | | | | | | 2 | | | | | | | |
| Obj+PredComp/Comp | | | | | | | | | | | 1 | | |
| ExtraposedObj | | | | 1 | | | | | | | | | |
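
Blank cells in the matrix stand for zero counts. As a minimal sketch (not part of this commit), a pipe table in this layout could be read back into nested dictionaries as follows. The `parse_matrix` helper and the three-column excerpt are illustrative assumptions; the values are copied from the first rows of the table above.

```python
# Excerpt of the matrix above: first three category columns, first three rows.
EXCERPT = """\
|      |   Nom |   NP |   VP |
|:-----|------:|-----:|-----:|
| Head |  1847 |   72 | 1406 |
| Mod  |   205 |   28 |   21 |
| Comp |       |    2 |      |
"""


def parse_matrix(markdown: str) -> dict[str, dict[str, int]]:
    """Hypothetical helper: map function -> category -> count, blanks as 0."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in markdown.strip().splitlines()
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the alignment row
    categories = header[1:]  # the first header cell is the empty corner
    return {
        row[0]: {cat: int(cell) if cell else 0
                 for cat, cell in zip(categories, row[1:])}
        for row in body
    }


matrix = parse_matrix(EXCERPT)
print(matrix["Head"]["Nom"], matrix["Comp"]["VP"])
```

Keeping the result as plain dictionaries avoids any dependency beyond the standard library; the same structure could be handed to `pd.DataFrame` if further analysis is wanted.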
17 changes: 17 additions & 0 deletions analysis/stats.py
@@ -44,6 +44,7 @@ def analyse_pos(trees: list[cgel.Tree], mode='code' or 'tex' or 'markdown'):
    lemmas = Counter()
    cats = Counter()
    fxns = Counter()
    catsbyfxn = defaultdict(Counter)
    high_valencies = Counter()
    poses_by_lemma = defaultdict(set)
    ambig_class = defaultdict(set)
@@ -83,6 +84,8 @@ def analyse_pos(trees: list[cgel.Tree], mode='code' or 'tex' or 'markdown'):

        else:
            cats[node.constituent] += 1
            if '+' not in node.constituent:
                catsbyfxn[node.constituent][node.deprel or '(root)'] += 1
            if node.constituent!='Coordination':
                ch_nonsupp = [cgel_tree.tokens[c] for c in cgel_tree.children[n] if cgel_tree.tokens[c].deprel not in ('Supplement','Vocative')]
                if len(ch_nonsupp)>2:
@@ -162,6 +165,20 @@ def analyse_pos(trees: list[cgel.Tree], mode='code' or 'tex' or 'markdown'):
        df = pd.DataFrame.from_records(high_valencies.most_common(), columns=['valency','count'])
        print(df.to_markdown(index=False))

    print('\n## Nonlexical Categories by Function (excluding nonce categories)\n')
    if mode=='code':
        print(catsbyfxn)
    else:
        df = pd.DataFrame.from_records(catsbyfxn).fillna(0).astype(int)
        # sort columns by total
        s = df.sum()
        df = df[s.sort_values(ascending=False).index]
        # sort rows by total
        df = (df.assign(sum=df.sum(axis=1))  # Add temporary 'sum' column to sum rows.
              .sort_values(by='sum', ascending=False)  # Sort by row sum descending order.
              .iloc[:, :-1])  # Remove temporary `sum` column.
        print(df.to_markdown(index=True).replace(' 0 |', ' |'))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--type', help='type of analysis', type=str, default='both', required=False)
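
The counting and table-building steps added in this commit can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the repository's code: the `(category, function)` pairs are made up, `NP+PP` is a hypothetical nonce category, and the table is built with the plain `pd.DataFrame` constructor rather than `from_records`.

```python
from collections import Counter, defaultdict

import pandas as pd

# Made-up (category, function) pairs standing in for tree nodes;
# None plays the role of a missing deprel on a root node.
nodes = [
    ("NP", "Subj"), ("NP", "Subj"), ("NP", "Obj"), ("NP", "Obj"),
    ("VP", "Head"), ("VP", "Head"), ("VP", "Head"),
    ("Clause", None), ("Clause", "Comp"),
    ("NP+PP", "Head"),  # hypothetical nonce category, skipped below
]

# category -> function -> count, skipping categories that contain '+'.
catsbyfxn = defaultdict(Counter)
for cat, fxn in nodes:
    if "+" not in cat:
        catsbyfxn[cat][fxn or "(root)"] += 1

# Outer keys become columns (categories), inner keys become rows (functions).
df = pd.DataFrame({cat: dict(counts) for cat, counts in catsbyfxn.items()})
df = df.fillna(0).astype(int)

# Order columns, then rows, by descending total.
df = df[df.sum().sort_values(ascending=False).index]
df = df.loc[df.sum(axis=1).sort_values(ascending=False).index]

print(df)
```

Reindexing rows with `df.loc[...]` is one alternative to the temporary `sum` column used in the diff; both give rows ordered by total, though rows with equal totals may come out in a different order.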