A little question about Cluprocessor #240

dvzhang · 2021-01-16T05:35:44Z

A little question.
If I use Cluprocessor to make my index, I can't use the rule like:

name: pets_adoption
type: event
label: Adoption
priority: 2 # will run in the second round of extraction, can reference priority 1 rules
pattern: |
trigger = [lemma=adopt]
adopter = >nsubj [] # note: we didn't specify the label, so any token will work
pet: Pet = >dobj []

to extract information?
what's more, I try the rule of document.event_queries.priorities, and the next.
http://gh.lum.ai/odinson/event_queries.html#priorities

all of them can't work if I use Cluprocessor to make my index.

If I use fastnlp, it takes really long time to annotate the text.

so, maybe, we can do some work to make it fit the syntactic parse of Cluprocessor .

BeckySharp · 2021-03-04T18:39:54Z

New changes to cluprocessor may give us what we we lacking.

profile the annotation/indexing on a given corpus
switch to coreclu as default for extra/AnnotateText
add a mkPartialAnnotation method as default (configurable?), particularly since we can't currently use the expensive cross-sentence annotations anyway
add cool new fields to the defaults (e.g., enhanced SRLs)
profile again, celebrate or cry

This is the plan, I'll try to work on it as I can. PLEASE all (@myedibleenso , @marcovzla , @dvzhang ) feel free to comment/complain/add/subtract

myedibleenso · 2021-03-04T19:03:48Z

enhanced SRLs

@BeckySharp , are you talking about indexing multiple graphs or just some new token attributes? @marcovzla and I discussed indexing multiple graphs per sentence before, but opted not to do it at that time. Eventually, though, I think we'll want to do this and use a field in the rule to tell us which graph to match against. One proposal was to index one merged graph. I think both approaches (one vs many) have advantages and disadvantages.

BeckySharp · 2021-03-04T19:44:03Z

are you talking about indexing multiple graphs or just some new token attributes?

Great point, I wasn't fully thinking it through, but now that you mention it I remember this and several discussions about how to approach it. I think the lowest hanging fruit is to merge (e.g., deps and SRLs), configurable of course, as it would also give access to mixed rules without needing to make a change to the rule syntax.
That said -- looking longer term, IMO we should be able to write rules against multiple graphs, to be able to add more complex constraints. However, IDK when that will be needed vs simple anticipated as useful...

How do you feel about the merged plan for now?
And really thanks @myedibleenso for thinking it through for/with me!

myedibleenso · 2021-03-04T19:56:40Z

An option under the conf for extra related to graph merging is definitely the path of least resistance. Seems like it would be useful. It's straightforward to just combine and distinct edges: List[Edge] and roots: Set[Int] from each processors doc's Sentence to construct a single DirectedGraph and then replace the GraphMap for each processor Sentence before creating the Odinson doc.

Looks like there's already an entry in the processors GraphMap that merges a couple of the graphs (it looks as if it excludes basic):

https://github.com/clulab/processors/blob/b8fb35a38d146a67817b6fad00031ab6732ee53a/main/src/main/scala/org/clulab/struct/GraphMap.scala#L17

I sort of think that we might as well merge them all, though...

BeckySharp · 2021-03-04T20:03:09Z

I sort of think that we might as well merge them all, though...

i'd be ok merging them all unless there are overlapping labels between the basic and enhanced that point to diff things, any easy way to check that you think?

myedibleenso · 2021-03-04T20:10:39Z

My guess is that it's most likely to happen with adpositions/case relations. Here is where the conversion from basic -> enhanced takes place in processors:

https://github.com/clulab/processors/blob/master/main/src/main/scala/org/clulab/utils/ToEnhancedDependencies.scala

I didn't have a hand in that code, but I know it is a deterministic process.

I guess one strategy that still keeps things fairly simple is to just merge in a way that prefers enhanced (I assume that's what you want?) where any edge in basic that has the same source and relation label should be filtered out of the merged.

BeckySharp · 2021-03-04T20:14:17Z

that strategy would account for one issue (that TBH I didn't think about), but I was thinking more that idk, something like >dep in basic would go to word_3 and in enhanced it would go to word_4, and when we wrote the rules we were thinking about enhanced (which we're more familiar with), and so we would expect/want word_4 not word_3....
maybe i'm worried about nothing tho

myedibleenso · 2021-03-04T20:26:48Z

If we want to prefer X over Y, I think we can achieve that using the filtering step I described. The downside is we lose some info.

I think we're really just trying to come up with reasonable default behavior.

If a user wants something more customized, that person should create custom Odinson doc JSON and simply index. This annotation runnable is really just to get started quickly with something that is (hopefully) useful to many.

At this point, anyone who wants to index non-English docs or use custom components (ex. third-party sequence taggers for NER) must roll their own Odinson doc creation pipeline.

One of these days I'm going to provide an example that shows how to use spaCy to create Odinson doc JSON.

BeckySharp · 2021-03-04T21:49:09Z

on it :)

BeckySharp self-assigned this Mar 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A little question about Cluprocessor #240

A little question about Cluprocessor #240

dvzhang commented Jan 16, 2021 •

edited

Loading

BeckySharp commented Mar 4, 2021

myedibleenso commented Mar 4, 2021

BeckySharp commented Mar 4, 2021

myedibleenso commented Mar 4, 2021

BeckySharp commented Mar 4, 2021

myedibleenso commented Mar 4, 2021

BeckySharp commented Mar 4, 2021

myedibleenso commented Mar 4, 2021 •

edited

Loading

BeckySharp commented Mar 4, 2021

A little question about Cluprocessor #240

A little question about Cluprocessor #240

Comments

dvzhang commented Jan 16, 2021 • edited Loading

BeckySharp commented Mar 4, 2021

myedibleenso commented Mar 4, 2021

BeckySharp commented Mar 4, 2021

myedibleenso commented Mar 4, 2021

BeckySharp commented Mar 4, 2021

myedibleenso commented Mar 4, 2021

BeckySharp commented Mar 4, 2021

myedibleenso commented Mar 4, 2021 • edited Loading

BeckySharp commented Mar 4, 2021

dvzhang commented Jan 16, 2021 •

edited

Loading

myedibleenso commented Mar 4, 2021 •

edited

Loading