A little question about Cluprocessor #240
Comments
New changes to cluprocessor may give us what we were lacking.
This is the plan, I'll try to work on it as I can. PLEASE all (@myedibleenso , @marcovzla , @dvzhang ) feel free to comment/complain/add/subtract |
@BeckySharp , are you talking about indexing multiple graphs or just some new token attributes? @marcovzla and I discussed indexing multiple graphs per sentence before, but opted not to do it at that time. Eventually, though, I think we'll want to do this and use a field in the rule to tell us which graph to match against. One proposal was to index one merged graph. I think both approaches (one vs many) have advantages and disadvantages. |
Great point, I wasn't fully thinking it through, but now that you mention it I remember this and several discussions about how to approach it. I think the lowest hanging fruit is to merge (e.g., deps and SRLs), configurable of course, as it would also give access to mixed rules without needing to make a change to the rule syntax. How do you feel about the merged plan for now? |
An option under the conf for Looks like there's already an entry in the processors I sort of think that we might as well merge them all, though... |
i'd be ok merging them all unless there are overlapping labels between the basic and enhanced that point to diff things, any easy way to check that you think? |
My guess is that it's most likely to happen with adpositions/case relations. Here is where the conversion from basic -> enhanced takes place in processors: I didn't have a hand in that code, but I know it is a deterministic process. I guess one strategy that still keeps things fairly simple is to just merge in a way that prefers enhanced (I assume that's what you want?) where any edge in basic that has the same source and relation label should be filtered out of the merged. |
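To make the "prefer enhanced" filtering above concrete, here is a minimal sketch. It is not the Odinson or processors implementation: the `Edge` tuple layout and the function name `merge_prefer_enhanced` are assumptions made up for illustration, and edges are treated as plain `(source, destination, label)` triples.

```python
from typing import List, Set, Tuple

# Hypothetical edge layout: (source token index, destination token index, relation label)
Edge = Tuple[int, int, str]


def merge_prefer_enhanced(basic: List[Edge], enhanced: List[Edge]) -> List[Edge]:
    """Merge two edge lists, preferring the enhanced graph.

    A basic edge is dropped when the enhanced graph already has an edge
    with the same source and the same relation label.
    """
    # (source, label) pairs that the enhanced graph already covers
    claimed: Set[Tuple[int, str]] = {(src, label) for src, _, label in enhanced}
    kept_basic = [edge for edge in basic if (edge[0], edge[2]) not in claimed]

    # Enhanced edges come first; exact duplicates are removed, order is preserved
    merged: List[Edge] = []
    seen: Set[Edge] = set()
    for edge in enhanced + kept_basic:
        if edge not in seen:
            seen.add(edge)
            merged.append(edge)
    return merged


basic = [(1, 0, "nsubj"), (1, 3, "nmod"), (3, 2, "case")]
enhanced = [(1, 0, "nsubj"), (1, 3, "nmod_in"), (3, 2, "case")]
merged = merge_prefer_enhanced(basic, enhanced)
```

In this toy input the basic `nmod` edge survives next to the enhanced `nmod_in` edge because the labels differ, so rules written against either label would still match. Basic edges that share a source and label with an enhanced edge are the only ones lost.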
that strategy would account for one issue (that TBH I didn't think about), but I was thinking more that idk, something like >dep in basic would go to word_3 and in enhanced it would go to word_4, and when we wrote the rules we were thinking about enhanced (which we're more familiar with), and so we would expect/want word_4 not word_3.... |
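One rough way to check how often that case actually occurs is to look for `(source, label)` pairs that exist in both graphs but point at different tokens. The sketch below assumes the same hypothetical `(source, destination, label)` triples; `conflicting_edges` is an illustrative name, not something from Odinson or processors.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical edge layout: (source token index, destination token index, relation label)
Edge = Tuple[int, int, str]
Key = Tuple[int, str]


def index_by_source_and_label(edges: List[Edge]) -> Dict[Key, Set[int]]:
    """Group destination indices by (source, label)."""
    grouped: Dict[Key, Set[int]] = defaultdict(set)
    for src, dst, label in edges:
        grouped[(src, label)].add(dst)
    return grouped


def conflicting_edges(
    basic: List[Edge], enhanced: List[Edge]
) -> Dict[Key, Tuple[Set[int], Set[int]]]:
    """Return (source, label) pairs found in both graphs whose destinations differ.

    Each value is (destinations in basic, destinations in enhanced).
    """
    b = index_by_source_and_label(basic)
    e = index_by_source_and_label(enhanced)
    return {key: (b[key], e[key]) for key in b.keys() & e.keys() if b[key] != e[key]}


# The scenario described above: >dep goes to word_3 in basic, word_4 in enhanced
basic = [(1, 3, "dep"), (1, 0, "nsubj")]
enhanced = [(1, 4, "dep"), (1, 0, "nsubj")]
conflicts = conflicting_edges(basic, enhanced)
```

Running something like this over a sample of annotated sentences and counting non-empty results per label would show which relations, if any, disagree between the two graphs.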
If we want to prefer X over Y, I think we can achieve that using the filtering step I described. The downside is we lose some info. I think we're really just trying to come up with reasonable default behavior. If a user wants something more customized, that person should create custom Odinson doc JSON and simply index. This annotation runnable is really just to get started quickly with something that is (hopefully) useful to many. At this point, anyone who wants to index non-English docs or use custom components (ex. third-party sequence taggers for NER) must roll their own Odinson doc creation pipeline. One of these days I'm going to provide an example that shows how to use spaCy to create Odinson doc JSON. |
on it :) |
A little question.
If I build my index with Cluprocessor, is it expected that I can't use a rule like this:
type: event
label: Adoption
priority: 2 # will run in the second round of extraction, can reference priority 1 rules
pattern: |
  trigger = [lemma=adopt]
  adopter = >nsubj [] # note: we didn't specify the label, so any token will work
  pet: Pet = >dobj []
to extract information?
What's more, I tried the rule from the priorities section of the event queries documentation, and the one after it:
http://gh.lum.ai/odinson/event_queries.html#priorities
None of them work if I use Cluprocessor to build my index.
If I use fastnlp instead, it takes a really long time to annotate the text.
So maybe we can do some work to make the rules fit the syntactic parse that Cluprocessor produces.