The Stanford NER tagger tags each word individually as an organization (SMO) or not, so a multi-word name comes back as separate tokens. For example, Occupy Wall Street is returned as [('Occupy', 'ORGANIZATION'), ('Wall', 'ORGANIZATION'), ('Street', 'ORGANIZATION')].
To parse this into a single string I've made the assumption that all consecutive organization tags indicate the same SMO. Does this seem like a reasonably robust approach, or should we try to come up with something else?
It seems to work as long as punctuation is included as separate tokens (i.e. a list of SMOs is separated by non-organization tagged commas), but I probably haven't thought about all edge cases.
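A minimal sketch of the consecutive-tag grouping described above, assuming the tagger output is a list of `(token, tag)` tuples and that non-entity tokens carry the tag `'O'`. The function name `merge_organizations` and the sample input are hypothetical, not taken from the project code.

```python
from itertools import groupby


def merge_organizations(tagged_tokens, label="ORGANIZATION"):
    """Join each run of consecutive `label`-tagged tokens into one string."""
    merged = []
    # groupby yields one group per run of adjacent tokens sharing the same tag
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag == label:
            merged.append(" ".join(token for token, _ in run))
    return merged


# Hypothetical tagger output: the comma is a separate token tagged 'O'
tagged = [
    ("Occupy", "ORGANIZATION"), ("Wall", "ORGANIZATION"), ("Street", "ORGANIZATION"),
    (",", "O"),
    ("Greenpeace", "ORGANIZATION"),
    ("and", "O"), ("others", "O"),
]
print(merge_organizations(tagged))  # ['Occupy Wall Street', 'Greenpeace']
```

One edge case this makes visible: two different organizations that appear back to back with no separating token (for example, if the tokenizer drops the comma) would be merged into a single name.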
On Mon, Jun 12, 2017 at 12:57 PM, Erle Holgersen ***@***.***> wrote:
The Stanford NER tagger tags individual words as SMO or not. For example,
Occupy Wall Street is returned as [('Occupy', 'ORGANIZATION'), ('Wall',
'ORGANIZATION'), ('Street', 'ORGANIZATION')].
To parse this into a single string I've made the assumption that all
consecutive organization tags indicate the same SMO. Does this seem like a
reasonably robust approach, or should we try to come up with something else?
It seems to work as long as punctuation is included as separate tokens
(i.e. a list of SMOs is separated by non-organization tagged commas), but I
probably haven't thought about all edge cases.