Migrating from 0.1 to 1.0
I don't like breaking backwards compatibility, but to be able to add new features I felt I had to. This means that updating from 0.1 to 1.0 might require code changes.
- Slimmer public API
- Returning TokenLists and TokenTrees instead of lists
- Parsing of IDs now includes ranges and decimals
Previously __init__.py included a lot of public methods. Some of these have NOT moved:
from conllu import parse
from conllu import parse_tree
These two work just like they did before, but they now return a TokenList or TokenTree instead of a raw list. See the next heading on how to handle this.
-from conllu.parser import parse
-from conllu.parser import parse_tree
+from conllu import parse
+from conllu import parse_tree
Importing parse and parse_tree from conllu.parser is no longer supported. Remove ".parser" and the imports will work again.
-from conllu.parser import parse_with_comments
parse_with_comments has been removed. When using parse, comments are automatically included. You can access them through the new metadata property on the returned TokenList.
-from conllu.parser import serialize_tree
-from conllu.tree_helpers import print_tree
These two methods have been moved to the TokenTree that is returned from parse_tree. serialize_tree is now tree.serialize(), and print_tree is now tree.print_tree().
The return values from both parse and parse_tree have changed.
sentences = parse(raw_conllu_str)
sentence = sentences[0]
for token in sentence:
    print(token)
This code will keep working since TokenList defines __getitem__, which makes it work like a list for indexing and iteration. If you relied on some other part of the return value behaving like a list, you might have to change that.
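One possible way to handle code that needs real list behaviour is to copy the tokens into a plain list first. The sketch below uses a stand-in class, TokenListStandIn, which is hypothetical and not conllu's actual TokenList; it only assumes what the text above states, namely that indexing is supported.

```python
class TokenListStandIn:
    """Illustrative stand-in (not conllu's class): supports indexing only."""

    def __init__(self, tokens):
        self.tokens = tokens

    def __getitem__(self, key):
        return self.tokens[key]


sentence = TokenListStandIn([{"form": "The"}, {"form": "fox"}])

# Indexing and for-loops work through __getitem__ alone.
first = sentence[0]
forms = [token["form"] for token in sentence]

# List-only methods such as .reverse() or .append() are not guaranteed
# on such an object, so copy into a real list before using them.
tokens = list(sentence)
tokens.reverse()
reversed_forms = [token["form"] for token in tokens]
print(first["form"], forms, reversed_forms)
```

Copying with list() keeps the rest of the calling code unchanged, at the cost of losing anything the returned object carries beyond its tokens.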
sentences = parse_tree(raw_conllu_str)
root = sentences[0]
-print(root.data, root.children)
+print(root.token, root.children)
When switching from TreeNode to TokenTree I've also renamed data to token, so you have to change every place where you access .data to access .token instead.
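If some code has to run against both versions during the migration, a small accessor can hide the rename. This is only a sketch: node_token, OldTreeNode and NewTokenTree are hypothetical names made up for illustration, and the two namedtuples merely stand in for the real node classes, assuming a node exposes either .data or .token as described above.

```python
from collections import namedtuple

# Stand-ins for illustration only; the real classes live in conllu.
OldTreeNode = namedtuple("OldTreeNode", ["data", "children"])     # 0.1-style node
NewTokenTree = namedtuple("NewTokenTree", ["token", "children"])  # 1.0-style node


def node_token(node):
    """Return the token of a tree node, whichever attribute name it uses."""
    if hasattr(node, "token"):
        return node.token
    return node.data


old_root = OldTreeNode(data={"form": "fox"}, children=[])
new_root = NewTokenTree(token={"form": "fox"}, children=[])
print(node_token(old_root) == node_token(new_root))
```

Once everything runs on 1.0, the helper can be dropped in favour of plain .token access.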
Previously, only IDs in the form of positive integers were recognized. Now conllu supports ranges ("1-3") and decimals ("3.1") too. If your code relied on those IDs being returned as None, you need to change that check to test for plain integer IDs with isinstance(value, int) instead.
"1" -> 1
-"1-3" -> None
+"1-3" -> (1, "-", 3)
-"3.1" -> None
+"3.1" -> (3, ".", 1)
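The mapping above suggests that calling code may now see three shapes of ID. A minimal sketch of how one might branch on them follows; describe_id is a hypothetical helper, not part of conllu, and it assumes only the value shapes listed above (an int, or a 3-tuple with "-" or "." in the middle).

```python
def describe_id(value):
    """Classify an ID value by shape (hypothetical helper, not in conllu)."""
    if isinstance(value, int):
        return "integer"
    if isinstance(value, tuple) and len(value) == 3:
        separator = value[1]
        if separator == "-":
            return "range"
        if separator == ".":
            return "decimal"
    return "unknown"


ids = [1, (1, "-", 3), (3, ".", 1), None]
kinds = [describe_id(value) for value in ids]
print(kinds)
```

Code that previously skipped tokens whose ID was None could instead keep only those for which describe_id returns "integer".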