The main aim of the UD Turkic Group is to harmonize the annotations of the existing Universal Dependencies (UD) treebanks of Turkic languages and to provide general guidelines for future initiatives. The group is part of the ongoing COST Action CA21167, titled 'Universality, diversity and idiosyncrasy in language technology' (UniDive). In this document, we present the background, history, and current status of the group's work.
Currently, UD has 20 treebanks representing several Turkic languages, 9 of which are Turkish treebanks. The annotations of these treebanks can vary considerably, both in morphology (lemmatization, segmentation, POS tags) and in syntax (labels of syntactic relations). There are two main reasons for these differences: (i) the exact function, grammatical behavior, and/or analysis of certain structures or phenomena is not yet known; and (ii) certain constructions are analyzed differently because the linguistic literature offers competing views on them.
The group was formed at the UD Turkic Workshop 2023, held in conjunction with the second UniDive WG3 meeting in Istanbul on 8 September 2023. The workshop brought together people working on Turkic treebanks and/or languages to discuss (i) how to achieve consistent annotations across different treebanks and (ii) linguistic phenomena in Turkic languages that are difficult to annotate within the current UD guidelines.
To disseminate its work, the group presented an abstract titled 'Unifying the Annotations in Turkic Universal Dependencies Treebanks' at the 2nd UniDive Workshop, held in Naples, Italy on 8 February 2024. It also published a paper titled 'Strategies for the Annotation of Pronominalised Locatives in Turkic Universal Dependency Treebanks' at MWE-UD 2024, co-located with LREC-COLING 2024 in Turin, Italy on 25 May 2024.
The Turkic UD treebanks by language (number of treebanks in parentheses):

- Azerbaijani (1)
- Kazakh (1)
- Kyrgyz (2)
- Old Turkish (1)
- Ottoman Turkish (2)
- Tatar (1)
- Turkish (9)
- Turkish German (1)
- Uyghur (1)
- Yakut (1)