This file supports a contribution to the Workshop "Data Science in Climate and Climate Impact Research" taking place on 20-21 August 2020 in Zurich and online.
Multilingual Structured Climate Research Data in Wikidata - The Data Perspective
Climate research — like research in general — takes place in a sociotechnical ecosystem that connects researchers, institutions, funders, databases, locations, publications, methodologies and related concepts with the objects of study and the natural and cultural worlds around them.
Mechanisms for describing concepts related to climate research are growing in breadth and depth, number and popularity. In parallel, more and more climate-related data — and particularly metadata — are being made available under open licenses, which facilitates discoverability, reproducibility and reuse, as well as data integration.
Wikidata is a community-curated open knowledge base in which concepts covered in any Wikipedia, and beyond, can be described in a structured and FAIR fashion that can be mapped to RDF and queried using SPARQL as well as various other means. Its community of over 20,000 monthly contributors oversees a corpus of currently over 80 million 'items' for concepts that are linked to each other, to external databases or to specific values via over 7,000 'properties'. Items and properties have persistent unique identifiers, to which labels, descriptions, and dedicated lexemes with their forms and senses can be attached in over 300 natural languages.
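To make the SPARQL and multilingual-label points concrete, here is a minimal sketch of how the labels of one item could be retrieved in several languages. It assumes the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql` and its predefined `wd:` and `rdfs:` prefixes; the function names (`build_label_query`, `parse_labels`, `fetch_labels`) are hypothetical helpers written for this illustration, not part of any Wikidata library.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(qid, languages):
    """Return a SPARQL query asking for the labels of one item in the given languages."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    for lang in languages:
        if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
            raise ValueError(f"unexpected language code: {lang!r}")
    lang_list = ", ".join(f'"{lang}"' for lang in languages)
    return (
        "SELECT ?label WHERE {\n"
        f"  wd:{qid} rdfs:label ?label .\n"
        f"  FILTER(LANG(?label) IN ({lang_list}))\n"
        "}"
    )


def parse_labels(result):
    """Map language code to label text from a SPARQL JSON result document."""
    labels = {}
    for binding in result["results"]["bindings"]:
        cell = binding["label"]
        labels[cell["xml:lang"]] = cell["value"]
    return labels


def fetch_labels(qid, languages, user_agent):
    """Run the query against the endpoint (network access required)."""
    params = urllib.parse.urlencode(
        {"query": build_label_query(qid, languages), "format": "json"}
    )
    request = urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}", headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_labels(json.load(response))
```

A call such as `fetch_labels("Q42", ["en", "de", "fr"], "my-script/0.1 (contact address)")` would return a dictionary keyed by language code; `Q42` is used here only as a well-known sample identifier, and the user agent string is a placeholder.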
A range of open-source tools is available to interact with Wikidata: to enter information, curate it and query it. In this presentation, available via https://github.com/Daniel-Mietchen/events/blob/master/data-science-in-climate-and-climate-impact-research.md, we will outline a range of tools that allow users to explore Wikidata content through frontends tailored to specific communities. In particular, we will take a look at Scholia, which is available via https://tools.wmflabs.org/scholia/ and allows users to generate and explore scholarly profiles of authors, institutions, funders and other parts of the research ecosystem, as well as of the world in which it is embedded, from geomorphological features to economic indicators and environmental policies, from natural ecosystems and disasters to biogeochemical cycles.
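As a rough illustration of how such profiles are addressed, the sketch below builds Scholia profile URLs from a Wikidata item identifier. It assumes that Scholia pages follow the pattern `<base>/<aspect>/<QID>` and that the aspect names listed in `ASPECTS` are available; both are assumptions made for this example, and `scholia_url` is a hypothetical helper rather than part of Scholia itself.

```python
import re

# Base URL as given in the text above.
SCHOLIA_BASE = "https://tools.wmflabs.org/scholia/"

# Assumption: a subset of the profile types ("aspects") that Scholia offers.
ASPECTS = {"author", "organization", "sponsor", "topic", "work", "venue"}


def scholia_url(aspect, qid):
    """Return the assumed Scholia profile URL for a Wikidata item."""
    if aspect not in ASPECTS:
        raise ValueError(f"unknown aspect: {aspect!r}")
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return f"{SCHOLIA_BASE}{aspect}/{qid}"
```

For example, `scholia_url("author", "Q42")` yields an author profile address for the sample item `Q42`; a funder would use the `sponsor` aspect and an institution the `organization` aspect under the same assumptions.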
Mietchen, Daniel, & Sarasua, Cristina. (2020, August). Multilingual Structured Climate Research Data in Wikidata - The Data Perspective. Zenodo. http://doi.org/10.5281/zenodo.3994266.
The submission was co-authored by Cristina Sarasua from the University of Zurich and submitted on 6 April 2020, before the extended deadline. On 20 April, I was notified of its acceptance.
- There is an accompanying submission (also accepted on 20 April) led by Cristina and co-authored by me, entitled "Multilingual Structured Climate Research Data in Wikidata - The Community Perspective". Its abstract reads as follows:
Empirical sciences experience a transformation enabled by a myriad of technological solutions that facilitate collecting, sharing and analyzing large- and small-scale research data. Citation networks can be mined, scientific workflows can be reproduced and extended, and data-driven search portals allow scientists to dive into a sea with millions of data sets. While technology is crucial, the success of this transformation heavily depends on social change and commitment. At the core of such a social response, the Open Science movement promotes values such as participation and collaboration. Tightly connected to Open Science, the Free Knowledge initiative advocated by Wikimedia has succeeded in bringing scientific output (and general human knowledge) closer to the global population through platforms like Wikipedia. Wikidata is a community-supported knowledge base, where thousands of volunteers enter, complete, link, monitor and correct data. Wikidata is connected to Wikipedia articles and images in Wikimedia Commons, and it can be queried as machine-readable Linked Data. In this presentation, we would like to showcase Wikidata’s special features in terms of collaborative knowledge management. We will demonstrate how ranks and references allow Wikidata to portray a plural reality in which contradictory statements might have been published by different sources. We will also demonstrate the way federated queries can facilitate data comparison. Moreover, we will describe the process that editors follow to address schema and data quality management collectively, as well as human-bot cooperation. We will also talk about the possibility of transferring many of Wikidata’s features to self-organized communities via Wikibase. Through concrete examples and descriptive statistics, we aim to show the benefits that a community-based data management cycle can provide to many disciplines, including the field of Climate Research.
- Citation: Sarasua, Cristina, & Mietchen, Daniel. (2020, August). Multilingual Structured Climate Research Data in Wikidata - The Community Perspective. Zenodo. http://doi.org/10.5281/zenodo.3994272.
- Slides for both talks
- Zenodo snapshots:
- Scholia entry for the event
- Etherpad for the event — contains links to several slide sets
- I gave a presentation "Visualizing the research ecosystem of ecosystem research via Wikidata" at the 10th International Conference on Ecological Informatics (ICEI 2018) on 27 September 2018 in Jena, co-authored by Finn Årup Nielsen and Egon Willighagen. That event was also attended by Markus Reichstein, one of the keynote speakers at the climate workshop.
- Presentation "Visualizing the research ecosystem of neuroscience research via Wikidata" at neuromatch 3.0 in October 2020