Benedikt Hitz-Gamper edited this page Dec 9, 2023 · 62 revisions

The first step lets you upload one or more CSV files that serve as input for your Cube, and define from them a set of output tables that will be part of the cube. This step as a whole is called CSV Mapping (i.e. mapping from CSV to Cube).

In this step you create one or more tables based on the CSV input. At least one table is required: the Cube Table, which gives the cube its structure. The fields of the Cube Table define all Dimensions of the Cube. Additional tables can be used to create Multilingual Concepts used by the Cube Table.

CSV Upload

With "+" you can upload a new, syntactically valid CSV file. After the upload, the left side of the screen shows a preview of the columns, with the first three rows of input data for each column. You can later replace the uploaded CSV with a newer version.
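Before uploading, a file can be checked locally. The following is a minimal sketch, not part of Cube Creator: it assumes that "syntactically valid" at least means every row has as many fields as the header, and it mimics the preview of the first three rows per column. The function name `preview_columns` and the sample data are made up for illustration.

```python
import csv
import io

def preview_columns(csv_text, rows=3):
    """Return {column name: first `rows` values}; raise on rows of the wrong width."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    preview = {name: [] for name in header}
    for row_no, row in enumerate(reader, start=1):
        # Assumption: a valid CSV has the same number of fields in every row.
        if len(row) != len(header):
            raise ValueError(
                f"data row {row_no}: expected {len(header)} fields, got {len(row)}"
            )
        if row_no <= rows:
            for name, value in zip(header, row):
                preview[name].append(value)
    return preview

# Made-up sample data
sample = "year,canton,value\n2020,ZH,10\n2021,ZH,12\n2020,BE,7\n2021,BE,9\n"
print(preview_columns(sample))
```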

Creating tables

To create a new table, select on the left (input) side of the screen the CSV columns you want to turn into a table. You can add, change or remove column mappings at any time. Each table can only use input from one CSV. On the other hand, you can create multiple tables from one input CSV (e.g. to create the concept tables from multilingual fields).

Cube table

Every cube has one main table that represents the observations with their multiple dimensions.
This table is called the cube table and is created by checking the "Cube table" checkbox.
Additional tables can be created to provide (multilingual) concepts connected to the observation table.

In the observation table it is important to distinguish between key dimensions and measurement dimensions.

If a dimension in the cube table is based on Concepts, first create the concept tables and then add links to them in the mapping of the cube table.

By default, columns are mapped as literal attributes. If you want to treat a column as a concept (a resource with a unique identifier) you have to create a Concept table from that column and link to it using the "Link to another table" feature.

Concept table

The concept table provides multilingual labels and additional information (for grouping or external identifiers) for concepts.
A concept table is created without checking the "Cube table" checkbox.

To be used in the generated cube, each concept table must be linked, via "Link to another table", from either the Cube table or another Concept table.
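The relationship between the two kinds of table can be pictured with plain Python data structures. This is only an illustrative sketch under assumed names: the CSV columns, the `canton/` prefix and the helper `concept_id` are hypothetical and do not reflect how Cube Creator stores data internally.

```python
import csv
import io

# Made-up input: one CSV holding observations plus multilingual labels
CSV_TEXT = """year,canton,canton_de,canton_fr,value
2020,ZH,Zürich,Zurich,10
2020,GE,Genf,Genève,7
2021,ZH,Zürich,Zurich,12
"""

def concept_id(code):
    """Hypothetical identifier of a concept row: table prefix plus code."""
    return f"canton/{code}"

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Concept table: one entry per distinct canton, carrying the labels per language
concepts = {}
for row in rows:
    concepts.setdefault(
        concept_id(row["canton"]),
        {"de": row["canton_de"], "fr": row["canton_fr"]},
    )

# Cube table: each observation points at the concept identifier instead of
# repeating a literal label (the role played by "Link to another table")
observations = [
    {"year": row["year"], "canton": concept_id(row["canton"]), "value": row["value"]}
    for row in rows
]

for obs in observations:
    print(obs, "->", concepts[obs["canton"]])
```

Both tables are derived from the same CSV, and the three observations share two concept rows.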

Edit table

Identifier template

A unique URI string identifying every row of a dimension.

For each table, an identifier template is needed to build a unique identifier for each row of the table. If this notion is too technical or if you don't fully understand it, please just leave the content of the field empty to get an auto-generated identifier.

Best practice: When designing an identifier template, start with a fixed prefix corresponding to the table name (automatically proposed by the tool), followed by the list of CSV columns that together uniquely identify each row of that table.

In the field, each column name must be written between curly brackets ({}), and columns are separated from each other with a slash (/). Because the identifier is generated as a URL, the template must produce a URL-safe string.
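To make the template syntax concrete, here is a minimal sketch of how such a template could be expanded for one CSV row. It is an assumption for illustration, not Cube Creator's implementation: the function `expand_template`, the column names and the choice of percent-encoding every value are all hypothetical.

```python
import re
from urllib.parse import quote

def expand_template(template, row):
    """Replace each {column} placeholder with the row's percent-encoded value."""
    def substitute(match):
        column = match.group(1)
        # safe="" also encodes "/", so a value cannot add extra path segments
        return quote(row[column], safe="")
    return re.sub(r"\{([^{}]+)\}", substitute, template)

# Made-up row with a value that is not URL-safe as written
row = {"canton": "ZH", "station name": "Zürich / Kaserne", "year": "2020"}
print(expand_template("observation/{canton}/{station name}/{year}", row))
# observation/ZH/Z%C3%BCrich%20%2F%20Kaserne/2020
```

Values made up only of letters and digits pass through unchanged, which is why such columns make the most readable identifiers.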

If a key dimension does not provide a unique identifier, preferably use the English name of the concept as the identifier.

This field has an auto-complete that will show up when you write an opening curly bracket ({). This will help you to choose the columns from the CSV and thus avoid misspellings.

Warning: if your project contains multiple tables, a table's identifier template must also avoid collisions between the rows of different tables. This is why the table name is proposed as a fixed prefix.
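The effect of the table-name prefix can be checked with a small sketch. The helper `find_collisions` and the identifiers are hypothetical; the point is only that bare row keys can clash across tables while prefixed ones cannot.

```python
from collections import Counter

def find_collisions(identifiers_by_table):
    """Return identifiers that occur more than once across all tables."""
    counts = Counter(
        identifier
        for identifiers in identifiers_by_table.values()
        for identifier in identifiers
    )
    return sorted(identifier for identifier, n in counts.items() if n > 1)

# Made-up identifiers for two tables, without and with the table-name prefix
without_prefix = {"canton": ["1", "2"], "sector": ["1", "3"]}
with_prefix = {"canton": ["canton/1", "canton/2"], "sector": ["sector/1", "sector/3"]}

print(find_collisions(without_prefix))  # ['1']
print(find_collisions(with_prefix))     # []
```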

Display color

The display color is used only inside the Cube Creator to visually connect the CSV input columns on the left to the mapped table rows.

Edit column mapping

Target Property

Cube Table

A target property is proposed based on the input data. If you change the target property, it must be URI-safe (ideally only letters and digits, without spaces).

If there are specific ontologies you would like to reuse in your data cube, an auto-complete is available; it appears when you start typing the name of a common ontology (e.g. "schema").

Commonly used target properties:

| Property | Description | Notes |
| --- | --- | --- |
| schema:identifier | To add identifiers also valid outside the data set. | optional |

Concept Table

To define multilingual concepts the Target Properties commonly used are:

| Property | Description | Notes |
| --- | --- | --- |
| schema:name | The name of the concept. | mandatory, needs a language tag |
| schema:description | A description of the concept. | optional, needs a language tag |
| schema:position | The position of a concept used in ordinal scales. | mandatory |
| schema:identifier | An identifier also valid outside the data set. | optional |

See the language and translations paragraph about how to map multilingual values.

It is possible to attach further information relevant to the concept, e.g. geographical coordinates or categorizations, and even to link to other concept tables. Try to reuse existing properties, e.g. from schema.org.

Semantically relevant properties are:

| Property | Description | Notes |
| --- | --- | --- |
| schema:latitude | WGS84 coordinate. | mandatory for symbols on a map visualization |
| schema:longitude | WGS84 coordinate. | mandatory for symbols on a map visualization |

Data Types

The correct data type allows software consuming your cubes to choose the correct presentation. It matters most for measurement dimensions. Based on the data type, Cube Creator checks whether all input data of the mapped column is syntactically correct.

The following data types are available for your input data:

| Data type | Format | Example |
| --- | --- | --- |
| boolean | "true" or "false" ("0" and "1" were not supported as of January 2021) | true, false |
| date | YYYY-MM-DD | 1879-04-19 |
| dateTime | YYYY-MM-DDThh:mm:ss | 1972-06-25T22:30:00 |
| decimal | Decimal separator is "."; thousands separators are not allowed | 123.456, +1234.456, -.456 |
| int | Thousands separators are not allowed | -2147483648, 0, -0000000000000000000005, 2147483647 |
| string | A value containing the separator character must be enclosed in "; a " inside a quoted string must be written as "" | Müller, "Müller, Hans", "Hans ""Johnny"" Müller" |
| time | hh:mm:ss | 21:32:52, 21:32:52+02:00, 19:32:52Z, 19:32:52+00:00, 21:32:52.12679 |

If the transformation encounters a type mismatch (i.e. a value in the CSV doesn't match the type defined in the mapping), it will fail and an error message will be displayed in the logs of the transform job.
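Type mismatches can be caught before running a transformation by checking a column locally. The sketch below uses regular expressions that approximate the formats and examples in the table above; they are an assumption, not the checks Cube Creator performs, and they test only the written form (for instance, a date such as 2021-02-31 would pass). The 32-bit range for int is inferred from the examples.

```python
import re

TZ = r"(Z|[+-]\d{2}:\d{2})?"  # optional time zone, as in the time examples
PATTERNS = {
    "boolean": r"true|false",
    "date": r"\d{4}-\d{2}-\d{2}",
    "dateTime": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?" + TZ,
    "decimal": r"[+-]?(\d+(\.\d*)?|\.\d+)",
    "int": r"[+-]?\d+",
    "time": r"\d{2}:\d{2}:\d{2}(\.\d+)?" + TZ,
}

def matches_type(value, data_type):
    """Check whether `value` has the written form expected for `data_type`."""
    if data_type == "string":
        return True  # any text is a valid string
    if re.fullmatch(PATTERNS[data_type], value) is None:
        return False
    if data_type == "int":
        # Assumption: the examples suggest a signed 32-bit range
        return -2**31 <= int(value) <= 2**31 - 1
    return True

def find_mismatches(values, data_type):
    """Return (row number, value) pairs that do not match the data type."""
    return [
        (row_no, value)
        for row_no, value in enumerate(values, start=1)
        if not matches_type(value, data_type)
    ]

print(find_mismatches(["123.456", "+1234.456", "-.456", "1,234.5"], "decimal"))
# [(4, '1,234.5')]
```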

A transformation can always be run without any data type attached. Be aware that the final data-consuming application might not behave correctly if no data type is specified.

Language and translations

For dimensions which provide strings, a language should be specified.

For strings with translations in different languages, ideally no strings are attached directly to the Cube table; a Concept table should be created instead. The usual way to handle translations is to separate them in the original CSV, one language per column. Those columns are then all mapped in the Concept table to the same schema:name target property, with a "string" data type and the corresponding language.
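As an illustration of the one-column-per-language layout, the sketch below maps three label columns to the same schema:name property with different language tags. The CSV columns, the `COLUMN_MAPPING` structure and the `canton/` prefix are hypothetical; the printed lines only imitate language-tagged literals and are not Cube Creator output.

```python
import csv
import io

# Made-up concept CSV: one label column per language
CSV_TEXT = """code,name_de,name_fr,name_it
ZH,Zürich,Zurich,Zurigo
GE,Genf,Genève,Ginevra
"""

# Hypothetical column mapping: CSV column -> (target property, language tag)
COLUMN_MAPPING = {
    "name_de": ("schema:name", "de"),
    "name_fr": ("schema:name", "fr"),
    "name_it": ("schema:name", "it"),
}

def language_tagged_labels(row):
    """Return (property, value, language) triples for all non-empty label cells."""
    return [
        (prop, row[column], lang)
        for column, (prop, lang) in COLUMN_MAPPING.items()
        if row[column]
    ]

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    for prop, value, lang in language_tagged_labels(row):
        print(f'canton/{row["code"]} {prop} "{value}"@{lang}')
```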

Caution: schema:name must always be provided in a concept table, otherwise the visualization will fail. Where schema:description is used, it should also be provided in multiple languages.

In case there is only one language available, still provide the correct language tag. (It will be used as a fallback in other language settings.)

Default value

If an empty or missing value in your column has a specific meaning (e.g. being equal to 0), you can define a default value to be written to the cube in its place. If a default value is set, no explicitly missing values are written to the cube.
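The behaviour can be sketched in a few lines. This is an illustration only: the function `apply_default` is hypothetical, and treating whitespace-only cells as empty is an assumption that may differ from what Cube Creator does.

```python
def apply_default(values, default=None):
    """Replace empty cells with `default`; without a default they stay missing (None)."""
    result = []
    for value in values:
        # Assumption: None, "" and whitespace-only cells all count as empty
        if value is None or value.strip() == "":
            result.append(default)
        else:
            result.append(value)
    return result

column = ["12", "", "7", " "]
print(apply_default(column, default="0"))  # ['12', '0', '7', '0']
print(apply_default(column))               # ['12', None, '7', None]
```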