Toy Data

This repository contains toy data created for the sole purpose of testing the pipeline.

Breweries from DBpedia

Retrievable from: https://dbpedia.org/sparql

Query:

construct where {?s ?p ?o. ?s a dbo:Brewery. FILTER(!isLiteral(?o)) } LIMIT 10000
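One way to run this query programmatically is to send it to the endpoint over HTTP. The sketch below is an assumption about how that could look, not a record of how the files in this repository were produced. The helper names (build_request, fetch_breweries) and the output file name are hypothetical, and the text/plain Accept header is assumed to make the endpoint return N-Triples. The query is written in the long CONSTRUCT { ... } WHERE { ... } form with an explicit dbo: prefix, because the SPARQL 1.1 short form does not formally allow FILTER.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"

# Long-form equivalent of the query above, with the dbo: prefix spelled out so
# that it does not depend on the endpoint's predefined namespaces.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
CONSTRUCT { ?s ?p ?o . ?s a dbo:Brewery }
WHERE { ?s ?p ?o . ?s a dbo:Brewery . FILTER(!isLiteral(?o)) }
LIMIT 10000
"""


def build_request(endpoint, query):
    """Build a GET request that carries the query in the standard 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Assumption: the endpoint answers this Accept header with N-Triples.
    return urllib.request.Request(url, headers={"Accept": "text/plain"})


def fetch_breweries(output_path="breweries.nt"):
    """Run the query and store the raw response (needs network access)."""
    with urllib.request.urlopen(build_request(ENDPOINT, QUERY)) as response:
        with open(output_path, "wb") as out_file:
            out_file.write(response.read())
```

Nothing is downloaded until fetch_breweries() is called, so the block can be imported or pasted into a session without touching the network.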

Storage stats: 37923 triples (6205 distinct subjects, 369 distinct predicates, 14772 distinct objects)
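Statistics of this kind can be recomputed from an N-Triples dump. The following is a minimal sketch, assuming one triple per line and subject and predicate terms without whitespace (which holds for N-Triples IRIs and blank nodes). The function name storage_stats is hypothetical, duplicate lines are counted as separate triples, and the figures above may have been produced by a different tool.

```python
def storage_stats(lines):
    """Count triples and distinct subjects, predicates and objects.

    `lines` is any iterable of N-Triples lines, e.g. an open text file.
    """
    triples = 0
    subjects, predicates, objects = set(), set(), set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate contain no whitespace; the object is the rest
        # of the line without the terminating " ."
        s, p, o = line.split(None, 2)
        if o.endswith("."):
            o = o[:-1].rstrip()
        triples += 1
        subjects.add(s)
        predicates.add(p)
        objects.add(o)
    return triples, len(subjects), len(predicates), len(objects)


# Made-up sample data, not taken from the brewery dump
sample = [
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    "<http://example.org/a> <http://example.org/q> <http://example.org/b> .",
    "<http://example.org/c> <http://example.org/p> <http://example.org/d> .",
]
print(storage_stats(sample))  # (3, 2, 2, 2)
```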

Files:

Linked Geo Data 2015-11-02

This dataset is larger, i.e.:

Storage stats: 37923 triples (6205 distinct subjects, 369 distinct predicates, 14772 distinct objects)

Files at http://downloads.linkedgeodata.org/releases/2015-11-02/

- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Abutters.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Abutters.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-AerialwayThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-AerialwayThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-AerowayThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-AerowayThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Amenity.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Amenity.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-BarrierThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-BarrierThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Boundary.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Boundary.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Craft.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Craft.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-CyclewayThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-CyclewayThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-EmergencyThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-EmergencyThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-HistoricThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-HistoricThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Leisure.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Leisure.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-LockThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-LockThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-ManMadeThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-ManMadeThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-MilitaryThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-MilitaryThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Office.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Office.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Place.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Place.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-PowerThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-PowerThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-PublicTransportThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-PublicTransportThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-RailwayThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-RailwayThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-RouteThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-RouteThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Shop.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-Shop.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-SportThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-SportThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-TourismThing.node.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-02-TourismThing.way.sorted.nt.bz2
- http://downloads.linkedgeodata.org/releases/2015-11-02/2015-11-18-CyclewayThing.node.sorted.nt.bz2

Script to download them, invoked as bash download_script.sh lgd_links.txt (where lgd_links.txt contains one link per line):

#!/bin/bash

# Check if the file path is provided as an argument
if [ -z "$1" ]; then
    echo "Please provide the file path as an argument."
    exit 1
fi

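# Read the file line by line (also handles a last line without a trailing newline)
while IFS= read -r link || [ -n "$link" ]; do
    # Skip empty lines
    if [ -z "$link" ]; then
        continue
    fi

    # Download the file using wget and report whether it succeeded
    echo "Downloading: $link"
    if wget "$link"; then
        echo "Download complete!"
    else
        echo "Download failed: $link" >&2
    fi

done < "$1"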

Extract them:

bzip2 -d *.bz2

Concat them:

cat *.nt > lgd-2015-11-02-all.nt
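The extract and concatenate steps can also be combined so that no intermediate .nt files are written. Below is a minimal sketch using Python's bz2 and shutil modules. The function name concat_bz2 and the glob pattern are illustrative assumptions, not part of this repository.

```python
import bz2
import glob
import shutil


def concat_bz2(pattern, output_path):
    """Decompress every file matching `pattern` and append it to `output_path`.

    Returns the list of input files in the order they were written.
    """
    # Sorted so that the concatenation order is deterministic
    inputs = sorted(glob.glob(pattern))
    with open(output_path, "wb") as out_file:
        for path in inputs:
            with bz2.open(path, "rb") as in_file:
                # Stream in chunks instead of loading a whole dump into memory
                shutil.copyfileobj(in_file, out_file)
    return inputs
```

Calling concat_bz2("*.nt.bz2", "lgd-2015-11-02-all.nt") in the download directory would produce the same combined file as the two commands above, while leaving the compressed archives in place.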

The Wikidata5M dataset has the shortcoming that it is not provided in RDF and must be converted first. Our converter currently covers only parts of it and does not create any ontological data.

Download:

Convert it to Turtle with the following Python 3 script:

import gzip


def escape_turtle_string(input_string):
    """Escape characters that may not appear verbatim in a Turtle string literal."""
    escape_characters = {
        '\\': '\\\\',
        '"': '\\"',
        '\n': '\\n',
        '\t': '\\t',
        '\r': '\\r',
    }
    return "".join(escape_characters.get(char, char) for char in input_string)


def to_prefixed_name(term):
    """Map Wikidata5M IDs to prefixed names: Q... -> ent:, P... -> prop:."""
    if term.startswith("Q"):
        return "ent:" + term
    if term.startswith("P"):
        return "prop:" + term
    return term


triples_path = "wikidata5m_all_triplet.txt.gz"
text_path = "wikidata5m_text.txt.gz"
output_path = "wikidata5m.ttl"

prefixes = {"wd": "http://www.wikidata.org/entity/",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "ent": "http://www.wikidata.org/entity/",
            "prop": "http://www.wikidata.org/wiki/Property:"}

with open(output_path, 'w', encoding="utf-8") as out_file:
    for k, v in prefixes.items():
        out_file.write(f"@prefix {k}: <{v}> .\n")

    # One whitespace-separated (subject, predicate, object) triple of IDs per line
    with gzip.open(triples_path, 'rt', encoding="utf-8") as triples:
        for line in triples:
            s, p, o = (to_prefixed_name(term) for term in line.split())
            out_file.write(f"{s} {p} {o} .\n")

    # One entity per line: the ID, a tab, then the description text
    with gzip.open(text_path, 'rt', encoding="utf-8") as texts:
        for line in texts:
            entity_id, _, text = line.rstrip("\n").partition("\t")
            comment = escape_turtle_string(text)
            # Prefixed names must not be wrapped in angle brackets
            out_file.write(f"wd:{entity_id} rdfs:comment \"{comment}\" .\n")
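As a sketch of the intended output, the block below applies the same mapping to two made-up input lines in memory. The sample IDs and text are invented for illustration and the helper names are hypothetical. Prefixed names are written without angle brackets, since Turtle reads <wd:Q1> as a full IRI rather than as the prefixed name wd:Q1.

```python
def escape_literal(text):
    """Escape backslashes, quotes and control characters for a Turtle literal."""
    replacements = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r"}
    return "".join(replacements.get(char, char) for char in text)


def triple_line_to_turtle(line):
    """Turn 'Q1<TAB>P2<TAB>Q3' into 'ent:Q1 prop:P2 ent:Q3 .'."""
    terms = []
    for term in line.split():
        if term.startswith("Q"):
            terms.append("ent:" + term)
        elif term.startswith("P"):
            terms.append("prop:" + term)
        else:
            terms.append(term)
    return " ".join(terms) + " ."


def text_line_to_turtle(line):
    """Turn 'Q1<TAB>some text' into an rdfs:comment statement."""
    entity_id, _, text = line.rstrip("\n").partition("\t")
    return f'wd:{entity_id} rdfs:comment "{escape_literal(text)}" .'


# Made-up sample lines, not taken from the real dataset
print(triple_line_to_turtle("Q1\tP2\tQ3\n"))
# ent:Q1 prop:P2 ent:Q3 .
print(text_line_to_turtle('Q1\tA "toy" entity\n'))
# wd:Q1 rdfs:comment "A \"toy\" entity" .
```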