Understanding documentation, loading and creating datasets #33

Open
infinite-dao opened this issue Mar 5, 2020 · 9 comments

@infinite-dao

Hi,

I found that the dataset is not created the way the documentation describes. I guess the variable is not FUSEKI_DATASET1 but FUSEKI_DATASET_1 … ;-) See:

docker run -d --name fuseki -p 3030:3030 \
  -e FUSEKI_DATASET_1=mydataset \
  -e FUSEKI_DATASET_2=otherdataset \
  stain/jena-fuseki

The other thing I have a hard time figuring out is how to use volumes and how to load data into the empty dataset created above. I have no clue yet. I am just trying to load RDF files into mydataset.

  • Can one start Fuseki with empty datasets and load data at the same time?
  • Could you kindly provide a complete minimal example, from scratch, for loading data via the command line into the previously created empty dataset? I do not understand how the fuseki-data container gets to know, at startup, about the empty database into which I want to load my data via the command line.

Thank you

@infinite-dao
Author

Aiming for data persistence, I do not understand the intended use of the load.sh step in relation to a previously created fuseki-app container and a data container, e.g. fuseki-data (busybox).

Two scenarios. Scenario one: if I load data immediately after creating fuseki-data (busybox), I get the database error ERROR Does not exist: /fuseki/databases/cetaf-test/:

# start anew from scratch
docker ps -a
# CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
# no container is running
docker run --name fuseki-data -v /fuseki busybox # data container
docker run --name fuseki-loadsh-use-fuseki-data \
  --publish 3030:3030 -e ADMIN_PASSWORD=pw123 \
  --volumes-from fuseki-data \
  --volume /home/aplank/sandbox/staging:/staging \
  stain/jena-fuseki \
  ./load.sh cetaf-test Thread-1_herbarium.bgbm.org.rdf
# ERROR Does not exist: /fuseki/databases/cetaf-test/

So I guess I have to create a fuseki-app container first(?).

Scenario two:

docker run --name fuseki-data -v /fuseki busybox # data container
docker run --name fuseki-app --detach --publish 3030:3030 \
  -e ADMIN_PASSWORD=pw123 --volumes-from fuseki-data \
  stain/jena-fuseki
docker logs fuseki-app # look fine
docker stop fuseki-app
docker ps --all
# CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS                        PORTS               NAMES
# 35f00de23352        stain/jena-fuseki   "/docker-entrypoint.…"   About a minute ago   Exited (137) 18 seconds ago                       fuseki-app
# b5e5195b55ea        busybox             "sh"                     5 minutes ago        Exited (0) 5 minutes ago                          fuseki-data
# import RDF data to a brand new data base cetaf-test
docker run  --name fuseki-loadsh-use-fuseki-data \
  --volumes-from fuseki-data \
  -v /home/aplank/sandbox/staging:/staging \
  stain/jena-fuseki \
  ./load.sh cetaf-test Thread-1_herbarium.bgbm.org.rdf
# works so far but what shall I do with container fuseki-loadsh-use-fuseki-data ?

In scenario two I created:

  • fuseki-data (busybox)
  • fuseki-app (intermediate step: start and stop the container)
  • fuseki-loadsh-use-fuseki-data (import OK)

What comes next in this example?

  • Is fuseki-loadsh-use-fuseki-data meant to become the Fuseki interface instead of fuseki-app?
    • I tried that, but connecting via browser using the IP gives ERR_CONNECTION_REFUSED. I tried http://1.2.3.4:3030 (the IP 1.2.3.4 is of course just an example).
  • Shall I stop fuseki-loadsh-use-fuseki-data and restart fuseki-app?
    • I tried that, but there was no data,
    • or it led to database lock errors like ERROR Exception in initialization: Process ID 7 can't open database at location /fuseki/system/ because it is already locked by the process with PID 9. TDB databases do not permit concurrent usage across JVMs so in order to prevent possible data corruption you cannot open this location from the JVM that does not own the lock for the dataset
  • Or shall I leave fuseki-loadsh-use-fuseki-data running and restart fuseki-app?
    • I tried that, but it leads to database lock errors like ERROR Exception in initialization: Process ID 7 can't open database at location /fuseki/system/ because it is already locked by the process with PID 8. TDB databases do not permit concurrent usage across JVMs so in order to prevent possible data corruption you cannot open this location from the JVM that does not own the lock for the dataset

Which scenario is intended to work for data persistence? I would like to ask for a minimal, self-contained example that works. Can you provide one, please, or point out my mistakes? Thank you.
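
For readers looking for the sequence asked about here, the following is a minimal sketch and not a recipe confirmed by the maintainers. The one constraint the errors quoted above do establish is that two JVMs must not open /fuseki at the same time, so the sketch stops the server, runs load.sh in a throwaway container, and then starts the server again. Container names, the staging path and the load.sh arguments are copied from scenario two; `run`, `load_offline` and `DRY_RUN` are hypothetical helpers added only so the commands can be printed before they are executed. Whether the restarted server shows the loaded dataset without it also being registered under the same name is left open in this thread.

```shell
#!/bin/sh
# Sketch only: sequence an offline load so that no two JVMs open /fuseki at once.
# `run`, `load_offline` and DRY_RUN are hypothetical helpers, not part of the image.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

load_offline() {
  dataset=$1
  file=$2
  # 1. Stop the server so it releases its locks under /fuseki.
  run docker stop fuseki-app || return
  # 2. Load in a one-off container; --rm removes it once load.sh exits,
  #    so there is no leftover loader container to manage afterwards.
  run docker run --rm \
    --volumes-from fuseki-data \
    -v /home/aplank/sandbox/staging:/staging \
    stain/jena-fuseki \
    ./load.sh "$dataset" "$file" || return
  # 3. Start the server again on the same volume.
  run docker start fuseki-app
}

# Usage (prints the three commands without touching Docker):
#   DRY_RUN=1 load_offline cetaf-test Thread-1_herbarium.bgbm.org.rdf
```

Running it with DRY_RUN=1 first makes it easy to compare the generated commands with the ones in scenario two before anything is stopped or started.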

@kinow mentioned this issue May 2, 2020
@infinite-dao
Author

I found that the error …
ERROR Exception in initialization: Process ID 7 can't open database at location /fuseki/system/ because it is already locked by the process with PID 8. TDB databases do not permit concurrent usage across JVMs so in order to prevent possible data corruption you cannot open this location from the JVM that does not own the lock for the dataset
… is caused by the missing ps command. The solution is to install it inside the Fuseki container first (or, even better, @stain: fix this by adding procps to the package list in the Dockerfile, like so: bash curl ca-certificates findutils coreutils pwgen procps).

Manually you can fix the missing ps command like this:

# go into the running container fuseki-app (it has only primitive core bash by default)
docker exec -it fuseki-app bash
# root@ffcd017e0b51:/jena-fuseki#

# inside the container fuseki-app:
apt-get update
# fix the dependencies of apache-fuseki: procps provides the ps command line tool
apt-get install -y --no-install-recommends procps

# optionally add some tools you need
apt-get install -y --no-install-recommends vim nano tree  # editors vim and nano, and the tree listing tool
apt-get install -y --no-install-recommends ruby-full  # for data import with SPARQL over HTTP via the ruby script /jena-fuseki/bin/s-put and related commands

# run whatever commands you need inside the container, then exit the docker container
exit
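
To check whether a given container is affected before installing anything, a small test like the following can be run inside it. This is only a sketch: `have_cmd` is a hypothetical helper name, and the link between a missing ps and the lock error is the one reported in this comment, not something the snippet verifies.

```shell
#!/bin/sh
# Report whether a command is available on PATH (hypothetical helper).
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd ps; then
  echo "ps is available"
else
  echo "ps is missing; installing procps (as shown above) should provide it" >&2
fi
```

The same check can be done from the host without opening a shell, e.g. docker exec fuseki-app sh -c 'command -v ps', which prints nothing and exits non-zero when ps is absent.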

@stain
Owner

stain commented May 25, 2021

Assuming #50 fixes this, can we close this issue? Or should we make a different issue to add ruby-full for s-put?

@bzar

bzar commented Jun 27, 2022

This is probably the most relevant place to leave this for anyone else seeking answers since it's about the same error message:

I also bumped into the database locking issue, but it was caused by tdb.lock files in /fuseki/system and /fuseki/databases/dataset-name-here. Clearing those files before running the fuseki-server fixed the issue.
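
The cleanup described here can be sketched as follows. It assumes the default base directory /fuseki and, importantly, that no Fuseki server or loader is currently using it; the lock files exist to protect the database, so removing them under a running JVM risks corruption. `clear_tdb_locks` is a hypothetical helper name.

```shell
#!/bin/sh
# Remove leftover tdb.lock files below a Fuseki base directory.
# Only safe while NO Fuseki server or loader is using that directory.
clear_tdb_locks() {
  base=${1:-/fuseki}
  # -print lists each lock file that is found; rm -f then deletes it.
  find "$base" -type f -name tdb.lock -print -exec rm -f {} +
}

# Usage, with the server stopped and before starting fuseki-server again:
#   clear_tdb_locks /fuseki
```

Printing the paths before deleting them leaves a record of which datasets had a stale lock.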

@kuzeko
Collaborator

kuzeko commented Dec 14, 2022

@infinite-dao can you please update us if this is fixed for you now?

@qertis

qertis commented Jan 10, 2023

+1

My config:

version: "3.8"
services:

  fuseki:
    image: 'stain/jena-fuseki'
    container_name: 'fuseki'
    ports:
      - '3030:3030'
    volumes:
      - fuseki-data:/fuseki

volumes:
  fuseki-data:

I log in and create a new database TS. After running, I see this exception:

ERROR Exception in initialization: caught: Process ID 7 can't open database at location /fuseki/databases/TS/ because it is already locked by the process with PID 17. TDB databases do not permit concurrent usage across JVMs so in order to prevent possible data corruption you cannot open this location from the JVM that does not own the lock for the dataset

@infinite-dao
Author

infinite-dao commented Jan 12, 2023

@kuzeko asked: “@infinite-dao can you please update us if this is fixed for you now?”

The ps error is gone, and it works (for the most part) as expected. I tested locally:

docker run --name fuseki-data_test20230112 --volume /fuseki busybox 
# to create the data container

docker run --name fuseki-app_test20230112 --detach --publish 3030:3030 \
  --env FUSEKI_DATASET_1=mydataset   \
  --env FUSEKI_DATASET_2=otherdataset   \
  --env ADMIN_PASSWORD="some-other-password" \
  --env JVM_ARGS=-Xmx2g \
  --volume /path/to/my/import/data:/import-data \
  --volumes-from fuseki-data_test20230112 \
  stain/jena-fuseki:4.0.0

The Fuseki interface ran without log errors. I expected it to have created the empty datasets, but none showed up in the Fuseki interface. I could of course create new ones using the interface, again with no log errors. Restarting the containers also produced no log errors.

In the end I did not dig deeper into where, and why, the empty datasets were not created. For me it works OK.

@infinite-dao
Author

infinite-dao commented Jan 12, 2023

ERROR Exception in initialization: caught: Process ID 7 can't open database at location /fuseki/databases/TS/ because it is already locked by the process with PID 17. TDB databases do not permit concurrent usage across JVMs so in order to prevent possible data corruption you cannot open this location from the JVM that does not own the lock for the dataset

Which Fuseki version are you running? (You can check it with, e.g., docker logs "the-name-of-my-fuseki-app-container".)

@infinite-dao
Author

infinite-dao commented Jan 18, 2023

I still struggle with the README and with what “loading” data is intended to mean: into an existing container, or afresh. After testing around, the point seems to me to be this: the term “loading” is ambiguous. It is not, in essence, meant to work on running datasets, but only to create a dataset from data, i.e. only the first time. Is that correct?

Fuseki itself is meant to import data while the Fuseki server is offline: if you attempt to load data on a running Fuseki server, it will not let you and points out that the database is locked (tdb.lock). I can understand that philosophy.

So the question remains: what is the right way to run the loading procedure for an existing database (e.g. using tdbloader2-wrapper and so on)?

  • Is loading meant only for creating a dataset, or also for loading data into an existing database that already contains data?
  • How can one stop the fuseki-server without stopping the Docker container (in order to use the tdbloader2-wrapper on existing data)?
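
On the first question, one route that sidesteps the lock is to send the data to the running server over HTTP instead of using the offline loader, which is the approach behind the s-put script mentioned earlier in the thread. The sketch below uses curl against the Graph Store Protocol endpoint. It assumes a server on localhost:3030, a dataset named mydataset that exposes the default /data service, and an RDF/XML input file; `gsp_url` and `post_rdf` are hypothetical helper names. Whether this is what the README means by "load" is not confirmed here.

```shell
#!/bin/sh
# Build the Graph Store Protocol URL for the default graph of a dataset.
gsp_url() {
  base=$1      # e.g. http://localhost:3030
  dataset=$2   # e.g. mydataset
  printf '%s/%s/data?default\n' "${base%/}" "$dataset"
}

# Send an RDF/XML file to the running server.
# --data-binary makes curl issue a POST, which adds the triples to the
# existing graph; a PUT would replace the graph instead.
post_rdf() {
  curl --fail --silent --show-error \
    -H 'Content-Type: application/rdf+xml' \
    --data-binary "@$3" \
    "$(gsp_url "$1" "$2")"
}

# Usage:
#   post_rdf http://localhost:3030 mydataset Thread-1_herbarium.bgbm.org.rdf
```

Because the server itself performs the write, no second JVM opens the database and tdb.lock is not involved. For very large files the offline loader may still be the better fit.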


5 participants