Importing the funders-zenodo-ror-data as funders.yaml crashes #218

Open
chriz-uniba opened this issue Sep 7, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@chriz-uniba
Contributor

Package version (if known): invenio-app-rdm 9.1.3; invenio-vocabularies 0.11.6

Describe the bug

When downloading and converting https://zenodo.org/api/files/25d4f93f-6854-4dd4-9954-173197e7fad7/v1.1-2022-06-16-ror-data.zip into a funders.yaml and then trying to import the funders vocabulary using a vocabularies-future.yaml and this funders.yaml, we get a TypeError and a traceback.

Steps to Reproduce

Following the documentation here: https://inveniordm.docs.cern.ch/customize/vocabularies/funding/#funders-ror

curl https://zenodo.org/api/files/25d4f93f-6854-4dd4-9954-173197e7fad7/v1.1-2022-06-16-ror-data.zip -o funders.zip
# invenio vocabularies convert --vocabulary funders --origin funders.zip --target funders.yaml
Vocabulary funders converted. Total items 102742.
102742 items succeeded
0 contained errors
0 were filtered.
# cat vocabularies-future.yaml
names:
  readers:
    - type: yaml
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity
# invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml --origin funders.yaml
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 126, in import_vocab
    config = get_config_for_ds(vocabulary, filepath, origin)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 45, in get_config_for_ds
    config["readers"][0]["args"]["origin"] = origin
TypeError: 'NoneType' object is not subscriptable

Expected behavior

Importing should work (although we have probably identified four broken ISNIs within this funders.yaml).

Additional Notes

The same happens if double quotes are added around the file names.

# invenio vocabularies import --vocabulary funders --filepath "./vocabularies-future.yaml" --origin "./funders.yaml" 
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 126, in import_vocab
    config = get_config_for_ds(vocabulary, filepath, origin)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 45, in get_config_for_ds
    config["readers"][0]["args"]["origin"] = origin
TypeError: 'NoneType' object is not subscriptable
@chriz-uniba chriz-uniba added the bug Something isn't working label Sep 7, 2022
@chriz-uniba
Contributor Author

chriz-uniba commented Sep 7, 2022

There is an open PR for the documentation: https://github.com/inveniosoftware/docs-invenio-rdm/pull/398/files

cat vocabularies-future.yaml 
funders:
  readers:
    - type: yaml
      args:
          orgin: "funders.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity

The top-level key names needs to be funders, and the origin needs to be given in the reader's args.

Then you can use the following command, and it seems to work (see footnote 1):

invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml --origin funders.yaml
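
The first traceback in this issue would be consistent with the vocabulary name not being found as a top-level key in the config file. Below is a minimal sketch of that situation, assuming the config for a vocabulary is looked up by name and the lookup returns None when the key is missing. The helper `lookup_config` and the dictionary are hypothetical stand-ins, not the library's code; only the final assignment is copied from the traceback.

```python
# Hypothetical stand-in for the parsed vocabularies-future.yaml from the
# original report, where the top-level key was `names` instead of `funders`.
parsed_yaml = {
    "names": {
        "readers": [{"type": "yaml"}],
        "writers": [{"type": "funders-service"}],
    }
}


def lookup_config(parsed, vocabulary):
    """Return the config for one vocabulary, or None if the key is absent."""
    return parsed.get(vocabulary)


config = lookup_config(parsed_yaml, "funders")  # no "funders" key, so None

try:
    # Same shape as the assignment shown in the traceback.
    config["readers"][0]["args"]["origin"] = "funders.yaml"
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

With the key renamed to funders and an args mapping present on the reader, the same assignment has something to write into.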

When the --origin option is removed from the import command, it crashes:

# invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 127, in import_vocab
    success, errored, filtered = _process_vocab(config, num_samples)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 81, in _process_vocab
    for result in ds.process():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 50, in process
    for stream_entry in self.read():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 85, in read
    yield from pipe_gen(read_gens)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 70, in pipe_gen
    for item in current_gen_func(piped_item):
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/readers.py", line 50, in read
    with open(self._origin, self._mode) as file:
TypeError: expected str, bytes or os.PathLike object, not NoneType
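
The last frame of this traceback is a plain open() call that receives None. A small sketch of how that can happen follows, assuming the reader only looks for an argument spelled `origin`. Under that assumption, a config that spells the key differently (the file pasted above has `orgin`) would leave the origin unset unless --origin is passed. The function `read_origin` and the lookup are hypothetical, not the library's implementation.

```python
def read_origin(origin, mode="r"):
    """Open the origin the way the traceback shows: open(origin, mode)."""
    with open(origin, mode) as file:
        return file.read()


# Hypothetical reader arguments: the key is spelled `orgin`, so a lookup
# for `origin` finds nothing and yields None.
reader_args = {"orgin": "funders.yaml"}
origin = reader_args.get("origin")

try:
    read_origin(origin)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```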

When the wrong YAML file is passed as --origin, we get the following:

# invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml --origin app_data/vocabularies/subjects_oecd_fos.yaml 
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
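
These messages suggest that every entry of the subjects file is rejected because it lacks a name and carries fields the funders schema does not know. The sketch below only mimics the shape of that check. The allowed field set is an assumption taken from the funder example posted later in this thread (id, country, name, title); the real schema may accept more fields, and `validate_funder` is a hypothetical helper, not the service's validator.

```python
# Assumed field sets, derived from this thread only.
ALLOWED_FIELDS = {"id", "country", "name", "title"}
REQUIRED_FIELDS = {"name"}


def validate_funder(entry):
    """Return a dict of field -> messages; an empty dict means the entry passes."""
    errors = {}
    present = set(entry)
    for field in sorted(REQUIRED_FIELDS - present):
        errors[field] = ["Missing data for required field."]
    for field in sorted(present - ALLOWED_FIELDS):
        errors[field] = ["Unknown field."]
    return errors


# Hypothetical subject-style entry; the values are placeholders.
subject_entry = {"id": "example-id", "scheme": "FOS", "subject": "Example subject"}
funder_entry = {"id": "202100-2585", "country": "SE", "name": "name", "title": {"en": "name"}}

print(validate_funder(subject_entry))
print(validate_funder(funder_entry))
```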

When setting a wrong YAML path in vocabularies-future.yaml

cat vocabularies-future.yaml 
funders:
  readers:
    - type: yaml
      args:
          orgin: "app_data/vocabularies/subejects_oecd_fos.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity

and calling the import with the right origin, it seems to work (see footnote 1):

# invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml --origin funders.yaml

Footnotes

  1. Note: It seems to work, in that I do not get an immediate error message. I did not let the whole import run through, since it takes a while to complete.

@Samk13
Member

Samk13 commented Sep 7, 2022

In vocabularies-future.yaml, origin should contain the full relative path, not only the file name:

funders:
  readers:
    - type: yaml
      args:
          origin: "app_data/vocabularies/funders.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity
awards:
  readers:
    - type: yaml
      args:
          origin: "app_data/vocabularies/awards.yaml"
  writers:
    - type: awards-service
      args:
        service_or_name: awards
        identity: system_identity

The funders schema should look like this:

- id: 202100-2585
  country: SE
  name: name
  title:
    en: name

The command is:

invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml

Please follow this recipe and let me know if it works.

@chriz-uniba
Contributor Author

My folder structure:

ls
app_data  docker                   docker-compose.yml  docker-services.yml  funders.zip  logs     Pipfile.lock  static     vocabularies-future.yaml
assets    docker-compose.full.yml  Dockerfile          funders.yaml         invenio.cfg  Pipfile  README.md     templates

So vocabularies-future.yaml and funders.yaml are at the same level, and I guess the relative path should be right.
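
Whether a relative origin is "right" depends on what it is resolved against. The tracebacks in this thread end in a plain open() call, and Python's open() resolves a relative path against the current working directory of the process, not against the directory of the config file. The sketch below contrasts the two conventions using pure path arithmetic. The directory names are hypothetical, and both helpers are illustrations rather than library functions.

```python
from pathlib import PurePosixPath


def relative_to_cwd(origin, cwd):
    """Where a relative origin points when resolved against the working directory."""
    return PurePosixPath(cwd) / origin


def relative_to_config(origin, config_path):
    """Where it would point if resolved against the config file's directory."""
    return PurePosixPath(config_path).parent / origin


# Hypothetical locations, for illustration only.
config_path = "/srv/instance/vocabularies-future.yaml"

same_dir = relative_to_cwd("funders.yaml", "/srv/instance")
other_dir = relative_to_cwd("funders.yaml", "/srv")
by_config = relative_to_config("funders.yaml", config_path)

print(same_dir, other_dir, by_config)
```

The two conventions only agree when the command is run from the directory that holds the config file.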

@Samk13
Member

Samk13 commented Sep 7, 2022

And it is still not working?
How about putting vocabularies-future.yaml and funders.yaml inside app_data and app_data/vocabularies respectively and adjusting the paths?
Does it still not work?

@chriz-uniba
Contributor Author

So if we change the paths:

[root@d94b497621ee app_data]# ls
README.md  vocabularies  vocabularies-future.yaml  vocabularies.yaml

Setting the path relative to where the command is called (the import is called from src, which I have never done so far, because for me that doesn't make any sense):

[root@d94b497621ee app_data]# cat vocabularies-future.yaml 
funders:
  readers:
    - type: yaml
      args:
          orgin: "app_data/vocabularies/funders.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity
[root@d94b497621ee app_data]# ls vocabularies/
affiliations_ror.yaml  funders.yaml  subjects_oecd_fos.yaml
[root@d94b497621ee src]# invenio vocabularies import --vocabulary funders --filepath ./app_data/vocabularies-future.yaml 
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 127, in import_vocab
    success, errored, filtered = _process_vocab(config, num_samples)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 81, in _process_vocab
    for result in ds.process():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 50, in process
    for stream_entry in self.read():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 85, in read
    yield from pipe_gen(read_gens)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 70, in pipe_gen
    for item in current_gen_func(piped_item):
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/readers.py", line 50, in read
    with open(self._origin, self._mode) as file:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Alternatively, setting the path relative to vocabularies-future.yaml (something we have already done successfully in several other places):

[root@d94b497621ee src]# cat app_data/vocabularies-future.yaml 
funders:
  readers:
    - type: yaml
      args:
          orgin: "./vocabularies/funders.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity
[root@d94b497621ee src]# invenio vocabularies import --vocabulary funders --filepath ./app_data/vocabularies-future.yaml 
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 127, in import_vocab
    success, errored, filtered = _process_vocab(config, num_samples)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 81, in _process_vocab
    for result in ds.process():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 50, in process
    for stream_entry in self.read():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 85, in read
    yield from pipe_gen(read_gens)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 70, in pipe_gen
    for item in current_gen_func(piped_item):
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/readers.py", line 50, in read
    with open(self._origin, self._mode) as file:
TypeError: expected str, bytes or os.PathLike object, not NoneType

@Samk13
Member

Samk13 commented Sep 7, 2022

You should check your paths. Maybe you are adding ./ or you put the path in a string "app_data/vocabularies-future.yaml" in the command. Double-check your paths; I don't think there are any issues other than that.

@chriz-uniba
Contributor Author

chriz-uniba commented Sep 7, 2022

Okay, I will not go on testing paths here, but to sum up:

  • there seems to be something weird with the paths and how the import command works
    • e.g., it looks like I can put anything in the origin within vocabularies-future.yaml as long as it is there, and in the end the origin from the import command is used; however, if there is no origin in vocabularies-future.yaml, the import command fails.
  • even if my paths are wrong, it is not very intuitive to get them right
  • there are more issues than just "can I leave out the origin when calling the import command"
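
The "I can put anything in the origin" observation would follow if the command-line value simply overwrites whatever the config file holds. Here is a rough sketch of that behaviour, built around the one assignment visible in the tracebacks. The surrounding function `apply_cli_origin`, its `if origin` guard, and the sample config are assumptions for illustration, not the library's actual code.

```python
import copy


def apply_cli_origin(config, origin=None):
    """Return a copy of config whose first reader uses the command-line origin, if given."""
    config = copy.deepcopy(config)
    if origin:
        # The assignment shown in the tracebacks.
        config["readers"][0]["args"]["origin"] = origin
    return config


# Hypothetical config whose origin points somewhere irrelevant.
from_file = {"readers": [{"type": "yaml", "args": {"origin": "placeholder.yaml"}}]}

overridden = apply_cli_origin(from_file, "funders.yaml")
untouched = apply_cli_origin(from_file)

print(overridden["readers"][0]["args"]["origin"])
print(untouched["readers"][0]["args"]["origin"])
```

Under these assumptions the value in the file only matters when --origin is left out, which matches the behaviour described in the summary.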
