Ingest data from the TDC data sources into the portal #166

Open
5 of 8 tasks
Uchechukwu-Onye-Igbo opened this issue Oct 18, 2024 · 8 comments

@Uchechukwu-Onye-Igbo

Uchechukwu-Onye-Igbo commented Oct 18, 2024

As a user of the TDC portal, I want the data from the priority TDC data sources to live in the portal so that I can access and preview the data directly through the portal.

The sources to ingest are:

Tasklist

  • Prepare ingestion script for JRC source
  • Prepare ingestion script for GFEI
  • Prepare ingestion script for EUROSTAT
  • Publish all 3 sources to our data portal
  • Set up GitHub CI/CD to run the ingestion on a regular basis

Acceptance Criteria

Given that I am interested in data from any of the priority sources on the TDC portal,

  • I can preview and interact with the data directly in the portal
  • I can download the full data directly from the portal without having to be redirected out of the portal
  • Data is updated on a regular basis, and previous versions of each resource are kept

@Verena205

The other sources should be Eurostat (https://tdc-data-portal.vercel.app/search?data_provider="EUROSTAT") and the GFEI data. @khaeru will provide more information on this.

@Verena205

The GFEI data is related to #17.
The dataset has yet to be included on the website.

@khaeru

khaeru commented Oct 30, 2024

  • The original GFEI data is at https://doi.org/10.5281/zenodo.10148348.
  • We have code at https://github.com/transport-data/gfei-2023 that converts these to SDMX.
  • @PierpCazzola or his collaborator Leonardo Paoli will be updating the Zenodo record to include the converted SDMX files; hopefully in the next week or so.
  • IMHO this data source should not be special-cased; rather we need a general-purpose capability to ‘ingest’ any record from Zenodo and similar permanent archives to a corresponding TDC CKAN record. This could involve:
    • Transforming Zenodo metadata to TDC metadata.
    • Displaying a clear link to the original Zenodo record.
    • Either mirroring the files, or linking to the files in their original location. (I am not sure what value is added by mirroring.)
    • Monitoring and refreshing the TDC CKAN record as the Zenodo record is updated.
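
The steps listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not an agreed design: the function names `zenodo_to_ckan` and `needs_refresh` are hypothetical, the Zenodo record is assumed to be a JSON object with `id`, `doi`, `metadata` and `files` keys (each file carrying `key`, `links.self` and `checksum`), and the output uses generic CKAN dataset fields rather than the actual TDC metadata schema, which is not described in this thread. Files are linked in their original location rather than mirrored.

```python
from typing import Any


def zenodo_to_ckan(record: dict[str, Any]) -> dict[str, Any]:
    """Map a Zenodo record (assumed JSON shape) to a CKAN-style dataset dict."""
    meta = record.get("metadata", {})
    doi = record.get("doi", "")
    resources = []
    for f in record.get("files", []):
        filename = f["key"]
        resources.append({
            "name": filename,
            # Link to the file in its original location instead of mirroring it.
            "url": f["links"]["self"],
            "format": filename.rsplit(".", 1)[-1].upper() if "." in filename else "",
            # Keep the checksum so later changes to the file can be detected.
            "hash": f.get("checksum", ""),
        })
    return {
        # Hypothetical naming convention for the CKAN dataset slug.
        "name": f"zenodo-{record['id']}",
        "title": meta.get("title", ""),
        "notes": meta.get("description", ""),
        "author": "; ".join(c["name"] for c in meta.get("creators", [])),
        "version": meta.get("version", ""),
        # Clear link back to the original record via its DOI.
        "url": f"https://doi.org/{doi}" if doi else "",
        "resources": resources,
    }


def needs_refresh(dataset: dict[str, Any], record: dict[str, Any]) -> bool:
    """Return True if the files in the Zenodo record differ from the CKAN resources."""
    current = {(r["name"], r["hash"]) for r in dataset["resources"]}
    latest = {(f["key"], f.get("checksum", "")) for f in record.get("files", [])}
    return current != latest
```

In practice the record would be fetched from the Zenodo REST API and the resulting dict passed to CKAN's `package_create` or `package_update` action; both steps are left out here so the sketch stays free of network calls. A scheduled job could call `needs_refresh` to decide whether an update is required.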

@luccasmmg
Contributor

@khaeru I'm 99% sure this is not the case, but I just want to make it clear: this general-purpose action, where we could provide any Zenodo URL and create CKAN datasets based on it, would not require us to convert the data into SDMX format, since I'm assuming that conversion is quite dependent on the shape of the file.

@nicolas-becker
Contributor

Hi @luccasmmg & @Mikanebu, the SDMX version of the GFEI data was shared with us today.

Regarding your last comment: the general-purpose capability of ingesting data sources from Zenodo proposed by @khaeru (please correct me if I am wrong) is unrelated to whether the data is SDMX formatted, as you correctly noted. The SDMX conversion relates instead to the ability to preview the data on our platform and to further process or merge it with similarly formatted data sources. Therefore, you can use this SDMX data source as a use case to test the tabular preview of the TDC Formatted data (#171).

@Uchechukwu-Onye-Igbo
Author

Hello @nicolas-becker and @khaeru

As you will recall, we agreed to ingest three data sources: JRC-IDEES, EUROSTAT, and GFEI. Additionally, since the GFEI source has an SDMX-formatted version, we will include it as a second resource, just as you said we would in the call yesterday.

cc @Mikanebu @luccasmmg

@nicolas-becker
Contributor

Hi @Uchechukwu-Onye-Igbo, yes, this is what we agreed on. But if you add this SDMX version as a second resource, it would still be visualized in the dataset preview, right? This is why we have the following dropdown menu:

image

My comment was referring to the question from @luccasmmg, who is implementing the visualization of SDMX-compliant data (or TDC Formatted data). The idea is that he could use this dataset to test that functionality.

@Uchechukwu-Onye-Igbo
Author

> Hi @Uchechukwu-Onye-Igbo, yes, this is what we agreed on. But if you add this SDMX version as a second resource, it would still be visualized in the dataset preview, right? This is why we have the following dropdown menu:
>
> image
>
> My comment was referring to the question from @luccasmmg, who is implementing the visualization of SDMX-compliant data (or TDC Formatted data). The idea is that he could use this dataset to test that functionality.

Okay. @luccasmmg, kindly take a look at this comment and get back to @nicolas-becker.

6 participants