Using Open Data Hub toolkit and Operate First infrastructure for OS-Climate

Important

On June 26, 2024, the Linux Foundation announced the merger of its financial services umbrella, the Fintech Open Source Foundation (FINOS), with OS-Climate, an open source community dedicated to building data technologies, modeling, and analytic tools that will drive global capital flows into climate change mitigation and resilience. OS-Climate projects are in the process of transitioning to the FINOS governance framework. Read more at finos.org/press/finos-join-forces-os-open-source-climate-sustainability-esg

This repository is the central location for the demos the Open Services (previously AICoE) team is developing within the OS-Climate project.

This demo shows how to use the tools provided by Open Data Hub (ODH) running on the Operate First cluster to perform ETL, create training and inference pipelines, and build interactive dashboards and visualizations of our data. Specifically, we define pipelines that preprocess data, train Natural Language Processing (NLP) models, run inference, and finally display the results on a dashboard. We adapt the data processing and inference pipelines developed by the Allianz NLP team for the OS-Climate project in this repository.

The inference pipeline takes raw PDF files as input and extracts their text content. It then uses a pre-trained language model to determine which paragraphs are relevant to answering each KPI question. Next, a second pre-trained language model extracts precise answers to these questions from the corresponding relevant paragraphs. Lastly, the pipeline uploads the results to a table on Trino.
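The stages of the inference pipeline can be illustrated with a minimal, self-contained sketch. It is not the code used in this repository: every function name here is hypothetical, the word-overlap score and the regular expression are simple stand-ins for the two pre-trained language models, and the input is assumed to be text that has already been extracted from a PDF. The final upload to Trino is left out; the sketch only builds the rows that such a step would write.

```python
import re
from dataclasses import dataclass


@dataclass
class Answer:
    """One result row, as it might be written to a results table."""
    pdf_name: str
    kpi_question: str
    paragraph: str
    answer: str


def extract_paragraphs(raw_text):
    # Assumption: the PDF has already been converted to plain text;
    # paragraphs are separated by blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", raw_text) if p.strip()]


def relevance_score(question, paragraph):
    # Stand-in for the pre-trained relevance model: the fraction of
    # question words that also appear in the paragraph.
    q_words = set(re.findall(r"[a-z0-9]+", question.lower()))
    p_words = set(re.findall(r"[a-z0-9]+", paragraph.lower()))
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def extract_answer(paragraph):
    # Stand-in for the pre-trained extraction model: the first number
    # in the paragraph, with an optional unit or percent sign.
    match = re.search(r"\d+(?:[.,]\d+)*(?:\s*(?:%|[A-Za-z]+))?", paragraph)
    return match.group(0) if match else None


def run_inference(pdf_name, raw_text, kpi_questions, threshold=0.5):
    # Stage 1: text extraction. Stage 2: relevance filtering.
    # Stage 3: answer extraction from the relevant paragraphs.
    rows = []
    paragraphs = extract_paragraphs(raw_text)
    for question in kpi_questions:
        for paragraph in paragraphs:
            if relevance_score(question, paragraph) < threshold:
                continue
            answer = extract_answer(paragraph)
            if answer is not None:
                rows.append(Answer(pdf_name, question, paragraph, answer))
    return rows


if __name__ == "__main__":
    text = (
        "Our company values its employees and communities.\n\n"
        "Total energy consumption in the reporting year was 500 GWh.\n"
    )
    for row in run_inference("report.pdf", text, ["What is the total energy consumption?"]):
        print(row.kpi_question, "->", row.answer)
```

Keeping each stage as a separate function mirrors how the stages can be split into individual pipeline steps, so that a stand-in can be swapped for a real model without touching the rest.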

The key components of the ODH infrastructure used in this demo are JupyterHub with a container image, Elyra pipelines with the Kubeflow runtime, and Apache Superset. The source data, processed data, trained model, and output data are all stored in a bucket on Ceph S3 storage. The following flowchart gives an overview of the different stages of the project.

Contents

JupyterHub Image Setup (AICoE-CI, Thoth)

Access JupyterHub Environment

Set Up Experiments

Model Components

Elyra pipeline

How to contribute
