Skip to content

Proof-of-concept system for grounding large language models through interaction with remote RDF cube stores

Notifications You must be signed in to change notification settings

Marios-Mamalis/LLM_sparql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LLM SPARQL

Official implementation for the proof of concept system presented in the paper "Can Large Language Models Revolutionalize Open Government Data Portals? A Case of Using ChatGPT in statistics.gov.scot".

Overview

The system aims to demonstrate the feasibility of retrieving relevant information from a remote RDF cube-based database by executing meaningful SPARQL queries, generated by large language models, in response to user questions, in an effort to provide factually grounded responses and avoid the phenomenon of LLM hallucinations.

The system is tailored to the open goverment data portal of Scotland, and all retrievals are conducted through SPARQL queries made to the portal's API endpoint. This version uses OpenAI's GPT-3.5 Turbo (checkpoint 0613) as the LLM, and text-embedding-ada-002 as the embeddings model, however, any OpenAI model can be used for either role. ChromaDB serves as the underlying vector store.

The retrieval process starts with the user's natural language input. The LLM embeds this input and stores it in a ChromaDB, using embedding cosine similarity to select the relevant dataset. The system then iteratively retrieves and filters necessary data from the SPARQL endpoint, such as dataset measures and dimensions, refining the original query along the process. Once the final query is crafted, it is used to retrieve the desired data. The LLM then structures the response and presents it to the user, seamlessly integrating the result into the ongoing conversation.

For a detailed overview of how the components interact, refer to the figure below, illustrating the system architecture:

Installation

Requires Python 3.9
Install dependencies:

pip install -r requirements.txt

Citation

If you used the code in the main branch of the repository, please consider citing the corresponding paper:

@inproceedings{mamalis2023can,
  title={Can large language models revolutionalize open government data portals? a case of using chatgpt in statistics. gov. scot},
  author={Mamalis, Marios Evangelos and Kalampokis, Evangelos and Karamanou, Areti and Brimos, Petros and Tarabanis, Konstantinos},
  booktitle={Proceedings of the 27th Pan-Hellenic Conference on Progress in Computing and Informatics},
  pages={53--59},
  year={2023}
}

About

Proof-of-concept system for grounding large language models through interaction with remote RDF cube stores

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages