[Bug]: Invalid label generated for upsert_triplet method when creating a KG for Neptune Analytics #17358

Open
kevinphillips81 opened this issue Dec 23, 2024 · 3 comments · May be fixed by #17363
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@kevinphillips81

Bug Description

When building a knowledge graph (KG) with KnowledgeGraphIndex, the labels extracted from the documents occasionally contain backticks. Neptune supports backtick-quoted labels, and upsert_triplet already wraps each label in backticks when it builds the query. When the extracted label itself contains backticks, the resulting query is invalid, so the query fails and the KG creation fails with it.
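
The malformed query can be seen without a Neptune connection by formatting the statement the same way upsert_triplet does. This is only a sketch: `build_merge_statement` is a hypothetical helper, and the `"Entity"` label and the sample relationship text are made-up values for illustration.

```python
def build_merge_statement(node_label: str, rel: str) -> str:
    """Hypothetical helper: formats the MERGE query the way upsert_triplet does."""
    query = """
        MERGE (n1:`%s` {id:$subj})
        MERGE (n2:`%s` {id:$obj})
        MERGE (n1)-[:`%s`]->(n2)
    """
    return query % (node_label, node_label, rel.replace(" ", "_").upper())


# A relationship string with stray backticks (made-up example of LLM output).
statement = build_merge_statement("Entity", "uses `boto3`")

# Look only at the relationship clause, which is the last line of the query.
rel_line = statement.strip().splitlines()[-1].strip()
print(rel_line)
```

In the printed clause, the first embedded backtick closes the quoted relationship type early, so the rest of the label is no longer inside the quotes.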

!pip install llama-index-llms-bedrock llama-index-graph-stores-neptune llama-index-embeddings-bedrock -q

Using version 0.3.0 of llama-index-graph-stores-neptune

Version

0.12.6

Steps to Reproduce

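from llama_index.core import KnowledgeGraphIndex, Settings, StorageContext
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.graph_stores.neptune import NeptuneAnalyticsGraphStore
from llama_index.llms.bedrock import Bedrock
from llama_index.readers.web import SimpleWebPageReader

llm = Bedrock(model="anthropic.claude-3-haiku-20240307-v1:0")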
embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v1")

Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512

# web_urls is a list of page URLs defined elsewhere (not shown in this report)
docs = SimpleWebPageReader(
    html_to_text=True,
    metadata_fn=lambda url:{'url': url}
).load_data(web_urls)

graph_store = NeptuneAnalyticsGraphStore(
    graph_identifier="g-abc1d23e45"
)

storage_context = StorageContext.from_defaults(graph_store=graph_store)

# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    docs,
    storage_context=storage_context,
    max_triplets_per_chunk=100,
)

Relevant Logs/Tracebacks

No response

@kevinphillips81 kevinphillips81 added bug Something isn't working triage Issue needs to be triaged/prioritized labels Dec 23, 2024

dosubot bot commented Dec 23, 2024

Hey @kevinphillips81! I'm here to help you with any bugs, questions, or contributions. Let's tackle this together!

To address the issue of invalid labels with backticks being generated by the upsert_triplet method in the llama-index-graph-stores-neptune package, you should ensure that the rel parameter is properly sanitized before being used in the query. The current implementation replaces spaces with underscores and converts the relationship type to uppercase, which is correct. However, if backticks or other invalid characters are present, you should further sanitize the rel string to remove or replace these characters.

Here's a snippet of the upsert_triplet method for reference:

def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
    """Add triplet to the graph."""
    query = """
        MERGE (n1:`%s` {id:$subj})
        MERGE (n2:`%s` {id:$obj})
        MERGE (n1)-[:`%s`]->(n2)
    """

    prepared_statement = query % (
        self.node_label,
        self.node_label,
        rel.replace(" ", "_").upper(),
    )

    self.query(prepared_statement, {"subj": subj, "obj": obj})

If the issue persists, you may need to implement additional logic to sanitize the rel string, ensuring it conforms to the valid label syntax for your specific use case [1].
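
One way to sanitize rel further, as suggested above, is an allow-list that keeps only word characters. This is a sketch under assumptions, not the package's behavior: `sanitize_rel_type` is a hypothetical helper, and dropping every other character is a design choice made here for illustration.

```python
import re


def sanitize_rel_type(rel: str) -> str:
    """Hypothetical helper: reduce free text to a conservative relationship type."""
    # Same normalization the existing code applies: spaces to underscores, upper case.
    normalized = rel.replace(" ", "_").upper()
    # Drop anything that is not a letter, digit, or underscore (this removes backticks).
    return re.sub(r"[^\w]", "", normalized)


print(sanitize_rel_type("`is part of`"))
```

Note that this also strips characters such as hyphens, and it can return an empty string if the input contains no word characters, so a caller would need to decide how to handle that case.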

@kevinphillips81
Author

To mitigate this, update the upsert_triplet method in the /graph_stores/neptune/base.py file as follows:

def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
    """Add triplet to the graph."""
    query = """
        MERGE (n1:`%s` {id:$subj})
        MERGE (n2:`%s` {id:$obj})
        MERGE (n1)-[:`%s`]->(n2)
    """

    prepared_statement = query % (
        self.node_label.replace("`", ""),
        self.node_label.replace("`", ""),
        rel.replace(" ", "_").replace("`", "").upper(),
    )

    self.query(
        prepared_statement,
        {"subj": subj.replace("`", ""), "obj": obj.replace("`", "")},
    )
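
The patched method can be checked locally without a Neptune graph by recording the generated query instead of sending it. This is a minimal sketch: `RecordingStore` is a hypothetical stand-in class, `"Entity"` is an assumed label value, and the triplet is made up.

```python
class RecordingStore:
    """Hypothetical stand-in for the graph store: records queries instead of sending them."""

    def __init__(self, node_label: str = "Entity") -> None:
        # "Entity" is an assumed label value for this sketch.
        self.node_label = node_label
        self.calls = []

    def query(self, statement: str, params: dict) -> None:
        # Record the call; nothing is sent to Neptune.
        self.calls.append((statement, params))

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        """Same logic as the patched method above."""
        query = """
            MERGE (n1:`%s` {id:$subj})
            MERGE (n2:`%s` {id:$obj})
            MERGE (n1)-[:`%s`]->(n2)
        """
        prepared_statement = query % (
            self.node_label.replace("`", ""),
            self.node_label.replace("`", ""),
            rel.replace(" ", "_").replace("`", "").upper(),
        )
        self.query(
            prepared_statement,
            {"subj": subj.replace("`", ""), "obj": obj.replace("`", "")},
        )


store = RecordingStore()
store.upsert_triplet("llama-index", "`depends on`", "`boto3`")
statement, params = store.calls[0]

# Each of the three labels should be wrapped in exactly one pair of backticks.
assert statement.count("`") == 6
assert "[:`DEPENDS_ON`]" in statement
assert params == {"subj": "llama-index", "obj": "boto3"}
```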

@Riddhimaan-Senapati
Contributor

Created a PR to solve this issue using your implemented solution @kevinphillips81
