community: Apache AGE wrapper. Ensure Node Uniqueness by ID. (#28759)
**Description:**

The Apache AGE graph integration handled node merging incorrectly,
allowing duplicate nodes that share the same ID and type but differ in
their other properties. Unlike
[Neo4j](https://github.com/langchain-ai/langchain/blob/cdf62021569dd7f02b35679b46ee6abe92f02cb7/libs/community/langchain_community/graphs/neo4j_graph.py#L47),
[Memgraph](https://github.com/langchain-ai/langchain/blob/cdf62021569dd7f02b35679b46ee6abe92f02cb7/libs/community/langchain_community/graphs/memgraph_graph.py#L50),
[Kuzu](https://github.com/langchain-ai/langchain/blob/cdf62021569dd7f02b35679b46ee6abe92f02cb7/libs/community/langchain_community/graphs/kuzu_graph.py#L253),
and
[Gremlin](https://github.com/langchain-ai/langchain/blob/cdf62021569dd7f02b35679b46ee6abe92f02cb7/libs/community/langchain_community/graphs/gremlin_graph.py#L165),
it did not use the node ID as the primary identifier for merging.

This inconsistency caused data integrity issues: writing a node whose ID
already existed, but with different properties, created a second node
instead of updating the existing one.
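
To make the failure mode concrete, here is a toy, in-memory Python model of Cypher `MERGE` semantics. It does not talk to Apache AGE, and the helper names (`merge_on_all_properties`, `merge_on_id_then_set`, `Store`) are hypothetical. It replays the nodes from this commit's updated test data under the old strategy (match on the full property map) and the new one (match on label and `id`, then `SET n = map`), assuming the node's ID is also stored as an `id` property.

```python
from typing import Any, Dict, List, Tuple

# A stored node is (label, property_map). Toy model only, not Apache AGE.
Store = List[Tuple[str, Dict[str, Any]]]


def merge_on_all_properties(store: Store, label: str, props: Dict[str, Any]) -> None:
    """Old strategy: MERGE (n:`label` {props}) matches only if every key/value is present."""
    for node_label, node_props in store:
        if node_label == label and all(
            key in node_props and node_props[key] == value for key, value in props.items()
        ):
            return  # an existing node satisfies the whole pattern, nothing is created
    store.append((label, dict(props)))  # no match: MERGE creates a new node


def merge_on_id_then_set(
    store: Store, label: str, node_id: str, props: Dict[str, Any]
) -> None:
    """New strategy: MERGE (n:`label` {`id`: node_id}) then SET n = {props}."""
    for index, (node_label, node_props) in enumerate(store):
        if node_label == label and node_props.get("id") == node_id:
            store[index] = (label, dict(props))  # SET n = map replaces every property
            return
    # No node with this label and id: MERGE creates one, then SET replaces its properties.
    store.append((label, dict(props)))


# (id, type, extra properties) for the nodes in the updated test_data.
test_nodes = [
    ("foo", "foo", {}),
    ("bar", "bar", {}),
    ("foo", "foo", {"property_a": "a"}),
]

old_store: Store = []
new_store: Store = []
for node_id, label, extra in test_nodes:
    # Assumption: the node's ID is also written as an `id` property.
    props = {"id": node_id, **extra}
    merge_on_all_properties(old_store, label, props)
    merge_on_id_then_set(new_store, label, node_id, props)

print(len(old_store))  # 3: the two "foo" writes produced two nodes
print(len(new_store))  # 2: the second "foo" write updated the first node
print(new_store[0])    # ('foo', {'id': 'foo', 'property_a': 'a'})
```

Under the old strategy the second `foo` write finds no node that already carries `property_a`, so `MERGE` creates a duplicate; under the new strategy it matches on `id` and overwrites that node's property map.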

**Solution:**

This PR modifies `node_insert_query` to `MERGE` nodes on label and ID
*only* and then update their properties with `SET`, aligning the behavior
with the other graph database integrations. The `_format_properties` method
was also modified to handle ID overrides.
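
The doubled braces `{{ }}` in the new template are Python `str.format` escapes, so only `{label}`, `{id}`, and `{properties}` are substituted. Below is a minimal sketch of how one node renders, using the template text from this commit with its indentation removed. `format_properties` is a hypothetical stand-in for `_format_properties`, assuming it backtick-quotes keys and JSON-encodes values.

```python
import json
from typing import Any, Dict

# Non-source branch of node_insert_query from this commit (indentation removed).
# {{ and }} render as literal braces after str.format.
NODE_INSERT_QUERY = """
MERGE (n:`{label}` {{`id`: "{id}"}})
SET n = {properties}
"""


def format_properties(properties: Dict[str, Any]) -> str:
    """Hypothetical stand-in for AGEGraph._format_properties.

    Assumption: keys are wrapped in backticks and values are JSON-encoded,
    producing a Cypher map literal such as {`id`: "foo"}.
    """
    pairs = [f"`{key}`: {json.dumps(value)}" for key, value in properties.items()]
    return "{" + ", ".join(pairs) + "}"


query = NODE_INSERT_QUERY.format(
    label="foo",
    properties=format_properties({"id": "foo", "property_a": "a"}),
    id="foo",
)
print(query)
# MERGE (n:`foo` {`id`: "foo"})
# SET n = {`id`: "foo", `property_a`: "a"}
```

Because `MERGE` now matches on the label and `id` alone, a repeated write for the same ID lands on the existing node, and `SET n = {...}` replaces that node's entire property map with the new one (openCypher semantics).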

**Impact:**

This fix prevents duplicate nodes with the same label and ID, ensuring data
integrity, and provides consistent behavior across graph database integrations.
GMartin-dev authored Dec 17, 2024
1 parent cdf6202 commit 3a1d053
Showing 2 changed files with 9 additions and 3 deletions.
6 changes: 4 additions & 2 deletions libs/community/langchain_community/graphs/age_graph.py
```diff
@@ -697,8 +697,9 @@ def add_graph_documents(
         # query for inserting nodes
         node_insert_query = (
             """
-            MERGE (n:`{label}` {properties})
-            """
+            MERGE (n:`{label}` {{`id`: "{id}"}})
+            SET n = {properties}
+        """
             if not include_source
             else """
             MERGE (n:`{label}` {properties})
@@ -735,6 +736,7 @@ def add_graph_documents(
                     query = node_insert_query.format(
                         label=AGEGraph.clean_graph_labels(node.type),
                         properties=self._format_properties(node.properties),
+                        id=node.id,
                     )
 
                 self.query(query)
```
```diff
@@ -10,7 +10,11 @@
 
 test_data = [
     GraphDocument(
-        nodes=[Node(id="foo", type="foo"), Node(id="bar", type="bar")],
+        nodes=[
+            Node(id="foo", type="foo"),
+            Node(id="bar", type="bar"),
+            Node(id="foo", type="foo", properties={"property_a": "a"}),
+        ],
         relationships=[
             Relationship(
                 source=Node(id="foo", type="foo"),
```