[Bug]: Unable to delete entities #37725

Open
1 task done
qiulingdong opened this issue Nov 15, 2024 · 3 comments
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@qiulingdong

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4.6
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): None
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus v2.4.9
- OS(Ubuntu or CentOS): Ubuntu 22.04
- CPU/Memory: AMD 64C / 256G
- GPU: 96G
- Others:

Current Behavior

I created a collection with strong consistency and inserted multiple records. When I delete data, the delete frequently fails to remove it. The outcome varies from run to run:

  1. Data can be deleted normally.
  2. Data cannot be deleted.
  3. Some of the data that meets the conditions is deleted, but the remaining part cannot be deleted.

Using code

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

# settings
MILVUS_FIELD = [
    FieldSchema(name='doc_id', dtype=DataType.VARCHAR, max_length=64, is_primary=True, auto_id=False),
    FieldSchema(name='kb_id', dtype=DataType.VARCHAR, max_length=64, is_partition_key=True),
    FieldSchema(name='file_id', dtype=DataType.VARCHAR, max_length=64),
    FieldSchema(name="file_name", dtype=DataType.VARCHAR, max_length=256),
    FieldSchema(name='content', dtype=DataType.VARCHAR, max_length=65535),  
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=1024),
    FieldSchema(name="source", dtype=DataType.VARCHAR, max_length=1024), 
    FieldSchema(name="keywords", dtype=DataType.VARCHAR, max_length=2048), 
    FieldSchema(name="metadata", dtype=DataType.VARCHAR, max_length=16144),
    FieldSchema(name="dtype", dtype=DataType.VARCHAR, max_length=32),
]

MILVUS_INDEX_PARAMS = {
    'metric_type': 'COSINE',
    'index_type': 'IVF_FLAT',
    'params': {'nlist': 1024}
}


# create the connection and the collection
connections.connect(db_name=database, host=host, port=port, user=user, password=password)
collection = Collection(
    name=collection_name,
    consistency_level=0,  # 0 corresponds to "Strong"
    schema=CollectionSchema(fields=MILVUS_FIELD, enable_dynamic_field=True),
)
collection.create_index(field_name='embedding', index_params=MILVUS_INDEX_PARAMS)

#insert data ...
#...

#delete data
collection.load()
result = collection.query(expr="file_id=='FILE593c68a689aa4797bf59f44a3a2e8c9a'",output_fields=["doc_id","file_id"])
print(f"before delete:{result}")

result=collection.delete(expr="file_id=='FILE593c68a689aa4797bf59f44a3a2e8c9a'")
print(f"delete result:{result}")

result = collection.query(expr="file_id=='FILE593c68a689aa4797bf59f44a3a2e8c9a'",output_fields=["doc_id","file_id"],consistency_level="Strong")
print(f"end delete:{result}")

print log

before delete:data: ["{'doc_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_1', 'file_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a'}", "{'doc_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_2', 'file_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a'}", "{'doc_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_4', 'file_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a'}", "{'doc_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_6', 'file_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a'}"]

delete result:(insert count: 0, delete count: 4, upsert count: 0, timestamp: 0, success count: 0, err count: 0

end delete:data: ["{'doc_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_1', 'file_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a'}", "{'doc_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_2', 'file_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a'}", "{'doc_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_4', 'file_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a'}", "{'doc_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_6', 'file_id': 'FILE593c68a689aa4797bf59f44a3a2e8c9a'}"]
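One way to narrow this down is to delete by primary key instead of by the `file_id` filter, and then report exactly which keys are still visible afterwards. The following is only a sketch: `pk_in_expr` and `surviving_pks` are hypothetical helper names, not part of pymilvus, and they assume the query results are lists of dicts keyed by field name.

```python
import json


def pk_in_expr(pk_field, pks):
    """Build a boolean expression such as: doc_id in ["a", "b"]."""
    # json.dumps produces a double-quoted, escaped list literal.
    return f"{pk_field} in {json.dumps(list(pks), ensure_ascii=False)}"


def surviving_pks(before_rows, after_rows, pk_field="doc_id"):
    """Return the primary keys present both before and after the delete."""
    before = {row[pk_field] for row in before_rows}
    after = {row[pk_field] for row in after_rows}
    return sorted(before & after)
```

With the script above, this could be used as `collection.delete(expr=pk_in_expr("doc_id", [r["doc_id"] for r in before]))`, followed by `surviving_pks(before, after)` on the two query results. An empty list means every matching entity was removed; a partial list corresponds to the third case described under Current Behavior.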

Expected Behavior

I expect the data in the collection to be deleted successfully.

Steps To Reproduce

1. Deploy Milvus using the docker compose file I provided.
2. Create a collection with strong consistency and insert multiple records.
3. Run the Python code above.

Milvus Log

No response

Anything else?

No response

@qiulingdong qiulingdong added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 15, 2024
@yanliang567
Contributor

@qiulingdong I guess there are duplicate doc_id values in the data you inserted into the collection. You can check with:
collection.query(expr="doc_id=='xxxxx'", output_fields=["count(*)"])

/assign @qiulingdong
/unassign

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 15, 2024
@qiulingdong
Author

qiulingdong commented Nov 15, 2024

> @qiulingdong I guess there are duplicate doc_id values in the data you inserted into the collection. You can check with: collection.query(expr="doc_id=='xxxxx'", output_fields=["count(*)"])
>
> /assign @qiulingdong /unassign

I checked, and there are no duplicate doc_id values.
I also used Attu to randomly generate 1000 entries, and the same issue persists.

result = collection.query(expr="doc_id=='FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_1'", output_fields=["count(*)"])
print(f"1.count:{result}")
result = collection.query(expr="doc_id=='FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_2'", output_fields=["count(*)"])
print(f"2.count:{result}")
result = collection.query(expr="doc_id=='FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_4'", output_fields=["count(*)"])
print(f"3.count:{result}")
result = collection.query(expr="doc_id=='FILE593c68a689aa4797bf59f44a3a2e8c9a_CHUNK_6'", output_fields=["count(*)"])
print(f"4.count:{result}")

1.count:data: ["{'count(*)': 1}"]
2.count:data: ["{'count(*)': 1}"]
3.count:data: ["{'count(*)': 1}"]
4.count:data: ["{'count(*)': 1}"]
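The per-ID count checks above could also be folded into a single pass over one query result. This is a minimal sketch, assuming the rows come back as a list of dicts that include the primary key field; `duplicate_pks` is a hypothetical helper, not a pymilvus function.

```python
from collections import Counter


def duplicate_pks(rows, pk_field="doc_id"):
    """Map each primary key that occurs more than once to its count."""
    counts = Counter(row[pk_field] for row in rows)
    return {pk: n for pk, n in counts.items() if n > 1}
```

For example, passing the result of the `file_id` query from the original script (with `doc_id` in `output_fields`) would return an empty dict when every doc_id is unique, which matches the counts of 1 reported here.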


@xiaofan-luan
Collaborator

Could you try 2.4.15?
