
Added store_many_vectors on Mongo Storage #87

Open: wants to merge 8 commits into master


@etudor commented Nov 20, 2019

Because I found the `store_many_vectors` method very useful in the Redis storage, I've added it to the Mongo storage as well.

I haven't benchmarked how much faster it is than single inserts.

@amorgun (Collaborator) left a comment

Could you add some tests for store_many_vectors to the mongo test suite?

```python
        # Push JSON representation of dict to end of bucket list
        self.mongo_object.insert_one(val_dict)

    def _get_vector(self, hash_name, bucket_key, v, data):
        """
```
@amorgun (Collaborator):

This docstring belongs to the `store_vector` method.

```python
    def store_many_vectors(self, hash_name, bucket_keys, vs, data):
        requests = []

        for v, d, bk in zip(vs, data, bucket_keys):
```
@amorgun (Collaborator), Nov 20, 2019:

I suggest using `from future.builtins import zip`, because on Python 2.7 the built-in `zip` builds the whole list of tuples up front, while the `future` version returns a lazy iterator.
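A minimal sketch of the suggested import; the `try`/`except` guard and the sample lists are illustrative additions, not code from the PR. On Python 2.7 the import swaps in a Python 3-style lazy `zip`, and on Python 3 the built-in `zip` is already an iterator, so the loop body is unchanged either way:

```python
try:
    # Python 2.7: swap in the lazy, Python 3-style zip from the `future` package.
    from future.builtins import zip
except ImportError:
    pass  # `future` not installed; on Python 3 the built-in zip is already lazy

# Illustrative sample inputs in the same order as store_many_vectors' loop.
vs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
data = ["a", "b", "c"]
bucket_keys = ["k0", "k1", "k2"]

triples = zip(vs, data, bucket_keys)
print(iter(triples) is triples)  # True: an iterator, no intermediate list
```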

@etudor (Author):

Updated the code, thanks
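For reference, a self-contained sketch of the batching idea discussed in this thread. Everything in it is hypothetical: `MongoStorageSketch`, its `nearpy_` key prefix and the document layout are simplified stand-ins, not NearPy's actual `MongoStorage`. The PR wraps each document in pymongo's `InsertOne` and sends the list with `bulk_write`; this sketch swaps in `Collection.insert_many`, which likewise sends the inserts as one bulk operation, so it can be exercised against a mocked collection instead of a running MongoDB. It also shows the docstrings sitting on the methods they describe:

```python
from array import array
from unittest.mock import MagicMock


class MongoStorageSketch(object):
    """Hypothetical, trimmed-down stand-in for a Mongo-backed LSH storage."""

    def __init__(self, mongo_object, prefix="nearpy_"):
        self.mongo_object = mongo_object  # a pymongo Collection in real use
        self.prefix = prefix              # hypothetical key prefix

    def _format_hash_prefix(self, hash_name):
        return "{}{}_".format(self.prefix, hash_name)

    def _get_vector(self, hash_name, bucket_key, v, data):
        """Builds the document for one dense vector (no sparse support here)."""
        return {
            "lsh": self._format_hash_prefix(hash_name) + bucket_key,
            "vector": array("d", v).tobytes(),  # raw float64 bytes
            "dim": len(v),
            "data": data,
        }

    def store_vector(self, hash_name, bucket_key, v, data):
        """Stores one vector and its data as a single document."""
        self.mongo_object.insert_one(
            self._get_vector(hash_name, bucket_key, v, data))

    def store_many_vectors(self, hash_name, bucket_keys, vs, data):
        """Stores many vectors with one bulk insert instead of one call each."""
        documents = [self._get_vector(hash_name, bk, v, d)
                     for v, d, bk in zip(vs, data, bucket_keys)]
        if documents:  # pymongo rejects an empty document list
            self.mongo_object.insert_many(documents)


# Usage against a mocked collection, so no MongoDB server is needed.
collection = MagicMock()
storage = MongoStorageSketch(collection)
storage.store_many_vectors("h1", ["00", "01"], [[1.0, 2.0], [3.0, 4.0]], ["a", "b"])
docs = collection.insert_many.call_args[0][0]
print(collection.insert_many.call_count, [d["lsh"] for d in docs])
# prints: 1 ['nearpy_h1_00', 'nearpy_h1_01']
```

Whichever bulk call is used, any saving comes from sending the batch in as few messages as the server's size limits allow rather than one insert per vector; how large that saving is would still need measuring, as the PR description notes.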


```diff
@@ -147,7 +166,7 @@ def get_bucket(self, hash_name, bucket_key):
                                             shape=(val_dict['dim'], 1))

             else:
-                vector = numpy.fromstring(val_dict['vector'],
+                vector = numpy.frombuffer(val_dict['vector'],
```
@etudor (Author):

I updated this because `numpy.fromstring` was emitting deprecation warnings; NumPy recommends `numpy.frombuffer` for binary data instead.
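A minimal sketch of the round trip behind this change, assuming the vector is stored as the raw bytes of a float64 array (the real dtype comes from the stored document, which is not shown here). `numpy.frombuffer` reads the bytes directly without the deprecated binary mode of `numpy.fromstring`. One difference worth knowing: the result is a read-only view of the bytes object, so code that modifies the vector in place needs a copy:

```python
import numpy

# Assumption: the document stores the raw bytes of a float64 vector.
original = numpy.array([0.5, -1.25, 3.0], dtype=numpy.float64)
stored_bytes = original.tobytes()  # what a binary field would hold

# dtype must match the dtype used when the bytes were written.
restored = numpy.frombuffer(stored_bytes, dtype=numpy.float64)
print(restored.flags.writeable)  # False: a view over an immutable bytes object

# Copy before any in-place modification.
restored = restored.copy()
print(restored.flags.writeable)  # True
```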

```diff
             {'lsh': {'$regex': self._format_hash_prefix(hash_name)}})

     def clean_all_buckets(self):
         """
         Removes all buckets from all hashes and their content.
         """
-        self.mongo_object.remove(
+        self.mongo_object.delete_many(
```
@etudor (Author):

The `remove` method is deprecated, so I replaced it with the suggested `delete_many`. This avoids the annoying deprecation warnings.
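`Collection.remove` was deprecated in PyMongo 3.0 in favour of `delete_one`/`delete_many` and removed in PyMongo 4.0, while `delete_many` does not exist before 3.0. If an older PyMongo ever gets pinned, a small compatibility helper is one option; this is a minimal sketch, and `clear_collection` is a hypothetical name, not part of the PR:

```python
from unittest.mock import MagicMock


def clear_collection(collection, query=None):
    """Deletes every document matching query (all documents by default).

    Prefers delete_many (PyMongo 3.0+) and falls back to the legacy remove()
    for an older, pinned PyMongo that lacks it.
    """
    query = {} if query is None else query  # an empty filter matches everything
    if hasattr(collection, "delete_many"):
        collection.delete_many(query)
    else:
        collection.remove(query)


# Usage with mocks restricted to one API each, standing in for real collections.
new_api = MagicMock(spec=["delete_many"])
old_api = MagicMock(spec=["remove"])
clear_collection(new_api)
clear_collection(old_api, {"lsh": {"$regex": "^nearpy_h1_"}})
```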

@etudor (Author) left a comment

@amorgun I added tests for the store_many_vectors method

@amorgun (Collaborator) commented Nov 21, 2019

@etudor The test fails because it cannot import `InsertOne`. You should add pymongo to the test requirements.

@etudor (Author) commented Nov 29, 2019

@amorgun I have updated this

@amorgun (Collaborator) commented Nov 29, 2019

@etudor It looks like many of the Mongo tests now fail on Python 2.7. Please check whether this is related to your changes. Maybe you should pin an older version of pymongo.
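If the Python 2.7 failures do turn out to depend on the installed pymongo version, one way to express a pin is with PEP 508 environment markers in a pip requirements file. The project's test requirements file is not shown in this PR, so the file and the bounds below are assumptions: `InsertOne` and `bulk_write` first appeared in PyMongo 3.0, and PyMongo 4.0 dropped Python 2.7 support.

```text
# Test requirements (illustrative; adjust to the project's actual file)
pymongo>=3.0,<4.0; python_version < "3"
pymongo>=3.0; python_version >= "3"
```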


2 participants