Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement experimental ai-powered search #398

Merged
merged 13 commits into from
Jan 11, 2025

Conversation

memishood
Copy link
Contributor

Pull Request

Related issue

Fixes #369

What does this PR do?

I've implemented all the required steps mentioned in the issue above. I made every effort to respect the existing project conventions. I hope this pull request will be helpful and add value to Meilisearch 💜

PR: I didn't uncomment vector search tests in search_tests.dart as I'm focused on embedders capability.

Some technical points that I might have considered wrong in the pull-req

  • Ensure semanticRatio is between 0.0 and 1.0
  • Throw exception when embedder source is not one of them: openAi, huggingFace, userProvided, rest, ollama

Any comment & feedback is appreciated!

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

@ahmednfwela ahmednfwela self-requested a review December 27, 2024 09:17
Copy link
Collaborator

@ahmednfwela ahmednfwela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your great contribution!

I have provided a quick initial review for now, will continue once current changes are solved.

Comment on lines 29 to 32
String? get openAiKey {
return Platform.environment['OPEN_AI_API_KEY'];
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not work if the tests are run on web

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest also adding documentation in CONTRIBUTING.md for users who want to run tests to set this env variable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code to cover flutter web, and updated the documentation in CONTRIBUTING.md file

required this.source,
});

Map<String, Object?> toMap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since source is shared between all embedders, this can be changed to:

Suggested change
Map<String, Object?> toMap();
Map<String, Object?> toMap() {
'source': source,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code, now it is mapped only once in the base class as you described


@override
Map<String, Object?> toMap() => {
'source': source,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for all embedders: this can now be changed to

Suggested change
'source': source,
...super.toMap(),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

'userProvided' => UserProvidedEmbedder.fromMap(map),
'rest' => RestEmbedder.fromMap(map),
'ollama' => OllamaEmbedder.fromMap(map),
_ => throw Exception('unexpected embedder source'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a breaking change if meilisearch adds new embedder versions, so I suggest adding an UnkownEmbedder class that stores the map as is.

Copy link
Contributor Author

@memishood memishood Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this makes more sense, however, I guess I'll need to update the structure a bit. It seems that I need to either remove source field in the base class or make it nullable as unknown embedder's source field might not be defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code, now we return unknown embedder with the original data when the source is not handled.

this.url,
this.documentTemplateMaxBytes,
this.binaryQuantized,
}) : super(source: 'openAi');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for all embedders: we are typing the source twice, which might lead to typos, once here and once in Embedder.fromMap, we should probably unify this in a static const sourceValue = 'openAi';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code, now it is typed only once

Comment on lines 17 to 20
Map<String, Object?> toMap() => {
'mean': mean,
'sigma': sigma,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is used in query parameters, it should automatically exclude null values (as per meilisearch conventions), the logic for this is already present in Queryable class, so it can be uesd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distribution isn't used in the query parameters; it is a field of some embedders, and the fields in this class aren't actually nullable.

@memishood
Copy link
Contributor Author

Thanks for your great contribution!

I have provided a quick initial review for now, will continue once current changes are solved.

Hi, thanks for the good feedback.

I'll be updating the code based on your comments soon.

@memishood memishood marked this pull request as draft December 27, 2024 14:38
@memishood memishood marked this pull request as ready for review December 27, 2024 14:47
@memishood
Copy link
Contributor Author

@ahmednfwela any update here?

@ahmednfwela
Copy link
Collaborator

I will do another review now, thanks for your contribution! @memishood

@ahmednfwela
Copy link
Collaborator

@memishood since the fork belongs to an organization I can't directly push changes to your PR, so can you please merge main into this ?

@@ -590,6 +591,60 @@ void main() {
// });
// });

group('Embedders', () {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests need to be disabled if the openAI API key was not present, so that CI can pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, i have updated it

@memishood memishood requested a review from ahmednfwela January 10, 2025 22:27
Copy link

codecov bot commented Jan 11, 2025

Codecov Report

Attention: Patch coverage is 24.34211% with 115 lines in your changes missing coverage. Please review.

Project coverage is 75.19%. Comparing base (ec68546) to head (d728b29).
Report is 69 commits behind head on main.

Files with missing lines Patch % Lines
lib/src/settings/embedder.dart 20.51% 93 Missing ⚠️
lib/src/index.dart 0.00% 13 Missing ⚠️
lib/src/query_parameters/hybrid_search.dart 0.00% 5 Missing ⚠️
lib/src/settings/index_settings.dart 66.66% 2 Missing ⚠️
lib/src/query_parameters/index_search_query.dart 0.00% 1 Missing ⚠️
lib/src/query_parameters/search_query.dart 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #398      +/-   ##
==========================================
- Coverage   75.83%   75.19%   -0.64%     
==========================================
  Files          55       60       +5     
  Lines        1320     1524     +204     
==========================================
+ Hits         1001     1146     +145     
- Misses        319      378      +59     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ahmednfwela
Copy link
Collaborator

I have started a PR of my own suggested edits here: Phosum#1
feel free to give me any comments

improvements to "implement experimental ai-powered search meilisearch#398"
test/search_test.dart Outdated Show resolved Hide resolved
@ahmednfwela
Copy link
Collaborator

one last change to fix the test and we can merge 👍

@ahmednfwela
Copy link
Collaborator

bors merge

meili-bors bot added a commit that referenced this pull request Jan 11, 2025
398: implement experimental ai-powered search r=ahmednfwela a=memishood

# Pull Request

## Related issue
Fixes #369 

## What does this PR do?
I've implemented all the required steps mentioned in the issue above. I made every effort to respect the existing project conventions. I hope this pull request will be helpful and add value to Meilisearch 💜

**PR:** I didn't uncomment vector search tests in search_tests.dart as I'm focused on embedders capability.

### Some technical points that I might have considered wrong in the pull-req
- Ensure `semanticRatio` is between 0.0 and 1.0
- Throw exception when embedder source is not one of them: `openAi`, `huggingFace`, `userProvided`, `rest`, `ollama`

**Any comment & feedback is appreciated!**

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Emre Memis <[email protected]>
Co-authored-by: ahmednfwela <[email protected]>
Co-authored-by: Emre <[email protected]>
Copy link
Contributor

meili-bors bot commented Jan 11, 2025

This PR was included in a batch that successfully built, but then failed to merge into main. It will not be retried.

Additional information:

{"message":"1 review requesting changes by reviewers with write access.","documentation_url":"https://docs.github.com/articles/about-protected-branches","status":"422"}

@ahmednfwela ahmednfwela self-requested a review January 11, 2025 16:09
Copy link
Collaborator

@ahmednfwela ahmednfwela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ahmednfwela
Copy link
Collaborator

bors merge

Copy link
Contributor

meili-bors bot commented Jan 11, 2025

Build succeeded:

@meili-bors meili-bors bot merged commit be0ecb9 into meilisearch:main Jan 11, 2025
6 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v1.6] Implement vector search experimental feature
2 participants