[Feature Request] Move Lucene Vector field and HNSW KNN Search as a first class feature in core #17338
Comments
@navneet1v @kotwanikunal What do you think about this? |
adding @vamshin |
I agree with this generally. IMO:
However, I think that there should not be multiple "vector" field types in the distribution - i.e. vector vs knn_vector. From a user
I don't think this is completely accurate. We deprecated nmslib in 2.19 and we support Lucene. It's more a hesitation around the maintenance burden another engine would pose, which can be heavy. But I do think it'd be good to open up for custom implementations. |
This seems like a reasonable architecture to move towards to me as well, but I admit I don't know specifically what it would take to accomplish it. @sam-herman Have you scoped out the effort or any high level plan on what it would take to get this done? Also, hardly the biggest issue here, but if we could get rid of the need to specify |
@andrross agree that getting rid of |
Is your feature request related to a problem? Please describe
I can't really move this issue between projects, so I am copying and pasting this great suggestion by @nknize and adding a little more context to it for the issues I'm seeing while attempting to integrate jVector opensearch-project/k-NN#2386
Is your feature request related to a problem?
Core OpenSearch does not support vector types as a first class field. The correlation engine has a CorrelationVectorFieldMapper that uses Lucene's KnnFloatVectorField, but this is in the events-correlation-engine plugin. We could move that field mapper to the core library, but we don't want to fragment between different vector field implementations. So why not move the Lucene HNSW backed vector field and KNN search into a core library as a first class field?
What solution would you like?
A discussion around making the vector field type a first class citizen in core. We've discussed this before in person, but I don't know if we have an issue around it. I don't think there's a reason not to have Lucene vector fields and HNSW backed KNN search as a core feature, leveraging the OpenSearch k-NN plugin as an optional accelerator that uses alternative native options like FAISS or nmslib.
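To make the proposed split concrete, here is a minimal sketch of how a core default engine and optional plugin engines could coexist behind one interface. Everything in it is hypothetical: `VectorEngine`, `DefaultEngine`, and `VectorEngineRegistry` are illustration-only names that do not exist in OpenSearch or the k-NN plugin, and the default engine uses a brute-force scan as a stand-in for Lucene's HNSW search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: not an OpenSearch or k-NN plugin API.
interface VectorEngine {
    String name();

    // Returns the indices of the k stored vectors closest to the query.
    List<Integer> search(float[][] vectors, float[] query, int k);
}

// Stand-in for the engine core would ship by default.
// Assumption: exact brute-force scan instead of Lucene HNSW, to stay self-contained.
final class DefaultEngine implements VectorEngine {
    @Override
    public String name() {
        return "default";
    }

    @Override
    public List<Integer> search(float[][] vectors, float[] query, int k) {
        Integer[] ids = new Integer[vectors.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Sort candidate ids by squared Euclidean distance to the query.
        Arrays.sort(ids, Comparator.comparingDouble((Integer i) -> squaredDistance(vectors[i], query)));
        List<Integer> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ids.length); i++) {
            top.add(ids[i]);
        }
        return top;
    }

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

// Hypothetical registry: core registers the default, plugins add accelerators.
final class VectorEngineRegistry {
    private final Map<String, VectorEngine> engines = new HashMap<>();
    private final VectorEngine fallback = new DefaultEngine();

    VectorEngineRegistry() {
        register(fallback);
    }

    void register(VectorEngine engine) {
        engines.put(engine.name(), engine);
    }

    // No engine requested: use the core default. Unknown engine: fail loudly.
    VectorEngine resolve(String requested) {
        if (requested == null) {
            return fallback;
        }
        VectorEngine engine = engines.get(requested);
        if (engine == null) {
            throw new IllegalArgumentException("unknown vector engine: " + requested);
        }
        return engine;
    }
}
```

With a shape like this, a field that names no engine would work out of the box, and an optional plugin would only call `register` to offer an alternative. How engine selection would actually be exposed in mappings is left open here.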
What alternatives have you considered?
Leave as is if there is a compelling reason to keep this base Lucene capability integration in a separate downstream plugin.
Do you have any additional context?
We were trying to extend the k-NN plugin for the jVector engine and encountered several issues with the existing approach that convinced us that core would be a better fit for vector types and vector search going forward. The issues can be enumerated as follows:
a. The build and interfaces in the plugin are quite complex and often break. This is primarily because some of the native libraries include source code dependencies in ways that were not well thought out. Also, some versions are not backwards compatible.
b. Native memory: native memory makes it difficult to track and analyze performance issues. JVM analysis tools have a hard time detecting such issues, and not all users want to deal with them.
The above proposal should make new extensions to OpenSearch easier and less contentious. It should also satisfy different community needs, such as: