feat: adds hybrid search for async VS interface [3/N] #2
base: hybrid_search_2
LGTM
Perform similarity search (or hybrid search) query on database.
Queries might be slow if the hybrid search column does not exist.
For best hybrid search performance, consider creating a TSV column
and adding GIN index.
nit: Since this is a private function, this docstring will not be visible to users. Consider moving it elsewhere.
Also, please clarify that only hybrid search will be slow, not the normal similarity searches.
We may want to do this for the user when creating the new table. At a minimum, we should add documentation for using hybrid search and call this out.
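As a rough illustration of the setup this thread refers to, the sketch below builds the two PostgreSQL statements involved: a stored, generated `tsvector` column and a GIN index over it. The helper name `build_tsv_setup_statements`, the `english` text search configuration, the `public` default schema, and the index naming are assumptions made for illustration; they are not part of this PR's API.

```python
from typing import List


def build_tsv_setup_statements(
    table_name: str,
    content_column: str,
    schema_name: str = "public",  # assumed default schema
    tsv_lang: str = "english",  # assumed text search configuration
) -> List[str]:
    """Return SQL that adds a TSV column and a GIN index (hypothetical helper)."""
    # Mirrors the default naming seen in the PR: content_column + "_tsv".
    tsv_column = f"{content_column}_tsv"
    # Hypothetical index name; the PR may use a different convention.
    index_name = f"{table_name}_{tsv_column}_idx"

    add_column = (
        f'ALTER TABLE "{schema_name}"."{table_name}" '
        f'ADD COLUMN "{tsv_column}" tsvector '
        f"GENERATED ALWAYS AS (to_tsvector('{tsv_lang}', \"{content_column}\")) STORED;"
    )
    create_index = (
        f'CREATE INDEX "{index_name}" ON "{schema_name}"."{table_name}" '
        f'USING GIN ("{tsv_column}");'
    )
    return [add_column, create_index]


statements = build_tsv_setup_statements("documents", "content")
for statement in statements:
    print(statement)
```

Running something like this at table-creation time would let hybrid search use the index from the first query; without it, only the keyword half of a hybrid query would be expected to slow down.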
if hybrid_search_config.tsv_column
else content_column + "_tsv"
)
hybrid_search_config.tsv_column = tsv_column_name
Should we update the name only if the column exists? Then we wouldn't have to manage the separate boolean.
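One way to read this suggestion, sketched under assumptions: resolve the candidate column name, keep it only when the table really has that column, and treat `None` as "no TSV column". `SketchHybridSearchConfig`, `resolve_tsv_column`, and the `existing_columns` argument are hypothetical stand-ins, not the classes or helpers used in the PR.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SketchHybridSearchConfig:
    """Hypothetical stand-in for the PR's hybrid search config."""

    tsv_column: Optional[str] = None


def resolve_tsv_column(
    config: SketchHybridSearchConfig,
    content_column: str,
    existing_columns: Iterable[str],
) -> Optional[str]:
    """Set config.tsv_column only if that column exists; otherwise clear it."""
    # Same default as in the diff: fall back to content_column + "_tsv".
    candidate = config.tsv_column or f"{content_column}_tsv"
    # A None value now carries the information the separate boolean held.
    config.tsv_column = candidate if candidate in set(existing_columns) else None
    return config.tsv_column


config = SketchHybridSearchConfig()
print(resolve_tsv_column(config, "content", ["id", "content", "content_tsv"]))
```

Callers could then branch on `config.tsv_column is None` instead of consulting a second flag.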
@@ -557,16 +610,18 @@ async def __query_collection(
  filter_dict = None
  if filter and isinstance(filter, dict):
  safe_filter, filter_dict = self._create_filter_clause(filter)
- param_filter = f"WHERE {safe_filter}" if safe_filter else ""
+ where_filters = f"WHERE {safe_filter}" if safe_filter else ""
+ and_filters = f"AND ({safe_filter})" if safe_filter else ""
nit: add a comment on what this is or move this closer to where the code is used.
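For context on why two fragments might be needed, here is a minimal sketch: the same filter starts a `WHERE` clause in a query that has none, and is appended with `AND` in a query that already filters on the text search match. The two fragment expressions come from the diff; the function names, the `content_tsv` column, the bind parameters, and the query shapes are assumptions for illustration only.

```python
from typing import Optional, Tuple


def build_filter_fragments(safe_filter: Optional[str]) -> Tuple[str, str]:
    """Return the filter as a standalone WHERE clause and as an AND suffix."""
    # Used when the query has no other WHERE condition.
    where_filters = f"WHERE {safe_filter}" if safe_filter else ""
    # Used when the query already has a WHERE condition to extend.
    and_filters = f"AND ({safe_filter})" if safe_filter else ""
    return where_filters, and_filters


def build_queries(table_name: str, safe_filter: Optional[str]) -> Tuple[str, str]:
    """Build illustrative dense and keyword queries (assumed shapes)."""
    where_filters, and_filters = build_filter_fragments(safe_filter)
    dense_query = f'SELECT * FROM "{table_name}" {where_filters} LIMIT :k;'
    keyword_query = (
        f'SELECT * FROM "{table_name}" '
        f'WHERE "content_tsv" @@ plainto_tsquery(:query) {and_filters} LIMIT :k;'
    )
    return dense_query, keyword_query


dense, keyword = build_queries("documents", "year > 2020")
print(dense)
print(keyword)
```

Wrapping the filter in parentheses for the `AND` form keeps any `OR` inside the user filter from changing the precedence of the text search condition.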
@@ -806,15 +912,38 @@ async def aapply_vector_index(
index.name = self.table_name + DEFAULT_INDEX_NAME_SUFFIX
name = index.name
stmt = f'CREATE INDEX {"CONCURRENTLY" if concurrently else ""} "{name}" ON "{self.schema_name}"."{self.table_name}" USING {index.index_type} ({self.embedding_column} {function}) {params} {filter};'

if self.hybrid_search_config:
I don't like this in the vector index method
Adds hybrid search in async vector store.
Key features: