feat: adds hybrid search for async VS interface [3/N] #2

Draft: wants to merge 1 commit into base: hybrid_search_2

vishwarajanand
Owner

Adds hybrid search in async vector store.

Key features:

  • Insert (actually an upsert) a document with TSV column
  • Search when hybrid config is provided, in both the following cases:
    • With TSV column
    • Without TSV column

@dishaprakash left a comment

LGTM

Perform similarity search (or hybrid search) query on database.
Queries might be slow if the hybrid search column does not exist.
For best hybrid search performance, consider creating a TSV column
and adding GIN index.

nit: Since this is a private function, this docstring will not be visible to users. Consider moving it to a public method or to the user-facing documentation.

Also, please clarify that only hybrid search will be slow when the TSV column is missing (normal similarity searches are not affected).

We may want to do this for the user when creating the new table. At a minimum, we should add documentation for using hybrid search and call this out.
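
For reference, the setup the docstring recommends could look roughly like the sketch below. It only builds SQL strings and is not code from this PR: the helper names (`quote_ident`, `tsv_setup_statements`), the index naming scheme, and the default `english` text search configuration are assumptions made for illustration. It uses a plain `tsvector` column rather than a generated one, since the PR description says the TSV column is written during the upsert.

```python
import re
from typing import List


def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def tsv_setup_statements(
    schema: str,
    table: str,
    content_column: str,
    tsv_column: str,
    language: str = "english",  # assumed text search configuration
) -> List[str]:
    """Build SQL that adds a tsvector column, backfills it, and indexes it with GIN."""
    # The configuration name is interpolated into a string literal, so restrict it.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", language):
        raise ValueError(f"unexpected text search configuration: {language!r}")

    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    tsv = quote_ident(tsv_column)
    content = quote_ident(content_column)
    # Hypothetical naming scheme for the index.
    index_name = quote_ident(f"{table}_{tsv_column}_gin")

    return [
        # 1. Add the TSV column if it is not there yet.
        f"ALTER TABLE {target} ADD COLUMN IF NOT EXISTS {tsv} tsvector;",
        # 2. Backfill rows that were inserted before the column existed.
        f"UPDATE {target} SET {tsv} = to_tsvector('{language}', "
        f"coalesce({content}, '')) WHERE {tsv} IS NULL;",
        # 3. Index the column so full-text matches do not scan the whole table.
        f"CREATE INDEX IF NOT EXISTS {index_name} ON {target} USING GIN ({tsv});",
    ]


if __name__ == "__main__":
    for statement in tsv_setup_statements("public", "docs", "content", "content_tsv"):
        print(statement)
```

Whether the table-creation path should run statements like these automatically, or whether they should only be documented for users, is the open question raised above.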

if hybrid_search_config.tsv_column
else content_column + "_tsv"
)
hybrid_search_config.tsv_column = tsv_column_name

Should we only update the name if the column exists? Then we don't have to manage the separate boolean.
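
A minimal sketch of that suggestion, assuming the list of existing column names is already available from table introspection. `HybridSearchConfigSketch` is a hypothetical stand-in for the real config class, and `resolve_tsv_column` is not a function in this PR:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HybridSearchConfigSketch:
    """Hypothetical stand-in for the hybrid search config; only the field used here."""

    tsv_column: Optional[str] = None


def resolve_tsv_column(
    config: HybridSearchConfigSketch,
    content_column: str,
    existing_columns: Iterable[str],
) -> Optional[str]:
    """Return the TSV column name only if that column exists in the table."""
    # Same default as in the snippet above: "<content_column>_tsv".
    candidate = config.tsv_column or f"{content_column}_tsv"
    return candidate if candidate in set(existing_columns) else None


if __name__ == "__main__":
    config = HybridSearchConfigSketch()
    config.tsv_column = resolve_tsv_column(
        config, "content", ["id", "content", "embedding"]
    )
    # None means "no stored TSV column": compute to_tsvector at query time instead.
    print(config.tsv_column)
```

With this shape, `config.tsv_column` being set doubles as the "column exists" flag, so no extra boolean is needed.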

@@ -557,16 +610,18 @@ async def __query_collection(
         filter_dict = None
         if filter and isinstance(filter, dict):
             safe_filter, filter_dict = self._create_filter_clause(filter)
-        param_filter = f"WHERE {safe_filter}" if safe_filter else ""
+        where_filters = f"WHERE {safe_filter}" if safe_filter else ""
+        and_filters = f"AND ({safe_filter})" if safe_filter else ""

nit: add a comment on what this is or move this closer to where the code is used.
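
To illustrate why two variants of the same filter are useful, here is a rough sketch. The query shapes, the function names, and the bind parameter names (`:query_embedding`, `:fts_query`, `:k`) are assumptions for illustration and are not the statements built in this PR. `<=>` is pgvector's cosine distance operator and `plainto_tsquery` is PostgreSQL's built-in query parser.

```python
from typing import Tuple


def build_filter_clauses(safe_filter: str) -> Tuple[str, str]:
    """Render one filter expression as a standalone WHERE and as an AND continuation."""
    # Used when the filter is the only condition in the query.
    where_filters = f"WHERE {safe_filter}" if safe_filter else ""
    # Used when the query already has a WHERE clause (e.g. the full-text match).
    and_filters = f"AND ({safe_filter})" if safe_filter else ""
    return where_filters, and_filters


def dense_query(table: str, embedding_column: str, safe_filter: str) -> str:
    """Vector similarity query: the filter, if any, is the whole WHERE clause."""
    where_filters, _ = build_filter_clauses(safe_filter)
    return (
        f'SELECT * FROM "{table}" {where_filters} '
        f'ORDER BY "{embedding_column}" <=> :query_embedding LIMIT :k;'
    )


def keyword_query(table: str, tsv_column: str, safe_filter: str) -> str:
    """Full-text query: the match condition owns WHERE, so the filter is appended."""
    _, and_filters = build_filter_clauses(safe_filter)
    return (
        f'SELECT * FROM "{table}" '
        f'WHERE "{tsv_column}" @@ plainto_tsquery(:fts_query) {and_filters} '
        f"LIMIT :k;"
    )


if __name__ == "__main__":
    print(dense_query("docs", "embedding", "year > 2020"))
    print(keyword_query("docs", "content_tsv", "year > 2020"))
```

A one-line comment next to each assignment, along the lines of the comments in `build_filter_clauses`, would probably be enough to address the nit.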

@@ -806,15 +912,38 @@ async def aapply_vector_index(
index.name = self.table_name + DEFAULT_INDEX_NAME_SUFFIX
name = index.name
stmt = f'CREATE INDEX {"CONCURRENTLY" if concurrently else ""} "{name}" ON "{self.schema_name}"."{self.table_name}" USING {index.index_type} ({self.embedding_column} {function}) {params} {filter};'

if self.hybrid_search_config:

I don't like having this in the vector index method.

3 participants