Fix metadata deserialization in async mode for PGVector #125
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
 # pylint: disable=too-many-lines | ||
 from __future__ import annotations | ||
|
||
+import json | ||
 import contextlib | ||
 import enum | ||
 import logging | ||
|
@@ -1057,17 +1058,38 @@ async def asimilarity_search_with_score_by_vector( | |
|
||
     def _results_to_docs_and_scores(self, results: Any) -> List[Tuple[Document, float]]: | ||
         """Return docs and scores from results.""" | ||
-        docs = [ | ||
-            ( | ||
-                Document( | ||
-                    id=str(result.EmbeddingStore.id), | ||
-                    page_content=result.EmbeddingStore.document, | ||
-                    metadata=result.EmbeddingStore.cmetadata, | ||
-                ), | ||
-                result.distance if self.embeddings is not None else None, | ||
+        docs = [] | ||
+        for result in results: | ||
+            metadata = result.EmbeddingStore.cmetadata | ||
|
||
+            # Attempt to convert metadata to a dict | ||
+            try: | ||
+                if isinstance(metadata, dict): | ||
+                    pass  # Already a dict | ||
+                elif isinstance(metadata, str): | ||
+                    metadata = json.loads(metadata) | ||
+                elif hasattr(metadata, 'buf'): | ||
+                    # For Fragment types (e.g., from asyncpg) | ||

Review comment: only psycopg3 is supported

Reply: Hi @eyurtsev, thanks for the review! I understand that only psycopg3 is officially supported. However, I’ve received reports of issues in async mode that suggest some users might be encountering non‑dict metadata (perhaps inadvertently using asyncpg or similar drivers). This patch adds defensive logic to convert metadata that isn’t already a dict (for example, when it’s a JSON string, a Fragment‑like object with a [...]

I’ve also added unit tests to simulate these scenarios and ensure the conversion works as expected. Please let me know if you’d like any adjustments or if you think we should further restrict this behavior given our psycopg3-only support.

Review comment: @shamspias feel free to @ me if I don't respond quickly enough. Are you able to create a minimal reproduction against the actual vectorstore? If so, you can send me the code snippet and I'm happy to update the tests myself.

Can you confirm that this is specifically from asyncpg where you're seeing the failures? We definitely don't want to mock the results from asyncpg. If we want to support asyncpg, the way to do it is to run the full suite of tests with that driver.

Reply: Hi @eyurtsev, I’ve put together a minimal reproduction that runs against a real Postgres instance (with pgvector) using asyncpg. The test confirms that the defensive logic for non-dict metadata triggers correctly without mocking. Let me know if you’d like the code snippet or any adjustments!

+                    metadata_bytes = metadata.buf | ||
+                    metadata_str = metadata_bytes.decode('utf-8') | ||
+                    metadata = json.loads(metadata_str) | ||
+                elif hasattr(metadata, 'decode'): | ||
+                    # For other byte-like types | ||
+                    metadata_str = metadata.decode('utf-8') | ||
+                    metadata = json.loads(metadata_str) | ||
+                else: | ||
+                    metadata = {}  # Default to empty dict if unknown type | ||
+            except Exception as e: | ||
+                self.logger.warning(f"Failed to deserialize metadata: {e}") | ||
+                metadata = {} | ||
|
||
+            doc = Document( | ||
+                id=str(result.EmbeddingStore.id), | ||
+                page_content=result.EmbeddingStore.document, | ||
+                metadata=metadata, | ||
             ) | ||
-            for result in results | ||
-        ] | ||
+            score = result.distance if self.embeddings is not None else None | ||
+            docs.append((doc, score)) | ||
         return docs | ||
|
||
def _handle_field_filter( | ||
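
The conversion logic in the hunk above can be tried in isolation, without a database. The following is a minimal sketch, not the PR's code: `coerce_metadata` and `FakeFragment` are hypothetical names introduced here, and `FakeFragment` only imitates an object exposing a `buf` attribute holding UTF-8 JSON bytes, which is what the patch assumes a driver might return. Unlike the patch, the sketch catches only `ValueError` (which covers both JSON and UTF-8 decoding errors) and also falls back to an empty dict when the JSON payload is not an object.

```python
import json
import logging
from dataclasses import dataclass
from typing import Any, Dict

logger = logging.getLogger(__name__)


@dataclass
class FakeFragment:
    """Hypothetical stand-in for a driver object that exposes raw bytes as `.buf`."""

    buf: bytes


def coerce_metadata(raw: Any) -> Dict[str, Any]:
    """Best-effort conversion of a raw metadata value to a dict.

    Returns an empty dict for unknown types or undecodable payloads.
    """
    try:
        if isinstance(raw, dict):
            return raw  # Already deserialized
        if hasattr(raw, "buf"):
            raw = raw.buf  # Unwrap Fragment-like objects (assumed shape)
        if isinstance(raw, (bytes, bytearray, memoryview)):
            raw = bytes(raw).decode("utf-8")
        if isinstance(raw, str):
            parsed = json.loads(raw)
            # JSON arrays or scalars are not valid Document metadata
            return parsed if isinstance(parsed, dict) else {}
    except ValueError as exc:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError
        logger.warning("Failed to deserialize metadata: %s", exc)
    return {}
```

Calling `coerce_metadata(FakeFragment(b'{"source": "a.txt"}'))` yields the parsed dict, while malformed input such as `"not json"` logs a warning and yields `{}`. Whether silently returning `{}` is preferable to raising is a design choice the review thread leaves open.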
|