Description
When using the PGVector class with async_mode=True, the metadata field of the Document objects returned from query methods (e.g., asimilarity_search_with_score_by_vector) is not deserialized into a Python dict. Instead, it remains a Fragment object or another non-dict type. This causes a ValidationError, because the Document class expects metadata to be a dictionary.
To Reproduce
Steps to reproduce the behavior:
- Initialize PGVector with async_mode=True and use_jsonb=True.
- Add documents with metadata to the vector store.
- Perform an asynchronous similarity search, e.g., asimilarity_search or asimilarity_search_with_score_by_vector.
- Observe that the metadata values coming back from the database are not dictionaries, so building the returned Document objects fails with a ValidationError.
Expected behavior
The metadata field of the returned Document objects should be deserialized into Python dictionaries, matching the behavior when async_mode=False.
Actual behavior
When async_mode=True, the metadata field is a Fragment object (from asyncpg), which leads to errors wherever the code expects a dict.
Error message
ValidationError: 1 validation error for Document
metadata
Input should be a valid dictionary [type=dict_type, input_value=Fragment(buf=b'{"user_id": "ahmed"}'), input_type=Fragment]
Environment:
- langchain_postgres version: 0.0.12
- Python version: 3.10, 3.11, 3.12
- Database: PostgreSQL with the pgvector extension
- Async driver: asyncpg
Additional context
This issue arises because asyncpg returns JSONB fields as Record or Fragment objects, which SQLAlchemy does not automatically deserialize into Python dictionaries when using asynchronous sessions.
Code to Reproduce
Ensure that the required connection details, such as connection_string, collection_name, and embedding_model, are provided securely when testing the code.
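```python
import asyncio

from langchain_core.documents import Document
from langchain_core.embeddings import DeterministicFakeEmbedding
from langchain_postgres.vectorstores import PGVector

# Setup the connection to PGVector (e.g. "postgresql+asyncpg://user:pass@host:5432/db")
connection_string = "your_connection_string_here"
collection_name = "your_collection_name_here"
# Must be an Embeddings instance, not a string; replace the fake with your real model
embedding_model = DeterministicFakeEmbedding(size=1536)


async def main() -> None:
    # Initialize PGVector with the necessary parameters
    vstore = PGVector(
        connection=connection_string,
        collection_name=collection_name,
        embeddings=embedding_model,
        use_jsonb=True,
        pre_delete_collection=False,
        async_mode=True,  # Set to True to reproduce the issue
    )
    # Add a document with metadata (async mode requires the async methods)
    await vstore.aadd_documents(
        [Document(page_content="example", metadata={"user_id": "ahmed"})]
    )
    # Perform an asynchronous similarity search with a query vector
    query_vector = await embedding_model.aembed_query("example")
    # The issue: this call raises ValidationError because metadata is a Fragment
    results = await vstore.asimilarity_search_with_score_by_vector(query_vector, k=1)
    for doc, score in results:
        print(type(doc.metadata), doc.metadata, score)


asyncio.run(main())
```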
Proposed Solution
Modify the _results_to_docs_and_scores method in the PGVector class so that the metadata field is converted into a dictionary before the Document objects are created.
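A minimal sketch of that conversion, assuming the raw metadata value is either already a dict, a JSON string or bytes, or an object that exposes its JSON bytes through a buf attribute, as the Fragment in the error message appears to. The helper name coerce_metadata is hypothetical and not part of langchain_postgres; it is only meant to illustrate the normalization step.

```python
import json
from typing import Any


def coerce_metadata(raw: Any) -> dict:
    """Best-effort conversion of a raw metadata value into a plain dict.

    Hypothetical helper: the name and the set of handled types are assumptions.
    """
    if raw is None:
        return {}
    if isinstance(raw, dict):
        return raw
    # Assumption: Fragment-like objects carry the JSON payload in `.buf`,
    # as suggested by the repr Fragment(buf=b'{"user_id": "ahmed"}')
    buf = getattr(raw, "buf", None)
    if buf is not None:
        raw = buf
    # Decode raw bytes into text before parsing
    if isinstance(raw, (bytes, bytearray, memoryview)):
        raw = bytes(raw).decode("utf-8")
    if isinstance(raw, str):
        value = json.loads(raw)
        if isinstance(value, dict):
            return value
        raise TypeError(f"metadata JSON is not an object: {value!r}")
    raise TypeError(f"unsupported metadata type: {type(raw).__name__}")
```

Inside _results_to_docs_and_scores, the Document would then be built with metadata=coerce_metadata(...) applied to the stored metadata column (assumed here to be result.EmbeddingStore.cmetadata; verify this against the installed version). Raising TypeError on unknown types keeps unexpected driver behavior visible instead of silently dropping metadata.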
Related Issues:
#118