bulk_* ignores errors and can cause missing data #346

Closed
@Zaczero

Description

async def bulk_async(
    self, collection_id: str, processed_items: List[Item], refresh: bool = False
) -> None:
    """Perform a bulk insert of items into the database asynchronously.

    Args:
        self: The instance of the object calling this function.
        collection_id (str): The ID of the collection to which the items belong.
        processed_items (List[Item]): A list of `Item` objects to be inserted into the database.
        refresh (bool): Whether to refresh the index after the bulk insert (default: False).

    Notes:
        This function performs a bulk insert of `processed_items` into the database using the specified `collection_id`. The
        insert is performed asynchronously, and the event loop is used to run the operation in a separate executor. The
        `mk_actions` function is called to generate a list of actions for the bulk insert. If `refresh` is set to True, the
        index is refreshed after the bulk insert. The function does not return any value.
    """
    await helpers.async_bulk(
        self.client,
        mk_actions(collection_id, processed_items),
        refresh=refresh,
        raise_on_error=False,
    )

def bulk_sync(
    self, collection_id: str, processed_items: List[Item], refresh: bool = False
) -> None:
    """Perform a bulk insert of items into the database synchronously.

    Args:
        self: The instance of the object calling this function.
        collection_id (str): The ID of the collection to which the items belong.
        processed_items (List[Item]): A list of `Item` objects to be inserted into the database.
        refresh (bool): Whether to refresh the index after the bulk insert (default: False).

    Notes:
        This function performs a bulk insert of `processed_items` into the database using the specified `collection_id`. The
        insert is performed synchronously and blocking, meaning that the function does not return until the insert has
        completed. The `mk_actions` function is called to generate a list of actions for the bulk insert. If `refresh` is set to
        True, the index is refreshed after the bulk insert. The function does not return any value.
    """
    helpers.bulk(
        self.sync_client,
        mk_actions(collection_id, processed_items),
        refresh=refresh,
        raise_on_error=False,
    )

The bulk methods set `raise_on_error=False` but then discard the errors that the helper returns, so failed items are silently dropped. This makes the methods unsafe to use in a production environment. If you don't want to deal with the errors inside these methods, let them raise so that client code can handle them.
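One possible way to keep `raise_on_error=False` without losing failures is to inspect the value the helper returns. The sketch below assumes the helper returns a `(success_count, errors)` tuple, which is what `helpers.bulk` and `helpers.async_bulk` do by default when `stats_only` is not set. `BulkInsertError` and `check_bulk_result` are hypothetical names introduced for illustration and are not part of this project or of the client library.

```python
from typing import Any, Dict, List, Tuple


class BulkInsertError(RuntimeError):
    """Hypothetical exception that carries the per-item errors of a bulk call."""

    def __init__(self, errors: List[Dict[str, Any]]) -> None:
        super().__init__(f"{len(errors)} bulk action(s) failed")
        self.errors = errors


def check_bulk_result(result: Tuple[int, List[Dict[str, Any]]]) -> int:
    """Raise if the bulk helper reported any failed actions.

    `result` is assumed to be the (success_count, errors) tuple returned
    by the bulk helper when called with raise_on_error=False.
    """
    success_count, errors = result
    if errors:
        # Surface the failures instead of silently dropping them.
        raise BulkInsertError(errors)
    return success_count
```

In `bulk_sync` this could be used as `check_bulk_result(helpers.bulk(...))`, and in `bulk_async` as `check_bulk_result(await helpers.async_bulk(...))`. The simpler alternative, as suggested above, is to drop `raise_on_error=False` and let the helper raise on its own.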
