
Implement caching for evaluated prompts #44

Closed
@abetlen

Description

The goal of this feature is to reduce latency for repeated calls to the chat_completion API by caching the kv_cache, keyed by the prompt tokens.

The basic version of this is to simply save the kv_state after the prompt has been evaluated.
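
A minimal sketch of that basic version, assuming hypothetical `eval()`, `save_state()`, and `load_state()` methods that evaluate tokens and snapshot/restore the model's kv_cache (the names are illustrative, not a confirmed llama-cpp-python API):

```python
# Minimal prompt-keyed kv_cache store. eval()/save_state()/load_state()
# are assumed methods here, not a confirmed llama-cpp-python API.

class PromptCache:
    """Maps a prompt's exact token sequence to a saved kv_cache snapshot."""

    def __init__(self):
        self._states = {}  # tuple[int, ...] -> opaque kv state

    def get(self, tokens):
        return self._states.get(tuple(tokens))

    def put(self, tokens, state):
        self._states[tuple(tokens)] = state


def eval_prompt(llama, cache, prompt_tokens):
    # Reuse the saved kv state when this exact prompt was seen before;
    # otherwise evaluate the prompt and snapshot the resulting state.
    state = cache.get(prompt_tokens)
    if state is not None:
        llama.load_state(state)  # assumed: restore kv_cache snapshot
    else:
        llama.eval(prompt_tokens)  # assumed: evaluate the prompt tokens
        cache.put(prompt_tokens, llama.save_state())  # assumed: snapshot
```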

Additionally, we should investigate whether it's possible to save and restore the kv_state after the completion has been generated as well.
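
One way that could work, sketched under the same assumed snapshot API: after generation finishes, also store the state keyed by prompt + completion tokens, and on the next call resume from the longest cached token prefix, so a follow-up chat turn (which repeats the previous exchange as its prefix) only has to evaluate its newly appended tokens. The prefix-matching step is my own elaboration, and `sample()` / `token_eos()` are also assumed method names:

```python
# Sketch of the extension: cache the state after the completion too,
# keyed by the full token sequence, and resume from the longest cached
# prefix. eval()/sample()/token_eos()/save_state()/load_state() are
# all assumed methods, not a confirmed API.

def generate_cached(llama, states, prompt_tokens, max_tokens):
    # states: dict mapping tuple[int, ...] -> saved kv state.
    best_key, best_len = None, 0
    for key in states:
        if len(key) > best_len and tuple(prompt_tokens[: len(key)]) == key:
            best_key, best_len = key, len(key)
    if best_key is not None:
        llama.load_state(states[best_key])  # resume from cached prefix
    if best_len < len(prompt_tokens):
        llama.eval(prompt_tokens[best_len:])  # evaluate only the remainder

    completion = []
    for _ in range(max_tokens):
        token = llama.sample()
        if token == llama.token_eos():
            break
        completion.append(token)
        llama.eval([token])

    # Key by prompt + completion so the next chat turn, which repeats
    # this exchange as its prefix, can resume from here.
    states[tuple(prompt_tokens) + tuple(completion)] = llama.save_state()
    return completion
```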
