@@ -761,6 +761,8 @@ curl http://localhost:8080/v1/chat/completions \
### POST `/v1/embeddings`: OpenAI-compatible embeddings API
+ This endpoint requires that the model use a pooling type other than `none`.
+
*Options:*
See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-reference/embeddings).
@@ -793,7 +795,45 @@ See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-r
}'
```
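A minimal Python sketch of reading a `/v1/embeddings` reply, assuming the OpenAI-style layout from the documentation linked above: each pooled vector sits in the top-level `"data"` list next to the `"index"` of the input it belongs to. The helper name `pooled_vectors` and the sample numbers are hypothetical, chosen only for illustration.

```python
from typing import Any


def pooled_vectors(response: dict[str, Any]) -> list[list[float]]:
    """Return one pooled embedding per input, ordered by input index.

    Assumes the OpenAI-style body of /v1/embeddings, where the vectors
    live under the top-level "data" list (hypothetical helper).
    """
    items = sorted(response["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    for vec in vectors:
        # With a pooling type other than `none`, each entry is a flat vector,
        # not a list of per-token rows.
        if vec and isinstance(vec[0], list):
            raise ValueError("expected a pooled (flat) vector, got per-token rows")
    return vectors


# Made-up 2-dimensional vectors, deliberately listed out of index order.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}
print(pooled_vectors(sample))  # [[0.1, 0.2], [0.3, 0.4]]
```

Sorting by `"index"` keeps the result aligned with the order of the inputs even if the entries arrive in a different order.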
- When `--pooling none` is used, the server will output an array of embeddings - one for each token in the input.
+ ### POST `/embeddings`: non-OpenAI-compatible embeddings API
+
+ This endpoint supports `--pooling none`. When it is used, the response contains the embeddings for every input token.
+ Note that the response format differs slightly from `/v1/embeddings`: it has no `"data"` sub-tree, and the
+ embeddings are always returned as a vector of vectors.
+
+ *Options:*
+
+ Same as the `/v1/embeddings` endpoint.
+
+ *Examples:*
+
+ Same as the `/v1/embeddings` endpoint, with the request sent to `/embeddings` instead.
+
+ **Response format**
+
+ ```json
+ [
+   {
+     "index": 0,
+     "embedding": [
+       [ ... embeddings for token 0 ... ],
+       [ ... embeddings for token 1 ... ],
+       [ ... ],
+       [ ... embeddings for token N-1 ... ]
+     ]
+   },
+   ...
+   {
+     "index": P,
+     "embedding": [
+       [ ... embeddings for token 0 ... ],
+       [ ... embeddings for token 1 ... ],
+       [ ... ],
+       [ ... embeddings for token N-1 ... ]
+     ]
+   }
+ ]
+ ```
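A minimal Python sketch of a client for `/embeddings`, assuming a server started with `--pooling none` on a hypothetical `http://localhost:8080` and that the request takes the same `"input"` field as `/v1/embeddings` (the options are documented as identical). The helper names `post_embeddings` and `token_vectors` are hypothetical. The parser reflects the documented shape: a top-level array with no `"data"` sub-tree, where each `"embedding"` is a vector of vectors.

```python
import json
import urllib.request
from typing import Any


def post_embeddings(text: str, base_url: str = "http://localhost:8080") -> list[dict[str, Any]]:
    """POST `text` to /embeddings and decode the JSON reply.

    Assumption: the server accepts the same "input" field as /v1/embeddings;
    base_url is a hypothetical default.
    """
    request = urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def token_vectors(response: list[dict[str, Any]]) -> dict[int, list[list[float]]]:
    """Map each input index to its per-token embedding rows."""
    result: dict[int, list[list[float]]] = {}
    for item in response:
        rows = item["embedding"]
        # Every row should have the model's embedding width.
        if any(len(row) != len(rows[0]) for row in rows):
            raise ValueError(f"rows of unequal width for input {item['index']}")
        result[item["index"]] = rows
    return result


# Made-up 2-dimensional vectors for a 3-token input, in the documented shape.
sample = [{"index": 0, "embedding": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}]
per_input = token_vectors(sample)
print(len(per_input[0]))  # 3, one row per input token
```

Because the embeddings are always a vector of vectors, the same parser should also accept replies from a model with pooling enabled; each input would then carry a single row.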
### GET `/slots`: Returns the current slots processing state