
Commit c3c94fe
Add server example (#9918)
* Add server example.
* Minor updates to README.
* Add fixes after local testing.
* Apply suggestions from code review: updates to README from code review. Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* More doc updates.
* Maybe this will work to build the docs correctly?
* Fix style issues.
* Fix toc.
* Minor reformatting.
* Move docs to proper loc.
* Fix missing tick.
* Apply suggestions from code review. Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Sync docs changes back to README.
* Very minor update to docs to add space.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
1 parent 365a938 commit c3c94fe

File tree

6 files changed, +390 -0 lines changed

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -55,6 +55,8 @@
 - sections:
   - local: using-diffusers/overview_techniques
     title: Overview
+  - local: using-diffusers/create_a_server
+    title: Create a server
   - local: training/distributed_inference
     title: Distributed inference
   - local: using-diffusers/merge_loras
```
docs/source/en/using-diffusers/create_a_server.md

Lines changed: 61 additions & 0 deletions
# Create a server

Diffusers' pipelines can be used as an inference engine for a server. They support concurrent and multithreaded requests to generate images that may be requested by multiple users at the same time.

This guide will show you how to use the [`StableDiffusion3Pipeline`] in a server, but feel free to use any pipeline you want.
Start by navigating to the `examples/server` folder and installing all of the dependencies.

```
pip install .
pip install -r requirements.txt
```
Launch the server with the following command.

```
python server.py
```
The server is accessible at http://localhost:8000. You can send the model a request with the following curl command.

```
curl -X POST -H "Content-Type: application/json" --data '{"model": "something", "prompt": "a kitten in front of a fireplace"}' http://localhost:8000/v1/images/generations
```
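
If you prefer Python to curl, the same request can be sent with [requests](https://requests.readthedocs.io). This is a minimal sketch, assuming the server is running locally and using the response shape returned by the endpoint shown later in this guide.

```py
import requests

# Send the same payload as the curl command above; the server address is the
# local default from `python server.py`.
response = requests.post(
    "http://localhost:8000/v1/images/generations",
    json={"model": "something", "prompt": "a kitten in front of a fireplace"},
)
response.raise_for_status()
# The endpoint returns {"data": [{"url": ...}]}.
print(response.json()["data"][0]["url"])
```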
If you need to upgrade some dependencies, you can use either [pip-tools](https://github.com/jazzband/pip-tools) or [uv](https://github.com/astral-sh/uv). For example, upgrade the dependencies with `uv` using the following command.

```
uv pip compile requirements.in -o requirements.txt
```
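
The `pip-tools` equivalent is the `pip-compile` command, run under the same assumptions (from `examples/server` with `requirements.in` present).

```
pip-compile requirements.in -o requirements.txt
```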
The server is built with [FastAPI](https://fastapi.tiangolo.com/async/). The endpoint for `v1/images/generations` is shown below.

```py
@app.post("/v1/images/generations")
async def generate_image(image_input: TextToImageInput):
    try:
        loop = asyncio.get_event_loop()
        # Each request gets its own scheduler instance because schedulers are not thread-safe.
        scheduler = shared_pipeline.pipeline.scheduler.from_config(shared_pipeline.pipeline.scheduler.config)
        # from_pipe reuses the already-loaded model components instead of loading them again.
        pipeline = StableDiffusion3Pipeline.from_pipe(shared_pipeline.pipeline, scheduler=scheduler)
        generator = torch.Generator(device="cuda")
        generator.manual_seed(random.randint(0, 10000000))
        # Run the blocking pipeline call in a worker thread so the event loop stays free.
        output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator=generator))
        logger.info(f"output: {output}")
        image_url = save_image(output.images[0])
        return {"data": [{"url": image_url}]}
    except Exception as e:
        if isinstance(e, HTTPException):
            raise e
        elif hasattr(e, 'message'):
            raise HTTPException(status_code=500, detail=e.message + traceback.format_exc())
        raise HTTPException(status_code=500, detail=str(e) + traceback.format_exc())
```
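
The `image_input` argument is a [Pydantic](https://docs.pydantic.dev) model that FastAPI validates against the JSON request body. The actual `TextToImageInput` definition lives in `server.py`; a hypothetical minimal version matching the fields in the curl payload above might look like this.

```py
from pydantic import BaseModel

# Hypothetical sketch of the request model; the real definition in server.py
# may include additional fields.
class TextToImageInput(BaseModel):
    model: str
    prompt: str
```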
The `generate_image` function is defined as asynchronous with the [async](https://fastapi.tiangolo.com/async/) keyword so that FastAPI knows that whatever happens in this function won't necessarily return a result right away. Once the function reaches a point where it needs to await some other [Task](https://docs.python.org/3/library/asyncio-task.html#asyncio.Task), the main thread goes back to answering other HTTP requests. This is shown in the code below with the [await](https://fastapi.tiangolo.com/async/#async-and-await) keyword.

```py
output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator=generator))
```
At this point, the execution of the pipeline function is placed onto a [new thread](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor), and the main thread performs other things until a result is returned from the `pipeline`.
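
To see the pattern in isolation, here is a minimal, self-contained sketch (not part of `server.py`) of how `run_in_executor` keeps the event loop free while a blocking function runs in a worker thread.

```py
import asyncio
import time

def blocking_work():
    # Stands in for the blocking pipeline call.
    time.sleep(2)
    return "done"

async def handler():
    loop = asyncio.get_event_loop()
    # While blocking_work runs in the default thread pool executor, the event
    # loop is free to run other coroutines (or, in FastAPI, other requests).
    result = await loop.run_in_executor(None, blocking_work)
    print(result)

asyncio.run(handler())
```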
Another important aspect of this implementation is creating a `pipeline` from `shared_pipeline`. The goal is to avoid loading the underlying model more than once onto the GPU while still allowing each new request, running on its own thread, to have its own generator and scheduler. The scheduler, in particular, is not thread-safe, and sharing the same scheduler across multiple threads will cause errors like `IndexError: index 21 is out of bounds for dimension 0 with size 21`.
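
For illustration, here is a minimal sketch of this pattern. The `SharedPipeline` holder class and the checkpoint name are assumptions for the example, not the exact code in `server.py`; the per-request lines mirror the endpoint above.

```py
import torch
from diffusers import StableDiffusion3Pipeline

# Hypothetical holder for the pipeline that is shared across requests;
# the model weights are loaded onto the GPU exactly once.
class SharedPipeline:
    def __init__(self, model_id: str):
        self.pipeline = StableDiffusion3Pipeline.from_pretrained(
            model_id, torch_dtype=torch.float16
        ).to("cuda")

shared_pipeline = SharedPipeline("stabilityai/stable-diffusion-3-medium-diffusers")

# Per request: a fresh (non-thread-safe) scheduler plus a lightweight pipeline
# from from_pipe, which reuses the already-loaded components instead of
# loading the model again.
scheduler = shared_pipeline.pipeline.scheduler.from_config(
    shared_pipeline.pipeline.scheduler.config
)
pipeline = StableDiffusion3Pipeline.from_pipe(shared_pipeline.pipeline, scheduler=scheduler)
```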

examples/server/README.md

Lines changed: 61 additions & 0 deletions

(The README content is identical to docs/source/en/using-diffusers/create_a_server.md above; the commit syncs the docs changes back to the README.)

examples/server/requirements.in

Lines changed: 9 additions & 0 deletions

```
torch~=2.4.0
transformers==4.46.1
sentencepiece
aiohttp
py-consul
prometheus_client >= 0.18.0
prometheus-fastapi-instrumentator >= 7.0.0
fastapi
uvicorn
```

examples/server/requirements.txt

Lines changed: 124 additions & 0 deletions

```
# This file was autogenerated by uv via the following command:
#    uv pip compile requirements.in -o requirements.txt
aiohappyeyeballs==2.4.3
    # via aiohttp
aiohttp==3.10.10
    # via -r requirements.in
aiosignal==1.3.1
    # via aiohttp
annotated-types==0.7.0
    # via pydantic
anyio==4.6.2.post1
    # via starlette
attrs==24.2.0
    # via aiohttp
certifi==2024.8.30
    # via requests
charset-normalizer==3.4.0
    # via requests
click==8.1.7
    # via uvicorn
fastapi==0.115.3
    # via -r requirements.in
filelock==3.16.1
    # via
    #   huggingface-hub
    #   torch
    #   transformers
frozenlist==1.5.0
    # via
    #   aiohttp
    #   aiosignal
fsspec==2024.10.0
    # via
    #   huggingface-hub
    #   torch
h11==0.14.0
    # via uvicorn
huggingface-hub==0.26.1
    # via
    #   tokenizers
    #   transformers
idna==3.10
    # via
    #   anyio
    #   requests
    #   yarl
jinja2==3.1.4
    # via torch
markupsafe==3.0.2
    # via jinja2
mpmath==1.3.0
    # via sympy
multidict==6.1.0
    # via
    #   aiohttp
    #   yarl
networkx==3.4.2
    # via torch
numpy==2.1.2
    # via transformers
packaging==24.1
    # via
    #   huggingface-hub
    #   transformers
prometheus-client==0.21.0
    # via
    #   -r requirements.in
    #   prometheus-fastapi-instrumentator
prometheus-fastapi-instrumentator==7.0.0
    # via -r requirements.in
propcache==0.2.0
    # via yarl
py-consul==1.5.3
    # via -r requirements.in
pydantic==2.9.2
    # via fastapi
pydantic-core==2.23.4
    # via pydantic
pyyaml==6.0.2
    # via
    #   huggingface-hub
    #   transformers
regex==2024.9.11
    # via transformers
requests==2.32.3
    # via
    #   huggingface-hub
    #   py-consul
    #   transformers
safetensors==0.4.5
    # via transformers
sentencepiece==0.2.0
    # via -r requirements.in
sniffio==1.3.1
    # via anyio
starlette==0.41.0
    # via
    #   fastapi
    #   prometheus-fastapi-instrumentator
sympy==1.13.3
    # via torch
tokenizers==0.20.1
    # via transformers
torch==2.4.1
    # via -r requirements.in
tqdm==4.66.5
    # via
    #   huggingface-hub
    #   transformers
transformers==4.46.1
    # via -r requirements.in
typing-extensions==4.12.2
    # via
    #   fastapi
    #   huggingface-hub
    #   pydantic
    #   pydantic-core
    #   torch
urllib3==2.2.3
    # via requests
uvicorn==0.32.0
    # via -r requirements.in
yarl==1.16.0
    # via aiohttp
```
