Commit df90e3b

Merge pull request #109 from yanboliang/readme
Update README to add link to Mixtral MoE folder
2 parents 89b2502 + a206186

File tree

1 file changed: +17 −0 lines

README.md

Lines changed: 17 additions & 0 deletions
@@ -14,6 +14,23 @@ This is *NOT* intended to be a "framework" or "library" - it is intended to show
For an in-depth walkthrough of what's in this codebase, see this [blog post](https://pytorch.org/blog/accelerating-generative-ai-2/).

## Supported Models

### LLaMA family
Please check the rest of this page for benchmarks of the LLaMA family of models.

### Mixtral 8x7B
We also support [Mixtral 8x7B](https://mistral.ai/news/mixtral-of-experts/), a high-quality sparse mixture-of-experts (MoE) model. The average token generation rates (tokens/s) are:

|                     | 1 GPU | 2 GPU | 4 GPU  | 8 GPU  |
|---------------------|-------|-------|--------|--------|
| baseline (bfloat16) | OOM   | 78.75 | 118.23 | 203.69 |
| int8                | 56.04 | 99.91 | 149.53 | 218.48 |

Note that the benchmarks were run on an 8x A100-80GB system, power limited to 330W, with a hybrid cube mesh topology. All benchmarks are run at *batch size=1*, making the reported tokens/s numbers equivalent to "tokens/s/user". In addition, they are run with a very small prompt length (just 5 tokens).

For more details about Mixtral 8x7B, please check [this page](./mixtral-moe).
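
The int8 rows above come from weight-only quantization of the model's linear layers. As a rough illustration of the general idea, here is a minimal sketch of per-channel int8 weight-only quantization in PyTorch. This is not gpt-fast's actual implementation (see [./mixtral-moe](./mixtral-moe) for the real code), and the `WeightOnlyInt8Linear` class name is hypothetical:

```python
# Minimal, illustrative sketch of per-channel int8 weight-only quantization.
# NOT the implementation behind the numbers above; the class name is hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightOnlyInt8Linear(nn.Module):
    def __init__(self, weight: torch.Tensor, bias=None):
        super().__init__()
        # One scale per output channel: map the largest |w| in each row to 127.
        scales = weight.abs().amax(dim=1, keepdim=True) / 127.0
        self.register_buffer("weight_int8", torch.round(weight / scales).to(torch.int8))
        self.register_buffer("scales", scales.to(weight.dtype))
        self.bias = bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on the fly; weights are stored in int8, halving their memory vs bfloat16.
        w = self.weight_int8.to(x.dtype) * self.scales
        return F.linear(x, w, self.bias)

# Usage: swap a bfloat16 linear layer for its int8 counterpart.
lin = nn.Linear(4096, 4096, bias=False, dtype=torch.bfloat16)
qlin = WeightOnlyInt8Linear(lin.weight.detach())
out = qlin(torch.randn(1, 4096, dtype=torch.bfloat16))
```

Storing weights in int8 roughly halves their memory footprint relative to bfloat16, which is consistent with the int8 configuration fitting on a single GPU in the table above while the bfloat16 baseline runs out of memory.
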
## Community

Projects inspired by gpt-fast in the community:
