
Commit 4bc443d

Blog post edit
Signed-off-by: Chris Abraham <cjyabraham@gmail.com>
1 parent d900d02 commit 4bc443d

File tree

1 file changed: +1 addition, −1 deletion


_posts/2024-06-23-training-moes.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: "Training MoEs at Scale with PyTorch"
 4   author: Brian Chu, Mihir Patel, Less Wright, Vitaliy Chiley, Evan Racah, Wanchao Liang, Iris Zhang, Andrew Gu
 5   ---
 6
 7 - Over the past year, Mixture of Experts (MoE) models have surged in popularity, fueled by powerful open-source models like [DBRX](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm), [Mixtral](https://mistral.ai/news/mixtral-of-experts/), [DeepSeek](https://github.com/deepseek-ai/DeepSeek-V2), and many more. In this blog post, we’ll talk about how we scale to over three thousand GPUs using [PyTorch Distributed](https://pytorch.org/tutorials/beginner/dist_overview.html) and [MegaBlocks](https://github.com/databricks/megablocks), an efficient open-source MoE implementation in PyTorch.
 7 + Over the past year, Mixture of Experts (MoE) models have surged in popularity, fueled by powerful open-source models like [DBRX](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm), [Mixtral](https://mistral.ai/news/mixtral-of-experts/), [DeepSeek](https://github.com/deepseek-ai/DeepSeek-V2), and many more. At Databricks, we've worked closely with the PyTorch team to scale training of MoE models. In this blog post, we’ll talk about how we scale to over three thousand GPUs using [PyTorch Distributed](https://pytorch.org/tutorials/beginner/dist_overview.html) and [MegaBlocks](https://github.com/databricks/megablocks), an efficient open-source MoE implementation in PyTorch.
 8
 9
 10  ## What is a MoE?
