diff --git a/_posts/2024-05-11-zeus.md b/_posts/2024-05-11-zeus.md
index 46fa59549ffd..e861653a8d3b 100644
--- a/_posts/2024-05-11-zeus.md
+++ b/_posts/2024-05-11-zeus.md
@@ -60,8 +60,6 @@ measurement = monitor.end_window("training")
 print(f"Entire training: {measurement.time} s, {measurement.total_energy} J")
 ```
 
-<script src="https://gist.github.com/jaywonchung/f580b782ff0513374c6fa507d5e072a8.js"></script>
-
 What you see above is a typical PyTorch training loop which uses four GPUs for data parallel training. Inside, we created an instance of `ZeusMonitor` and passed in a list of GPU indices to monitor. Then, using the monitor, we can measure the time and energy consumption of arbitrary execution _windows_ within the training script by pairing calls to `begin_window` and `end_window`. Multiple windows can overlap and nest in arbitrary ways without affecting the measurement of each, as long as their names are different.
 
 `ZeusMonitor` adds very little overhead – typically single digit milliseconds – around the window. This allows `ZeusMonitor` to be used in various applications. For instance:
@@ -79,9 +77,8 @@ See our [blog post](https://ml.energy/blog/energy/measurement/measuring-gpu-ener
 
 Let me introduce you to two of the energy optimizers provided by Zeus.
 
-```
-GlobalPowerLimitOptimizer
-```
+### GlobalPowerLimitOptimizer
+
 GPUs allow users to configure their maximum power draw, called the _power limit_. Typically, as you lower the GPU’s power limit from the default maximum, computation may get slightly slower, but you’ll save disproportionately more energy. The `GlobalPowerLimitOptimizer` in Zeus automatically finds the optimal GPU power limit globally across all GPUs.
@@ -108,8 +105,6 @@ for e in range(100):
     plo.on_epoch_end()
 ```
 
-<script src="https://gist.github.com/jaywonchung/1922ddd56b15f8764f2bdacc4a441109.js"></script>
-
 In our familiar PyTorch training loop, we have instantiated `GlobalPowerLimitOptimizer` and passed it an instance of the `ZeusMonitor`, through which the optimizer sees the GPUs. Then, we just need to let the optimizer know about training progress (step and epoch boundaries), and the optimizer will transparently do all the necessary profiling and converge to the optimal power limit.
 
 If you’re using the HuggingFace [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) or [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer), integration is even easier:
@@ -131,8 +126,6 @@ trainer = Trainer(
 )
 ```
 
-<script src="https://gist.github.com/jaywonchung/69aa379dd9633a6a486cede1887cec2c.js"></script>
-
 The `HFGlobalPowerLimitOptimizer` wraps `GlobalPowerLimitOptimizer` so that it automatically detects step and epoch boundaries. We have example integrations [here](https://github.com/ml-energy/zeus/tree/master/examples/huggingface), including running Gemma 7B supervised fine-tuning with QLoRA.
 
 Now, we know how to integrate the optimizer, but what is the _optimal_ power limit? We know different users can have different preferences regarding trading off time and energy, so we allow users to specify an `OptimumSelector` (basically the [Strategy Pattern](https://en.wikipedia.org/wiki/Strategy_pattern)) to express their needs.
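As background for the power limit discussion in the `GlobalPowerLimitOptimizer` section above, here is a minimal sketch, not part of the patch itself, of how a GPU power limit can be queried and changed through NVML via `pynvml`. It only illustrates the knob the optimizer is tuning; the exact calls Zeus issues internally may differ. NVML reports limits in milliwatts, and setting a limit generally requires elevated privileges.

```python
# Illustrative only: query and lower a GPU's power limit directly via NVML.
# GlobalPowerLimitOptimizer automates this profiling and selection, so you
# normally never call these functions yourself.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Power limits are reported in milliwatts.
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
cur_mw = pynvml.nvmlDeviceGetPowerManagementLimit(handle)
print(f"Power limit range: {min_mw / 1000:.0f} W to {max_mw / 1000:.0f} W, current {cur_mw / 1000:.0f} W")

# Lower the limit to 80% of the maximum (requires sufficient privileges, e.g., root).
pynvml.nvmlDeviceSetPowerManagementLimit(handle, int(max_mw * 0.8))

pynvml.nvmlShutdown()
```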
@@ -154,15 +147,10 @@ plo = GlobalPowerLimitOptimizer(
 ```
 
-<script src="https://gist.github.com/jaywonchung/1077b14bc7440b849be1f8320d4bf791.js"></script>
-
 Some of the built-in strategies include “Minimize time” ([Time](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.Time), this might still reduce the power limit from the default since some workloads exhibit almost no slowdown even on lower power limits), “Minimize energy” ([Energy](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.Energy)), “Somewhere in between” ([ZeusCost](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.ZeusCost)), and “Minimize energy given maximum slowdown” ([MaxSlowdownConstraint](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.MaxSlowdownConstraint)). Users can also create their own optimum selectors as needed.
 
-```
-PipelineFrequencyOptimizer
-```
-
+### PipelineFrequencyOptimizer
 The pipeline frequency optimizer, based on our research paper [Perseus](https://ml.energy/zeus/research_overview/perseus), is our latest work on energy optimization for large model training, like GPT-3. Perseus can reduce the energy consumption of large model training with no or negligible training throughput degradation. We’ll briefly talk about how.
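The post notes above that users can create their own optimum selectors. As a rough sketch, again not part of the patch, a custom selector could for example minimize the energy-delay product across the profiled power limits. The method and attribute names below (`select`, and the `power_limit`, `time`, and `energy` fields of each measurement) are assumed to mirror the built-in selectors; check the Zeus power limit optimizer reference for the exact interface.

```python
# Hypothetical custom optimum selector that minimizes energy-delay product (EDP).
# Interface details are assumptions based on the built-in selectors; verify them
# against the Zeus documentation before use.
from zeus.monitor import ZeusMonitor
from zeus.optimizer.power_limit import GlobalPowerLimitOptimizer, OptimumSelector


class MinimizeEDP(OptimumSelector):
    """Pick the power limit whose profiled energy x time product is smallest."""

    def select(self, measurements):
        best = min(measurements, key=lambda m: m.energy * m.time)
        return best.power_limit


# Passed to the optimizer the same way as the built-in selectors.
monitor = ZeusMonitor(gpu_indices=[0, 1, 2, 3])
plo = GlobalPowerLimitOptimizer(monitor, optimum_selector=MinimizeEDP())
```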