What you see above is a typical PyTorch training loop which uses four GPUs for data parallel training. Inside, we created an instance of `ZeusMonitor` and passed in a list of GPU indices to monitor. Then, using the monitor, we can measure the time and energy consumption of arbitrary execution _windows_ within the training script by pairing calls to `begin_window` and `end_window`. Multiple windows can overlap and nest in arbitrary ways without affecting the measurement of each, as long as their names are different.
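To make the window semantics concrete, here is a minimal sketch of name-keyed measurement windows. It is a toy stand-in, not Zeus's implementation: `ToyWindowTimer` is a hypothetical class that only tracks elapsed time (no energy), and it uses an injectable clock so the example is deterministic.

```python
import itertools
import time


class ToyWindowTimer:
    """Hypothetical stand-in for a window-based monitor; measures time only."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = {}  # window name -> clock reading at begin_window

    def begin_window(self, name):
        # Each open window is identified by its name, so names must be unique.
        if name in self._start:
            raise ValueError(f"window {name!r} is already open")
        self._start[name] = self._clock()

    def end_window(self, name):
        if name not in self._start:
            raise ValueError(f"window {name!r} was never opened")
        return self._clock() - self._start.pop(name)


# A fake clock that advances by one "second" on every reading.
ticks = itertools.count()
timer = ToyWindowTimer(clock=lambda: float(next(ticks)))

timer.begin_window("epoch")              # clock reads 0
timer.begin_window("step")               # clock reads 1 (nested in "epoch")
step_time = timer.end_window("step")     # clock reads 2
timer.begin_window("eval")               # clock reads 3 (overlaps "epoch")
epoch_time = timer.end_window("epoch")   # clock reads 4
eval_time = timer.end_window("eval")     # clock reads 5

print(step_time, epoch_time, eval_time)
```

Because every window keeps its own start reading under its own name, ending one window never disturbs another, which is why nesting and overlapping are harmless as long as the names differ.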
`ZeusMonitor` adds very little overhead – typically single digit milliseconds – around the window. This allows `ZeusMonitor` to be used in various applications. For instance:
Let me introduce you to two of the energy optimizers provided by Zeus.
### GlobalPowerLimitOptimizer

GPUs allow users to configure their maximum power draw, called the _power limit_. Typically, as you lower the GPU’s power limit from the default maximum, computation may get slightly slower, but you’ll save disproportionately more energy. The `GlobalPowerLimitOptimizer` in Zeus automatically finds the optimal GPU power limit globally across all GPUs.
In our familiar PyTorch training loop, we have instantiated `GlobalPowerLimitOptimizer` and passed it an instance of the `ZeusMonitor`, through which the optimizer sees the GPUs. Then, we just need to let the optimizer know about training progress (step and epoch boundaries), and the optimizer will transparently do all the necessary profiling and converge to the optimal power limit.
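To give a feel for what "letting the optimizer know about training progress" enables, here is a toy sketch of a callback-driven power limit search. It is not how `GlobalPowerLimitOptimizer` is implemented: `ToyPowerLimitSearch`, `FakeGPU`, and the joules-per-step numbers are all hypothetical, and the toy simply picks the candidate with the lowest energy per step.

```python
class ToyPowerLimitSearch:
    """Hypothetical sketch: profile each candidate limit for a few steps, then lock in the best."""

    def __init__(self, candidates, steps_per_candidate, apply_limit, read_energy):
        self.candidates = list(candidates)
        self.steps_per_candidate = steps_per_candidate
        self.apply_limit = apply_limit    # hypothetical hook: set the power limit (watts)
        self.read_energy = read_energy    # hypothetical hook: cumulative energy (joules)
        self.energy_per_step = {}         # watts -> average joules per step
        self.chosen = None                # set once profiling has finished
        self._idx = 0
        self._steps = 0
        self._acc = 0.0
        self._e0 = 0.0
        self.apply_limit(self.candidates[0])

    def on_step_begin(self):
        if self.chosen is None:
            self._e0 = self.read_energy()

    def on_step_end(self):
        if self.chosen is not None:
            return  # profiling is done; stay out of the way
        self._acc += self.read_energy() - self._e0
        self._steps += 1
        if self._steps < self.steps_per_candidate:
            return
        # Finished profiling the current candidate; record it and move on.
        self.energy_per_step[self.candidates[self._idx]] = self._acc / self._steps
        self._idx, self._steps, self._acc = self._idx + 1, 0, 0.0
        if self._idx < len(self.candidates):
            self.apply_limit(self.candidates[self._idx])
        else:
            self.chosen = min(self.energy_per_step, key=self.energy_per_step.get)
            self.apply_limit(self.chosen)


class FakeGPU:
    """Made-up device model: each step costs a fixed number of joules per limit."""

    JOULES_PER_STEP = {300: 30.0, 250: 26.0, 200: 27.0}  # invented numbers

    def __init__(self):
        self.limit = None
        self.energy = 0.0

    def set_limit(self, watts):
        self.limit = watts

    def read_energy(self):
        return self.energy

    def run_step(self):
        self.energy += self.JOULES_PER_STEP[self.limit]


gpu = FakeGPU()
search = ToyPowerLimitSearch([300, 250, 200], 2, gpu.set_limit, gpu.read_energy)
for _ in range(10):        # stands in for the inner training loop
    search.on_step_begin()
    gpu.run_step()         # stands in for one training step
    search.on_step_end()

print(search.chosen, search.energy_per_step)
```

The point of the sketch is the shape of the integration: the training loop only reports step boundaries, and all profiling state lives inside the optimizer object.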
If you’re using the HuggingFace [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) or [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer), integration is even easier:
The `HFGlobalPowerLimitOptimizer` wraps `GlobalPowerLimitOptimizer` so that it automatically detects step and epoch boundaries. We have example integrations [here](https://github.com/ml-energy/zeus/tree/master/examples/huggingface), including running Gemma 7B supervised fine-tuning with QLoRA.
Now, we know how to integrate the optimizer, but what is the _optimal_ power limit? We know different users can have different preferences regarding trading off time and energy, so we allow users to specify an `OptimumSelector` (basically the [Strategy Pattern](https://en.wikipedia.org/wiki/Strategy_pattern)) to express their needs.
Some of the built-in strategies include “Minimize time” ([Time](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.Time); this may still lower the power limit from the default, since some workloads show almost no slowdown even at lower power limits), “Minimize energy” ([Energy](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.Energy)), “Somewhere in between” ([ZeusCost](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.ZeusCost)), and “Minimize energy given maximum slowdown” ([MaxSlowdownConstraint](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.MaxSlowdownConstraint)). Users can also create their own optimum selectors as needed.
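To illustrate how different strategies can pick different power limits from the same profiling data, here is a small sketch. Everything in it is hypothetical: the `Profile` type, the selector functions, and the measurements are invented for illustration and are not Zeus's classes or real results. The weighted cost assumes the form `eta * energy + (1 - eta) * max_power * time`, and the slowdown baseline is assumed to be the profile at the highest power limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    power_limit: int  # watts
    time: float       # seconds per step
    energy: float     # joules per step


# Made-up measurements, for illustration only.
PROFILES = [
    Profile(300, 1.00, 280.0),
    Profile(250, 1.02, 245.0),
    Profile(200, 1.10, 215.0),
    Profile(150, 1.40, 205.0),
]


def select_min_time(profiles):
    # Break ties on time by preferring lower energy.
    return min(profiles, key=lambda p: (p.time, p.energy))


def select_min_energy(profiles):
    return min(profiles, key=lambda p: p.energy)


def select_weighted_cost(profiles, eta, max_power):
    # eta = 1 cares only about energy; eta = 0 cares only about time.
    return min(profiles, key=lambda p: eta * p.energy + (1 - eta) * max_power * p.time)


def select_min_energy_within_slowdown(profiles, factor):
    # Assumed baseline: step time at the highest power limit.
    baseline = max(profiles, key=lambda p: p.power_limit).time
    allowed = [p for p in profiles if p.time <= factor * baseline]
    return min(allowed, key=lambda p: p.energy)


print(select_min_time(PROFILES).power_limit)
print(select_min_energy(PROFILES).power_limit)
print(select_weighted_cost(PROFILES, eta=0.5, max_power=300).power_limit)
print(select_min_energy_within_slowdown(PROFILES, factor=1.05).power_limit)
```

With these invented numbers, each of the four strategies lands on a different power limit, which is exactly why the choice is left to the user as a pluggable strategy.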
### PipelineFrequencyOptimizer

The pipeline frequency optimizer, based on our research paper [Perseus](https://ml.energy/zeus/research_overview/perseus), is our latest work on energy optimization for large model training, like GPT-3. Perseus can reduce the energy consumption of large model training with no or negligible training throughput degradation. We’ll briefly talk about how.