
GPU Memory Allocation with POT + pytorch #523

Closed
@atong01

Description


Describe the bug

In version 0.9.1 (I haven't checked other versions), when using torch DistributedDataParallel (DDP), importing POT allocates memory on GPU:0 in every process. For example, if DDP runs with 4 processes, GPU:0 gets 4 extra allocations of roughly 800 MB each, even when no torch-backed POT functions are used.

This was a very difficult bug to find, because I did not expect that importing a package whose torch functionality I am not using would allocate GPU memory. Why does POT need to allocate GPU memory just because a torch backend is available? Even when a backend is used actively, I would prefer a switch between CPU and GPU, for instance when I need that extra memory for the model or data.
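A minimal check, my own sketch rather than something from this report, that distinguishes import-time CUDA initialization from allocations made by POT functions. If the second print shows `True`, the import alone created a CUDA context, which would be consistent with the ~800 MB per process described above:

```python
import torch

# Nothing CUDA-related has run yet, so the CUDA context should not exist.
assert not torch.cuda.is_initialized()

import ot  # noqa: E402 - importing POT is the only step between the two checks

# True here would mean the import alone touched the GPU and created a CUDA context.
print(torch.cuda.is_initialized())
```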

This is partially fixed by PR #520 together with setting an environment variable, but I would greatly prefer that either this did not happen at all, or that POT at least emitted a warning or message such as `POT is using {PACKAGE} backend: allocating GPU memory, set {XXX} to disable`.

Related to issues #516 and #382, and to PR #520.

To Reproduce

Run the following inside a DDP training process in PyTorch (note that POT is imported as `ot`):

```python
import torch
import ot  # POT
```
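A slightly fuller reproduction sketch, not from the original report: the script name, the 4-process launch, and the `nvidia-smi` check are my assumptions; the only POT-related line is the import.

```python
# repro.py (hypothetical name) - launch with e.g.:
#   torchrun --nproc_per_node=4 repro.py
import os
import time

import torch
import torch.distributed as dist

import ot  # importing POT is the only POT-related line; no OT functions are called

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Pause so `nvidia-smi` can be inspected: per the report, GPU:0 ends up with
# one extra allocation of roughly 800 MB per process (4 extra with 4 ranks).
time.sleep(60)
dist.destroy_process_group()
```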

Expected behavior

Make it clear when POT is allocating GPU memory, allocate only when necessary, and, when allocation is needed, attempt to allocate on the correct device.
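As an interim, user-side mitigation (my own sketch, not something POT provides), restricting each DDP worker to its own GPU before any CUDA-touching import should at least keep the implicit allocation off GPU:0, assuming one GPU per process and that `LOCAL_RANK` matches the physical device index:

```python
# Set CUDA_VISIBLE_DEVICES before torch / ot are imported, so any implicit
# CUDA context lands on the worker's own GPU rather than on GPU:0.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("LOCAL_RANK", "0")

import torch  # noqa: E402
import ot     # noqa: E402
```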
