
Provide an API to get GPU global free and used memory #522

Open
@notsyncing

Description


Describe the issue

Hello, I'm trying to make accelerate work with ipex and device_map="auto". In accelerate's source file utils/modeling.py, it contains:

def get_max_memory(max_memory: Optional[Dict[Union[int, str], Union[int, str]]] = None):
    # ... (other code) ...
            elif is_xpu_available():
                for i in range(torch.xpu.device_count()):
                    _ = torch.tensor(0, device=torch.device("xpu", i))
                max_memory = {i: torch.xpu.max_memory_allocated(i) for i in range(torch.xpu.device_count())}
    # ... (other code) ...

torch.xpu.max_memory_allocated only reports memory allocated by the current process, so at startup it is always a very small value, and the device ends up being ignored.

According to the documentation (https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/api_doc.html), there is currently no way to query the device's global allocated or free memory. Would you consider adding a method like torch.cuda.mem_get_info to directly get global memory information? Thanks!
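To make the request concrete, here is a minimal sketch of how a caller like accelerate could use such an API. It assumes a hypothetical torch.xpu.mem_get_info with the same (free, total) return shape as torch.cuda.mem_get_info (that function does not exist today; the hasattr guard and the helper name are illustrative only), and falls back to the per-process max_memory_allocated stat otherwise:

```python
def get_device_free_memory(xpu_module, device_id=0):
    """Return a free-memory estimate in bytes for one XPU device.

    Prefers a global query (the requested mem_get_info, assumed to
    return a (free, total) tuple like torch.cuda.mem_get_info) over
    the per-process allocator statistic.
    """
    if hasattr(xpu_module, "mem_get_info"):
        free_bytes, _total_bytes = xpu_module.mem_get_info(device_id)
        return free_bytes
    # Fallback: per-process peak allocation only -- near zero at
    # startup, which is exactly the problem described above.
    return xpu_module.max_memory_allocated(device_id)
```

With a global query available, get_max_memory could report the real device capacity instead of the near-zero per-process figure.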
