test-backend-ops performance numbers incorrect #8898

Closed
@JohannesGaessler

Description

I noticed that with the CUDA backend on an RTX 3090, the memory bandwidth that test-backend-ops reports for matrix multiplication can be much greater than 936 GB/s, the hardware's theoretical maximum. There must therefore be a bug in how these numbers are calculated.
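A reported bandwidth is, in general, bytes moved divided by elapsed time, so any value above the hardware peak means the byte count is too high or the measured time is too low. The sketch below is a hypothetical sanity check under that assumption. The function names and the 64 MiB / 100-run example are illustrative and are not taken from test-backend-ops. It does not identify the actual bug; it only shows how such a figure can become physically impossible.

```python
# Hypothetical sanity check for a reported memory bandwidth figure.
# Not the test-backend-ops implementation; names and numbers are illustrative.

RTX_3090_PEAK_GBS = 936.0  # theoretical peak memory bandwidth cited in the issue


def effective_bandwidth_gbs(bytes_per_run: int, n_runs: int, elapsed_s: float) -> float:
    """Total bytes moved across all timed runs divided by total elapsed time, in GB/s (1 GB = 1e9 B)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return bytes_per_run * n_runs / elapsed_s / 1e9


def is_physically_plausible(bandwidth_gbs: float, peak_gbs: float = RTX_3090_PEAK_GBS) -> bool:
    """A measured bandwidth above the hardware peak points to a bookkeeping error."""
    return bandwidth_gbs <= peak_gbs


# Example: 100 runs, each moving 64 MiB, timed at 10 ms in total.
bytes_per_run = 64 * 1024**2
consistent = effective_bandwidth_gbs(bytes_per_run, n_runs=100, elapsed_s=0.010)  # ≈ 671.1 GB/s

# If the same work is credited with only half the real time (e.g. the timer stops
# before the GPU has finished), the result exceeds the hardware peak.
inflated = effective_bandwidth_gbs(bytes_per_run, n_runs=100, elapsed_s=0.005)  # ≈ 1342.2 GB/s

print(f"{consistent:.1f} GB/s plausible={is_physically_plausible(consistent)}")
print(f"{inflated:.1f} GB/s plausible={is_physically_plausible(inflated)}")
```

Checking every reported figure against the known peak like this catches overcounted bytes and undercounted time alike, without needing to know which of the two is wrong.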

Labels: bug (Something isn't working), medium severity (Used to report medium severity bugs in llama.cpp (e.g. Malfunctioning Features but still useable))