## Bug Description

The TensorRT-compiled model regresses between torch_tensorrt 2.6.0 and 2.7.0/2.8.0.dev: for an FP16 ResNet-50, mean TRT latency rises from ~1.83 ms (2.6) to ~2.92 ms (2.7) and ~2.89 ms (2.8.dev), while the eager PyTorch baseline stays in the same range. Per-version package lists and timings follow; the script used to collect them is under "To Reproduce".
### 2.6

```
Package                  Version
------------------------ ------------
nvidia-cuda-runtime-cu12 12.6.77
tensorrt                 10.7.0.post1
tensorrt_cu12            10.7.0.post1
tensorrt-cu12-bindings   10.7.0.post1
tensorrt-cu12-libs       10.7.0.post1
torch                    2.6.0+cu126
torch_tensorrt           2.6.0+cu126
torchvision              0.21.0+cu126
```

torch model timing:
Min=1.7192959785461426 ms, Mean=2.8140222126722336 ms, Max=10.684288024902344 ms

trt model timing:
Min=1.539072036743164 ms, Mean=1.8346313728809356 ms, Max=3.330048084259033 ms
### 2.7

```
Package                  Version
------------------------ ------------
nvidia-cuda-runtime-cu12 12.6.77
tensorrt                 10.9.0.34
tensorrt_cu12            10.9.0.34
tensorrt_cu12_bindings   10.9.0.34
tensorrt_cu12_libs       10.9.0.34
torch                    2.7.1+cu126
torch_tensorrt           2.7.0+cu126
torchvision              0.22.1+cu126
```

torch model timing:
Min=1.7500159740447998 ms, Mean=3.242773882341385 ms, Max=7.173120021820068 ms

trt model timing:
Min=2.742271900177002 ms, Mean=2.9178302402973175 ms, Max=4.151296138763428 ms
### 2.8.dev

```
Package                  Version
------------------------ ------------------------
nvidia-cuda-runtime-cu12 12.8.90
tensorrt                 10.9.0.34
tensorrt_cu12            10.9.0.34
tensorrt_cu12_bindings   10.9.0.34
tensorrt_cu12_libs       10.9.0.34
torch                    2.8.0.dev20250607+cu128
torch_tensorrt           2.8.0.dev20250607+cu128
torchvision              0.23.0.dev20250607+cu128
```

torch model timing:
Min=1.802240014076233 ms, Mean=2.7240907769203186 ms, Max=6.532224178314209 ms

trt model timing:
Min=2.7350399494171143 ms, Mean=2.89359746632576 ms, Max=3.839711904525757 ms
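For reference, the slowdown implied by the mean TRT latencies above can be computed directly from the numbers reported (pure arithmetic, no GPU required):

```python
# Mean TRT-model latencies (ms) reported in the three runs above.
trt_means = {
    "2.6": 1.8346313728809356,
    "2.7": 2.9178302402973175,
    "2.8.dev": 2.89359746632576,
}

baseline = trt_means["2.6"]
for version, mean_ms in trt_means.items():
    slowdown_pct = (mean_ms / baseline - 1) * 100
    print(f"{version}: {mean_ms:.3f} ms ({slowdown_pct:+.1f}% vs 2.6)")
# 2.7 and 2.8.dev are both roughly 58-59% slower than 2.6.
```

By the same calculation the eager baseline moves far less (about +15% in 2.7 and -3% in 2.8.dev), so the bulk of the regression appears to be on the TRT path.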
## To Reproduce
```python
import os
import tempfile

import numpy as np
import tensorrt
import torch
import torch_tensorrt
import torchvision.models as models

os.environ["CI_BUILD"] = "1"
torch.manual_seed(12345)
times = 5000


def benchmark(model: torch.nn.Module, inputs: list[torch.Tensor]) -> np.ndarray:
    # Warm up
    for i in range(3):
        model(inputs[i])
    torch.cuda.synchronize()
    start_events = [torch.cuda.Event(enable_timing=True) for _ in range(times)]
    end_events = [torch.cuda.Event(enable_timing=True) for _ in range(times)]
    for i in range(times):
        torch.cuda._sleep(1_000_000)
        start_events[i].record()
        model(inputs[i])
        end_events[i].record()
    torch.cuda.synchronize()
    timings = [s.elapsed_time(e) for s, e in zip(start_events, end_events)]
    return np.array(timings)


with torch.inference_mode():
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval().cuda().half()
    inputs = (torch.randn(1, 3, 224, 224, dtype=torch.half, device="cuda"),)
    exported_program = torch.export.export(model, inputs)
    trt_model = torch_tensorrt.dynamo.compile(
        exported_program,
        inputs,
        enabled_precisions={torch.half},
        debug=False,
        min_block_size=1,
        timing_cache_path=os.path.join(tempfile.gettempdir(), f"timing_cache_{tensorrt.__version__}.bin"),
    )
    inputs = [torch.randn(1, 3, 224, 224, dtype=torch.half, device="cuda") for _ in range(times)]

    torch_timing = benchmark(model, inputs)
    print("torch model timing:")
    print(f"Min={torch_timing.min()} ms, Mean={torch_timing.mean()} ms, Max={torch_timing.max()} ms")

    trt_timing = benchmark(trt_model, inputs)
    print("trt model timing:")
    print(f"Min={trt_timing.min()} ms, Mean={trt_timing.mean()} ms, Max={trt_timing.max()} ms")
```
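As a side note, min/mean/max can be dominated by a few outlier iterations (e.g. the 10.68 ms max in the 2.6 eager run); percentiles over the same timing array give a steadier summary. A minimal sketch, assuming the NumPy array returned by `benchmark` (the `summarize` helper is hypothetical, not part of the script above):

```python
import numpy as np


def summarize(timings: np.ndarray) -> dict[str, float]:
    """Median and tail percentiles (ms) of per-iteration latencies."""
    p50, p95, p99 = np.percentile(timings, [50, 95, 99])
    return {"p50": float(p50), "p95": float(p95), "p99": float(p99)}


# Synthetic latencies standing in for a real benchmark() result.
rng = np.random.default_rng(0)
fake_timings = rng.normal(loc=2.0, scale=0.1, size=5000).clip(min=1.5)
stats = summarize(fake_timings)
print(f"p50={stats['p50']:.3f} ms, p95={stats['p95']:.3f} ms, p99={stats['p99']:.3f} ms")
```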
## Environment

- CPU Architecture: x64
- OS (e.g., Linux): Ubuntu 24.04 LTS
- How you installed PyTorch (`conda`, `pip`, `libtorch`, source): pip
- Python version: 3.12.3
- GPU models and configuration: RTX 4060 Ti