Open
Description
Hi IPEX team,
Thanks for excellent project, I'm having weird performance issue when using IPEX launch scripts in docker, that disabling thread affinity delivers 1.28x training speedup which is unexpected. Do you know the possible root cause?
Bellow is detailed information:
- system config: 4 nodes ICX, pytorch 1.10, ipex 1.10
- docker image based on intel oneapi-aikit
- training scripts:
python -m intel_extension_for_pytorch.cpu.launch --distributed --nproc_per_node=2 --nnodes=4 --hostfile hosts train.py
How I disable thread affinity:
# modify launch.py, comment out following code
self.set_env("KMP_AFFINITY", "granularity=fine,compact,1,0")
self.set_env("I_MPI_PIN_DOMAIN", mpi_pin_domain)
self.set_env("CCL_WORKER_AFFINITY", ccl_affinity)
Looking forward to your reply, thanks.