Affects: PythonCall
Describe the bug
This is a very quirky bug. I'm getting a segmentation fault when using Python's gymnasium
package from multiple processes while a Flux model is loaded on the GPU.
Setup:
]add CondaPkg
]add PythonCall
]add Flux
]add CUDA
using CondaPkg
CondaPkg.add("gymnasium")
CondaPkg.add("swig")
CondaPkg.add("gymnasium-box2d")
CondaPkg.add("gymnasium-other")
Run (the crash is non-deterministic; try running a few times on a machine with an NVIDIA GPU):
using Distributed
addprocs(12; env=["CUDA_HARD_MEMORY_LIMIT" => "5%", "CUDA_MEMORY_POOL" => "none"])
@everywhere begin
    using CUDA
    using Flux
    using CondaPkg
    using PythonCall
    function initialize_car_racing_env(_)
        gym = pyimport("gymnasium")
        x = Flux.Dense(512 => 512) |> gpu
        env = gym.make("CarRacing-v3")
        obs, info = env.reset()
        env.close()
        return 1
    end
end
for generation in 1:10_000
    if generation % 100 == 0
        println("Generation: $generation")
    end
    pmap(initialize_car_racing_env, 1:12)
end
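Since the crash is intermittent, a small driver along these lines can rerun the script in a fresh Julia process until it fails. This is only a sketch: `repro.jl` is a hypothetical file name for the script above, and the attempt count of 5 is arbitrary.

```julia
# Rerun the reproducer in a fresh Julia process until it exits with an error.
# ASSUMPTION: the script above has been saved as "repro.jl" (hypothetical name).
const SCRIPT = "repro.jl"
const MAX_ATTEMPTS = 5  # arbitrary; raise it if the crash is rare on your machine

for attempt in 1:MAX_ATTEMPTS
    # ignorestatus keeps run() from throwing when the child exits non-zero
    proc = run(ignorestatus(`$(Base.julia_cmd()) $SCRIPT`))
    if !success(proc)
        println("Attempt $attempt failed (exit code $(proc.exitcode))")
        break
    end
    println("Attempt $attempt finished without a crash")
end
```

Each attempt starts from a clean process, so state left over from an earlier run cannot mask or trigger the crash.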
Stack trace:
From worker 5:
From worker 5: [35654] signal 11: Segmentation fault
From worker 5: in expression starting at none:0
From worker 5: jl_gc_state_set at /cache/build/builder-demeter6-6/julialang/julia-master/src/julia_threads.h:334 [inlined]
From worker 5: jl_gc_state_set at /cache/build/builder-demeter6-6/julialang/julia-master/src/julia_threads.h:329 [inlined]
From worker 5: jl_gc_state_save_and_set at /cache/build/builder-demeter6-6/julialang/julia-master/src/julia_threads.h:340
From worker 5: throw_internal_altstack at /cache/build/builder-demeter6-6/julialang/julia-master/src/task.c:755 [inlined]
From worker 5: ijl_sig_throw at /cache/build/builder-demeter6-6/julialang/julia-master/src/task.c:800
From worker 5: Allocations: 21901595 (Pool: 21900914; Big: 681); GC: 219
ERROR: Worker 5 terminated.LoadError:
ProcessExitedException(Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
[1] (::Base.var"#wait_locked#832")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
@ Base ./stream.jl:970
[2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
@ Base ./stream.jl:978
[3] unsafe_read
@ ./io.jl:891 [inlined]
[4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
@ Base ./io.jl:890
[5] read!
@ ./io.jl:895 [inlined]
[6] deserialize_hdr_raw
@ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Distributed/src/messages.jl:167 [inlined]
[7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/sh
Your system
Please provide detailed information about your system:
- The operating system
5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- The version of Julia, Python, PythonCall, JuliaCall and any other affected packages
[052768ef] CUDA v5.5.2
[992eb4ea] CondaPkg v0.2.24
[587475ba] Flux v0.14.25
[6099a3de] PythonCall v0.9.23 `https://github.com/JuliaPy/PythonCall.jl.git#main`
[02a925ec] cuDNN v1.4.0
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 64 × AMD Ryzen Threadripper PRO 5975WX 32-Cores
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)
Environment:
LD_LIBRARY_PATH =
CondaPkg Status /home/garbus/.julia/environments/v1.11/CondaPkg.toml
Environment
/home/garbus/.julia/environments/v1.11/.CondaPkg/env
Packages
gymnasium v1.0.0
gymnasium-box2d v1.0.0
gymnasium-other v1.0.0
swig v4.2.1
Additional context
I'm researching embodied AI and trying to use Julia's distributed capabilities to do so while still evaluating on Python environments.