Update Chapel docs w.r.t. MPI contention #26676

Open
bradcray opened this issue Feb 7, 2025 · 0 comments

bradcray commented Feb 7, 2025

We generally recommend that users choose ssh-based, rather than mpirun-based, options when launching Chapel programs with conduits other than mpi (like ofi or ibv). The reason is that MPI can consume resources in ways that either hurt Chapel performance or simply lock GASNet out of using the network. For example, we've had a few users on Omnipath networks hit the error:

*** FATAL ERROR (proc 0): in gasnetc_ofi_init() at /third-party/gasnet/gasnet-src/ofi-conduit/gasnet_ofi.c:1336: fi_endpoint for rdma failed: -22(Invalid argument)

Again, if using the ssh spawner is an option, that is often the most straightforward way to avoid MPI overheads. However, if it is not an option for some reason, it is preferable to have mpirun use TCP/IP so that MPI does not contend for key network resources.

In the Build-time Configuration section of GASNet's mpi-spawner README (https://bitbucket.org/berkeleylab/gasnet/src/master/other/mpi-spawner/README), the GASNet developers list options that can be used at configuration or execution time to request that MPI use TCP/IP. For example, when using OpenMPI, two options are to:

  • set OMPI_MCA_btl=tcp,self in the environment (good for a quick check, annoying to have to do every time)
  • pass --mca btl tcp,self to mpirun, for example by setting it as part of MPIRUN_CMD at GASNet configuration time (see third-party/gasnet/gasnet-src/mpi-conduit/README for more about setting MPIRUN_CMD)
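
As a rough sketch of the first option above, the environment setting could look like the following in a POSIX shell. The program name `./hello` and the locale count are hypothetical placeholders, so the launch lines are left as comments; only the variable name and value come from the text above.

```shell
# Ask OpenMPI to restrict itself to the TCP and self transports, so that it
# does not compete with GASNet for the high-speed network.
export OMPI_MCA_btl=tcp,self

# Confirm what the launcher (and the mpirun it invokes) will inherit.
echo "OMPI_MCA_btl=${OMPI_MCA_btl}"

# Then launch the Chapel program as usual (hypothetical binary name and
# locale count, shown only for illustration):
#   ./hello -nl 2
#
# The per-invocation form of the same request, if mpirun is run by hand
# (again with a hypothetical binary name and process count):
#   mpirun --mca btl tcp,self -np 2 ./hello_real
```

The environment variable only lasts for the current shell session, which is why baking the flag into MPIRUN_CMD at configuration time is the more durable choice.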

This issue is here to:

  • capture a TODO item to update our GASNet-related documentation, particularly wherever the mpi spawner is mentioned, to describe this concern, the workarounds, and possibly even the error message (to assist with Google searching)
  • capture the information itself (should someone be searching GitHub issues for the error message)