Opened 4 years ago
Closed 4 years ago
#62175 closed defect (fixed)
mpich @3.4 runtime error
Reported by: | derek-teaney | Owned by: | eborisch (Eric A. Borisch) |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | ports | Version: | 2.6.4 |
Keywords: | | Cc: | |
Port: | mpich | | |
Description
I installed mpich and mpich-default without problems and compiled a simple program, "cpi.cc".
At runtime, however, even this simplest MPI program fails with an error. This seems related to the following message on the MPICH discuss mailing list:
https://lists.mpich.org/pipermail/discuss/2020-August/006031.html
I am not sure whether this is a MacPorts problem. If it is not, any advice is greatly appreciated.
This is the output
➜ MPI git:(master) ✗ mpiexec -np 2 ./a.out
Assertion failed in file src/mpid/ch4/netmod/ofi/ofi_init.c at line 1988: mapped_table[i] != FI_ADDR_NOTAVAIL
Assertion failed in file src/mpid/ch4/netmod/ofi/ofi_init.c at line 1988: mapped_table[i] != FI_ADDR_NOTAVAIL
0 libpmpi.12.dylib 0x000000010bc55d24 MPL_backtrace_show + 52
1 libpmpi.12.dylib 0x000000010bbe1694 MPIR_Assert_fail + 36
2 libpmpi.12.dylib 0x000000010bc2af97 MPIDI_OFI_mpi_init_hook + 6487
3 libpmpi.12.dylib 0x000000010bc0b86f MPID_Init + 2383
4 libpmpi.12.dylib 0x000000010ba27cb4 MPIR_Init_thread + 228
5 libmpi.12.dylib 0x000000010b816a47 MPI_Init + 279
6 a.out 0x000000010b7b0bb8 main + 104
7 libdyld.dylib 0x00007fff684a8cc9 start + 1
0 libpmpi.12.dylib 0x0000000105f31d24 MPL_backtrace_show + 52
1 libpmpi.12.dylib 0x0000000105ebd694 MPIR_Assert_fail + 36
2 libpmpi.12.dylib 0x0000000105f06f97 MPIDI_OFI_mpi_init_hook + 6487
3 libpmpi.12.dylib 0x0000000105ee786f MPID_Init + 2383
4 libpmpi.12.dylib 0x0000000105d03cb4 MPIR_Init_thread + 228
5 libmpi.12.dylib 0x0000000104e3ea47 MPI_Init + 279
6 a.out 0x0000000104ddabb8 main + 104
7 libdyld.dylib 0x00007fff684a8cc9 start + 1
Abort(1) on node 0: Internal error
Abort(1) on node 0: Internal error
With a single process, everything runs smoothly:
➜ MPI git:(master) ✗ mpiexec ./a.out
Process 0 on MacBook-Pro-5.local
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
wall clock time = 0.000071
I am running:
Catalina 10.15.5 (19F96), 2.2 GHz Quad-Core Intel Core i7
Change History (8)
comment:1 Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)
Owner: | set to eborisch |
---|---|
Status: | new → assigned |
comment:2 Changed 4 years ago by eborisch (Eric A. Borisch)
comment:3 Changed 4 years ago by eborisch (Eric A. Borisch)
Reported upstream: https://github.com/pmodels/mpich/issues/5041
comment:4 Changed 4 years ago by eborisch (Eric A. Borisch)
Well, it's more complicated than just using mpich-clang10; I was running mpich-clang10 +tuned and some additional customizations of my own; I'm tracking down just what made it work.
comment:5 Changed 4 years ago by eborisch (Eric A. Borisch)
It looks (with two quick spot-checks) like installing +tuned makes it work.
comment:6 Changed 4 years ago by derek-teaney
Thanks, installing mpich-clang10 +tuned worked. The +tuned variant is not documented in the output of the command port info mpich.
I did try to compile mpich directly from source "out of the box" with the default tools, which means (as I learned) "/usr/bin/gcc = Apple's clang 12", and ran into the same runtime error.
I was unable to compile mpich with the MacPorts compiler /opt/local/bin/gcc-mp-10, for the stupid reason that ./configure failed with an error complaining that -std=c99 does not work (it does), but that is a discussion for another group.
comment:7 Changed 4 years ago by eborisch (Eric A. Borisch)
This should be resolved now (as of the switch back to ch3).
comment:8 Changed 4 years ago by eborisch (Eric A. Borisch)
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Huh. I can recreate this with mpich-default, but I typically use mpich-clang* (because I also want OpenMP support baked in), and it doesn't occur on mpich-clang10, to be sure.
Short term, I would recommend one of the mpich-clang* ports; long term, we may need to revisit whether -default should use one of our provided clangs rather than the system one, and report this upstream to see if we can get some resolution.