#70859 closed defect (invalid)
gmp @6.3.0: tests fail when built with clang on Intel only, but pass when assembly is disabled. Forcing ld_classic appears to fix the issue.
Reported by: | haberg-1 (Hans Åberg) | Owned by: | MarcusCalhoun-Lopez (Marcus Calhoun-Lopez) |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | ports | Version: | |
Keywords: | ventura | Cc: | eric-j-ason, cooljeanius (Eric Gallager), cjones051073 (Chris Jones), markmentovai (Mark Mentovai) |
Port: | gmp |
Description (last modified by ryandesign (Ryan Carsten Schmidt))
GMP 'make check' fails when built with later versions of Clang, but passes when built with GCC, so that should be the build dependency. Tried on MacOS 14. See:
https://gmplib.org/list-archives/gmp-bugs/2024-June/005505.html
https://gmplib.org/list-archives/gmp-bugs/2024-July/005506.html
Attachments (1)
Change History (68)
comment:1 follow-up: 17 Changed 8 weeks ago by kencu (Ken)
comment:2 Changed 8 weeks ago by kencu (Ken)
There is a newer version of gmp here, not yet released:
https://gmplib.org/download/snapshot/gmp-next/gmp-6.3.0-20240515185115.tar.zst
The arm64 build test summary looks good there too (I included the cxx tests in this test run):
============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 22
# PASS: 22
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
I'll try the x86_64 run next.
comment:3 Changed 8 weeks ago by kencu (Ken)
You can force an arm64 Mac into Intel (Rosetta 2) mode like this:
gmp-6.3.0 % arch -arch x86_64 zsh
From then on, the shell and everything it launches think they are running on an Intel processor:
% ./configure --enable-cxx && make -j 10
checking build system type... westmere-apple-darwin24.0.0
checking host system type... westmere-apple-darwin24.0.0
checking for a BSD-compatible install... /usr/bin/install -c
...
As the ticket reports, there are certainly plenty of errors with the x86_64 build. This was the last bit that printed:
============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 53
# PASS: 20
# SKIP: 1
# XFAIL: 0
# FAIL: 32
# XPASS: 0
# ERROR: 0
============================================================================
See tests/mpn/test-suite.log
Disabling assembly on Intel fixes all of the errors (a gmp developer suggested this, but I don't see that anyone ever tested it):
% ./configure --enable-cxx --disable-assembly && make -j 10
============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 22
# PASS: 22
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
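When comparing configurations like the two runs above (assembly enabled vs. --disable-assembly), it can help to reduce each run to a single number. The helper below is a hypothetical convenience, not part of gmp or automake: a minimal POSIX shell sketch that sums every "# FAIL: N" line in the testsuite summaries, assuming the summary format shown in this ticket.

```shell
#!/bin/sh
# Hypothetical helper (not part of gmp or automake): read `make check` output
# on stdin and print the sum of all "# FAIL: N" counts. gmp runs one testsuite
# per directory, so a full log can contain several summaries.
# "# XFAIL: N" is not counted: the pattern requires "# " right before "FAIL".
sum_testsuite_fails() {
  awk '{
    line = $0
    # a line may hold more than one summary if the log was flattened
    while (match(line, /# FAIL: *[0-9]+/)) {
      count = substr(line, RSTART, RLENGTH)
      sub(/# FAIL: */, "", count)
      total += count
      line = substr(line, RSTART + RLENGTH)
    }
  }
  END { print total + 0 }'
}
```

For example, capture a run with make check 2>&1 | tee check.log, then run sum_testsuite_fails < check.log once per configuration and compare the two numbers.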
So the error is in the way clang handles (assembles) the handwritten gmp assembly files. I suspect it is related to the ALIGN(8) above, so I'll see whether changing that makes any difference. We might be in tricky territory here, however.
I haven't tried a gcc build as yet.
comment:4 Changed 8 weeks ago by kencu (Ken)
I tried changing ALIGN(8) to ALIGN(16) in two places in mpn/x86_64/bdiv_q_1.asm, and that did nothing to improve the test errors.
There are a lot of assembly files, it turns out, that set ALIGN(8) -- perhaps others need fixing, or perhaps we are barking up the wrong tree altogether.
More to sort out.
You can't really set gmp to build with gcc14, as gmp is a build dependency of gcc14. <https://ports.macports.org/port/gcc14/details/>. So really, this has to be properly sorted out by upstream gmp, and just saying "clang is hosed" or "Apple is silly" is not really going to be the solution.
For now -- I would personally lean towards disabling assembly for Intel builds of gmp until they sort it out.
MacPorts has lots of smart people around -- perhaps someone knows enough x86_64 assembly to see what gmp is doing to make these errors happen.
comment:5 Changed 8 weeks ago by kencu (Ken)
Unfortunately, it is currently difficult to build with gcc in MacPorts (or Homebrew, I believe) targeting a non-native arch. The gcc compilers installed by either package manager only compile for the native arch, and current gcc cross-compilers have not been properly set up.
So there is no way for me to build gmp using gcc on this arm64 Mac as an Intel build to test it.
I would have to build a custom intel cross compiling gcc to do that -- which is actually not all that hard, just have to do it manually.
Let me see if any of my Intel machines can run a new enough MacOS to demonstrate the problem. They are generally running OpenCore now, but even with that, I don't think I can get to a new enough OS to show this issue.
comment:6 Changed 8 weeks ago by ryandesign (Ryan Carsten Schmidt)
Description: | modified (diff) |
---|---|
Owner: | set to MarcusCalhoun-Lopez |
Status: | new → assigned |
Summary: | GMP build with GCC, not Clang → gmp @6.3.0: tests fail when built with clang |
comment:7 Changed 8 weeks ago by kencu (Ken)
Summary: | gmp @6.3.0: tests fail when built with clang → gmp @6.3.0: tests fail when built with clang on Intel only |
---|
comment:8 Changed 8 weeks ago by kencu (Ken)
Summary: | gmp @6.3.0: tests fail when built with clang on Intel only → gmp @6.3.0: tests fail when built with clang on Intel only, but pass when assembly is disabled |
---|
comment:9 Changed 8 weeks ago by kencu (Ken)
As a data point, I ran a quick test on Mac OS X 10.6 Intel using clang-15 to build gmp, including the assembly files, and it was a 100% pass there:
============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 22
# PASS: 22
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
comment:10 Changed 7 weeks ago by haberg-1 (Hans Åberg)
You can set gmp to depend on gcc14, which now installs (cf. #70866). The GMP developers primarily focus on GCC, so it is the safest dependency.
comment:11 Changed 7 weeks ago by eric-j-ason
Cc: | eric-j-ason added |
---|
comment:12 Changed 7 weeks ago by kencu (Ken)
comment:13 Changed 7 weeks ago by kencu (Ken)
The handwritten Intel assembly files in gmp need to be updated.
comment:14 Changed 7 weeks ago by haberg-1 (Hans Åberg)
Such recursive dependencies may require recursive builds (I have done such builds by hand). The GMP developers do not have much incentive to do Clang workarounds, so don't expect a fix to the assembly code anytime soon.
It might be possible to build GCC against a build-only GMP, itself built using Clang with the assembly code turned off (assuming 'make check' passes), and then build the public GMP with GCC, with assembly turned on. The assembly is only there to boost performance on certain platforms, so it should make no difference other than the GCC build becoming somewhat slower.
comment:15 Changed 7 weeks ago by kencu (Ken)
They will have to fix gmp, as the primary Xcode compiler needs to be able to build it. Someone who has a current macOS system and knows how to debug needs to help them.
We can't build it with gcc for many reasons. Dependencies are one. The lack of universal gcc builds is another.
gmp is a core project; it won't take long once someone skilled gets involved, which is likely to be soon.
Until then, simply disable asm on Intel. Current compilers optimize so well that the handwritten assembly most likely adds little anyway.
comment:16 Changed 7 weeks ago by haberg-1 (Hans Åberg)
Fixing GMP for regular Clang is difficult and time-consuming, which is why they do not do it, and it gets worse with Apple Clang, a hacked and old version, which is entirely off the list: GCC is the primary compiler for GNU projects, and all else is extra.
A public version of GMP with assembly turned off cannot really be recommended, as GMP is used in numerics requiring high performance.
comment:17 Changed 7 weeks ago by eric-j-ason
Replying to kencu:
I realize, looking at the above message and <https://gmplib.org/list-archives/gmp-bugs/2024-June/005494.html>, there is an alignment error with the x86_64 assembly code that now flags on MacOS as recent MacOS versions are intolerant of unaligned pointers.
Would the code even work with such an error present?
comment:18 Changed 7 weeks ago by haberg-1 (Hans Åberg)
It may be a Clang issue, its developers ditching the C/C++ standards in favor of promoting mainstream programming, whereas the GMP developers are focused on optimizations that do not fall into that picture.
You might take up the issue on the GMP bugs list: https://gmplib.org/mailman/listinfo/gmp-bugs
comment:19 Changed 7 weeks ago by haberg-1 (Hans Åberg)
I am checking on the GCC Help list if they have some suggestions: https://gcc.gnu.org/pipermail/gcc-help/2024-September/143751.html
comment:20 Changed 7 weeks ago by haberg-1 (Hans Åberg)
There is a download_prerequisites script that sets up GMP to be built and linked statically as part of the GCC build: https://gcc.gnu.org/wiki/InstallingGCC
Also see: https://gcc.gnu.org/pipermail/gcc-help/2024-September/143754.html
comment:21 Changed 7 weeks ago by cooljeanius (Eric Gallager)
Cc: | cooljeanius added |
---|
comment:22 Changed 6 weeks ago by kencu (Ken)
Let me see if I can update one of my Intel machines to a version that shows the error. If I can, perhaps we can sort out what the issue is with their assembly files.
In the modern world of compiler optimizations, it is not a given that their assembly is actually any faster than the optimized fallback code. You’d have to benchmark it to know for sure.
comment:23 Changed 6 weeks ago by kencu (Ken)
MacPorts already knows how to make a gcc with an embedded gmp, by the way.
We just don't want to do that for the mainstream gcc ports.
And building the gmp port with gcc would mean it can't be universal, which we don't want either.
Best to sort this out properly, as the whole world needs gmp to build properly with the primary macOS compiler, not just MacPorts.
comment:24 Changed 6 weeks ago by kencu (Ken)
In the meantime, disabling asm on Intel builds for the newest systems is a pretty trivial fix.
I know what you mean about a possible speed issue.
It looks like you can benchmark gmp like this:
Would you like to do that with asm (gcc-built) and without asm (clang-built) and compare them, so we can see what the cost really is?
comment:25 Changed 6 weeks ago by kencu (Ken)
Just as another data point: the gcc10-bootstrap port does not build on arm64 Sequoia, so that pathway is unavailable.
comment:26 Changed 6 weeks ago by kencu (Ken)
macOS 12.7 Intel with Xcode clang (1400.x) and asm enabled passes all tests.
% clang -v
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin21.6.0
comment:27 Changed 6 weeks ago by kencu (Ken)
On macOS 12.7 Intel, building with clang-18 and asm enabled passes all tests.
comment:28 Changed 6 weeks ago by kencu (Ken)
(aside for later: might be an idea to try the classic linker on Sequoia.)
comment:29 Changed 6 weeks ago by kencu (Ken)
It looks like using the classic linker might be the fix, for now.
If we do this:
export LDFLAGS='-Wl,-ld_classic'
then I can build and run gmp on a current system in Intel mode, with all the assembly enabled, and it passes all the tests.
Why this is the fix is a question to be answered later. I still think it will turn out to be an alignment thing, but smarter people than me can sort it out.
Please give that a try on your machine as well. It's pretty easy to add that flag to the gmp build.
This is a much better fix than trying to build it with gcc.
(NB. Up until 3 days ago, gcc14 was configured to always use ld_classic -- so it may be that the reason gcc worked was because it was using ld_classic).
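The workaround can be wrapped so that a build script only passes the flag where the linker understands it. A minimal sketch, resting on two assumptions that are not established in this ticket: -ld_classic first appeared alongside the new linker in Xcode 15 (older ld64 would reject it as an unknown option), and the caller already knows the Xcode major version (for example from the first line of xcodebuild -version). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the LDFLAGS to use for an Intel gmp build.
# Assumption: -ld_classic is only understood by Xcode 15 and later (Xcode 16
# accepts it but warns that it is deprecated); older linkers get no flag.
gmp_ldflags_for_xcode() {
  xcode_major=$1   # e.g. 16 for Xcode 16.0
  if [ "$xcode_major" -ge 15 ]; then
    printf '%s\n' '-Wl,-ld_classic'
  fi
}

# Example:
#   export LDFLAGS="$(gmp_ldflags_for_xcode 16)"
#   ./configure --enable-cxx && make -j 10 && make check
```

Given the deprecation warning quoted in comment:31, this can only be a stopgap.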
comment:30 Changed 6 weeks ago by kencu (Ken)
Summary: | gmp @6.3.0: tests fail when built with clang on Intel only, but pass when assembly is disabled → gmp @6.3.0: tests fail when built with clang on Intel only, but pass when assembly is disabled. Forcing ld_classic appears to fix the issue. |
---|
comment:31 Changed 6 weeks ago by kencu (Ken)
I have been trying to show that gcc builds gmp incorrectly when using the standard linker too, just like clang does...
but no matter what I do, I can't seem to force gcc14 to use the standard linker. It always uses ld_classic, eg:
libtool: link: gcc-mp-14 -O2 -pedantic -fomit-frame-pointer -m64 -mtune=nehalem -march=nehalem -o .libs/t-toom6h t-toom6h.o ../../tests/.libs/libtests.a /Users/cunningh/gmp-6.3.0/.libs/libgmp.dylib ../../.libs/libgmp.dylib
ld: warning: -ld_classic is deprecated and will be removed in a future release
I thought gcc had been changed recently to no longer use ld_classic -- that is what the commits said
https://github.com/macports/macports-ports/commits/master/lang/gcc14/Portfile
but there have been some fixups and refixups in gcc, and I'm not sure just now what it is doing.
what I do know is that these gcc14 versions:
% port -v installed gcc14 libgcc14
The following ports are currently installed:
  gcc14 @14.2.0_3+stdlib_flag (active) requested_variants='' platform='darwin 23' archs='x86_64' date='2024-09-30T22:25:34-0700'
  libgcc14 @14.2.0_3+stdlib_flag (active) requested_variants='' platform='darwin 23' archs='x86_64' date='2024-09-30T22:25:29-0700'
which are current, are using ld_classic as above.
comment:32 Changed 6 weeks ago by kencu (Ken)
Oh, I bet the CLTs and Xcode on the buildbots haven't been updated to 16 yet.
That's why we're still using ld_classic.
Let me build gcc from source.
comment:33 follow-up: 40 Changed 6 weeks ago by haberg-1 (Hans Åberg)
It may be the removal from GCC of the ld option -ld_classic that causes the 'make check' fails, as this option is deprecated on MacOS 15. See: https://trac.macports.org/ticket/70951
With 'make check' tests on gmp-6.3.0 using both port clang-18 and gcc14 on MacOS 15, the failures are the same.
I have reported this on the GMP Bugs list: https://gmplib.org/list-archives/gmp-bugs/2024-October/005537.html
comment:34 follow-up: 39 Changed 6 weeks ago by cjones051073 (Chris Jones)
The changes here
https://github.com/macports/macports-ports/commit/2453011ee18c25153b716a2ae42bed85ed52752a
only remove the explicit reference to the classic linker option in MacPorts for Xcode 16 or newer. Internally, GCC still knows about the option and uses it, and removing that will require an upstream GCC fix.
comment:35 Changed 6 weeks ago by kencu (Ken)
OK, confirmed. This issue has absolutely nothing to do with building with either gcc or clang, and is 100% related to ld_classic vs ld_prime.
If gcc14 is built to use the new linker, and gmp is built with gcc14, then gmp fails every bit as badly as it does when built with clang and the new linker:
/bin/sh ../../libtool --tag=CC --mode=link gcc-mp-14 -O2 -pedantic -fomit-frame-pointer -m64 -mtune=nehalem -march=nehalem -no-install -o t-gcdext_1 t-gcdext_1.o ../../tests/libtests.la ../../libgmp.la
libtool: warning: '-no-install' is ignored for westmere-apple-darwin23.6.0
libtool: warning: assuming '-no-fast-install' instead
libtool: link: gcc-mp-14 -O2 -pedantic -fomit-frame-pointer -m64 -mtune=nehalem -march=nehalem -o .libs/t-gcdext_1 t-gcdext_1.o ../../tests/.libs/libtests.a /Users/cunningh/gmp-6.3.0/.libs/libgmp.dylib ../../.libs/libgmp.dylib
/Applications/Xcode.app/Contents/Developer/usr/bin/make check-TESTS
PASS: t-asmtype
PASS: t-aors_1
../../test-driver: line 107: 36828 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-divrem_1
PASS: t-mod_1
../../test-driver: line 107: 36866 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-fat
PASS: t-get_d
PASS: t-instrument
PASS: t-iord_u
PASS: t-mp_bases
PASS: t-perfsqr
PASS: t-scan
PASS: logic
../../test-driver: line 107: 37022 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-toom22
../../test-driver: line 107: 37041 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-toom32
../../test-driver: line 107: 37060 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-toom33
../../test-driver: line 107: 37079 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-toom42
../../test-driver: line 107: 37098 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-toom43
../../test-driver: line 107: 37117 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-toom44
../../test-driver: line 107: 37136 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-toom52
../../test-driver: line 107: 37155 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-toom53
../../test-driver: line 107: 37174 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-toom54
../../test-driver: line 107: 37193 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-toom62
../../test-driver: line 107: 37212 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-toom63
../../test-driver: line 107: 37231 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-toom6h
../../test-driver: line 107: 37250 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-toom8h
PASS: t-toom2-sqr
PASS: t-toom3-sqr
PASS: t-toom4-sqr
../../test-driver: line 107: 37326 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-toom6-sqr
../../test-driver: line 107: 37345 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-toom8-sqr
../../test-driver: line 107: 37364 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-div
../../test-driver: line 107: 37383 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-mul
../../test-driver: line 107: 37402 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-mullo
../../test-driver: line 107: 37421 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-sqrlo
../../test-driver: line 107: 37440 Trace/BPT trap: 5 "$@" > $log_file 2>&1
FAIL: t-mulmod_bnm1
../../test-driver: line 107: 37459 Trace/BPT trap: 5 "$@" > $log_file 2>&1
FAIL: t-sqrmod_bnm1
PASS: t-mulmid
../../test-driver: line 107: 37497 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-mulmod_bknp1
../../test-driver: line 107: 37516 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-sqrmod_bknp1
SKIP: t-addaddmul
../../test-driver: line 107: 37554 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-hgcd
../../test-driver: line 107: 37573 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-hgcd_appr
../../test-driver: line 107: 37592 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-matrix22
../../test-driver: line 107: 37611 Trace/BPT trap: 5 "$@" > $log_file 2>&1
FAIL: t-invert
../../test-driver: line 107: 37630 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-bdiv
../../test-driver: line 107: 37649 Illegal instruction: 4 "$@" > $log_file 2>&1
FAIL: t-fib2m
PASS: t-broot
PASS: t-brootinv
PASS: t-minvert
../../test-driver: line 107: 37725 Segmentation fault: 11 "$@" > $log_file 2>&1
FAIL: t-sizeinbase
PASS: t-gcd_11
PASS: t-gcd_22
PASS: t-gcdext_1
============================================================================
Testsuite summary for GNU MP 6.3.0
============================================================================
# TOTAL: 53
# PASS: 20
# SKIP: 1
# XFAIL: 0
# FAIL: 32
# XPASS: 0
# ERROR: 0
============================================================================
See tests/mpn/test-suite.log
Please report to gmp-bugs@gmplib.org (see https://gmplib.org/manual/Reporting-Bugs.html)
============================================================================
make[5]: *** [test-suite.log] Error 1
make[4]: *** [check-TESTS] Error 2
make[3]: *** [check-am] Error 2
make[2]: *** [check-recursive] Error 1
make[1]: *** [check-recursive] Error 1
make: *** [check] Error 2
cunningh@macpro gmp-6.3.0 %
comment:36 Changed 6 weeks ago by cjones051073 (Chris Jones)
Cc: | cjones051073 added |
---|
comment:37 Changed 6 weeks ago by kencu (Ken)
Cc: | cjones051073 removed |
---|
This, of course, is a well-known issue:
https://www.scivision.dev/xcode-ld_classic/
So the fix for gmp is to use ld_classic, whatever the compiler, and then the Intel assembly will be linked correctly.
We can stop talking about forcing builds with gcc.
I will push this through.
comment:38 Changed 6 weeks ago by kencu (Ken)
Cc: | cjones051073 added |
---|
comment:39 Changed 6 weeks ago by kencu (Ken)
Replying to cjones051073:
Internally, GCC still knows about the option and uses it, and removing that will require an upstream GCC fix.
A change we certainly hope nobody suggests to upstream or tries to implement any time soon, until all these projects get sorted out with the new linker.
comment:40 follow-up: 42 Changed 6 weeks ago by kencu (Ken)
Replying to haberg-1:
It may be the removal from GCC of the ld option -ld_classic that causes the 'make check' fails,
No, it's not that.
The new linker is doing something different with the Intel assembly than ld_classic did, and that is what is making "make check" fail.
Edit: Sorry, I misunderstood you. Yes, it is exactly that gcc14 now doesn't use ld_classic that is now causing "make check" to fail when gmp is built with gcc14. Exactly that.
comment:41 Changed 6 weeks ago by cjones051073 (Chris Jones)
My point was more that when Apple completely removes the classic option (and now that they have officially deprecated it, I would say it's on the cards from Xcode 17 onwards), GCC will have to adapt. The sooner they start planning for this, the better.
comment:42 Changed 6 weeks ago by cjones051073 (Chris Jones)
Replying to kencu:
Replying to haberg-1:
It may be the removal from GCC of the ld option -ld_classic that causes the 'make check' fails,
No, it's not that.
The new linker is doing something different with the Intel assembly than ld_classic did, and that is what is making "make check" fail.
Edit: Sorry, I misunderstood you. Yes, it is exactly that gcc14 now doesn't use ld_classic that is now causing "make check" to fail when gmp is built with gcc14. Exactly that.
Perhaps turning it off for Xcode 16 already is a bit early. I could be persuaded to roll back a bit the change in GCC14 and only limit it for Xcode 17 or newer.
comment:43 Changed 6 weeks ago by kencu (Ken)
Indeed -- hopefully folks like us can get gmp and similar projects up to speed.
Unfortunately, right now gmp etc. consider this everyone else's problem to fix -- and maybe it is. I don't know exactly what is causing the new linker to generate these errors, and whose error it is to fix.
comment:44 Changed 6 weeks ago by cjones051073 (Chris Jones)
... But then we would be back to having to live with the linker warning, which causes issues in itself in some cases.
comment:45 Changed 6 weeks ago by kencu (Ken)
Indeed so -- we need someone like Jeremy around here with close ties to Apple to sort out whether this is a linker bug or a gmp bug...
comment:46 Changed 6 weeks ago by kencu (Ken)
I don't even know where to usefully report this. Opening RADARs is always such a black hole, it seems...
comment:47 Changed 6 weeks ago by cjones051073 (Chris Jones)
Jeremy has sadly been MIA for a while now.
comment:48 Changed 6 weeks ago by cjones051073 (Chris Jones)
I can almost guarantee you filing an Apple radar about a linker issue specific to GMP (GPL-3 code) will go absolutely no where.
comment:49 Changed 6 weeks ago by cjones051073 (Chris Jones)
BTW, isn't blacklisting Xcode clang 16 and above, and just falling back to a MacPorts clang build (clang-18, say), also a viable workaround for now?
comment:50 Changed 6 weeks ago by kencu (Ken)
I think it's not the compiler at all.
Just the linker that is chosen, whatever the compiler might be.
(The asm files are just assembled anyway, and passed to the linker, and the compiler is doing very little here).
comment:51 Changed 6 weeks ago by cjones051073 (Chris Jones)
True, but if by blacklisting Xcode clang and using a MacPorts version you also sidestep the Xcode-provided linker and use the one for that clang build, you bypass the issue. Did you not say above that GMP builds fine if you use clang-18? What linker is actually used in that case?
comment:52 Changed 6 weeks ago by kencu (Ken)
You are correct that a linker is installed with all our recent clang ports. However, it is not used, by default at least. I believe that even on the most recent llvm builds that linker is not considered "ready for prime time". I believe the new Xcode linker (which is causing our troubles here) is a fork of that project, however.
The linker used by default by the macports-clang versions continues to be the one pointed to by the ld64 port with its shim in ${prefix}/bin/ld, and most often that points to ld64 +xcode to pick up the Xcode-supplied linker.
I did mention that on MacOS 12.7 gmp built without troubles using clang-18 to build it. MacOS 12.7 uses the xcode linker, which at that stage of things is equivalent to "ld_classic". I was using that example more to support the idea that it wasn't something with newer clangs that was causing the build errors, it was something else (like the linker).
comment:53 Changed 6 weeks ago by markmentovai (Mark Mentovai)
Cc: | markmentovai added |
---|
comment:54 Changed 6 weeks ago by markmentovai (Mark Mentovai)
(MacPorts-specific: This is a message that I’m trying to post to gmp-bugs@…, but it hasn’t landed there yet. There is nothing wrong with any compiler, either in Xcode or MacPorts. There is a bug in Apple’s new linker and it can occur using any compiler, but it’s not a bug that gmp needs to suffer, and it’s possible to avoid the bug without opting for the deprecated linker.)
If you read nothing else, read this:
gmp-6.3.0 ships libtool-2.4.6 (2015-02-16). Update to libtool-2.4.7 (2022-03-17) to solve this problem.
Details:
There does appear to be a bug in Apple’s new linker (ld-new or ld-prime) when targeting x86_64, producing a Mach-O dynamic library (clang -dynamiclib), and using the flat namespace option (-flat_namespace). I observed this as a variety of crashes in make check. I investigated t-bdiv raising SIGILL in particular:
% lldb tests/mpn/.libs/t-bdiv
(lldb) target create "tests/mpn/.libs/t-bdiv"
Current executable set to '…/gmp-6.3.0.build/tests/mpn/.libs/t-bdiv' (x86_64).
(lldb) env DYLD_LIBRARY_PATH=.libs
(lldb) run
Process 19802 launched: '…/gmp-6.3.0.build/tests/mpn/.libs/t-bdiv' (x86_64)
Process 19802 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00000001000de806 libgmp.10.dylib`__gmpn_sub_n + 3
Target 0: (t-bdiv) stopped.
(lldb) disassemble
libgmp.10.dylib`:
    0x1000de803 <+0>: jmpq *0x11ecf(%rip) ; (void *)0x00000001000a1a00: __gmpn_sub_n
(lldb) disassemble -s 0x1000de803 -e 0x1000de80f
libgmp.10.dylib`:
    0x1000de803 <+0>: jmpq *0x11ecf(%rip) ; (void *)0x00000001000a1a00: __gmpn_sub_n
libgmp.10.dylib`:
    0x1000de809 <+0>: jmpq *0x11ed1(%rip) ; (void *)0x00000001000a1aae: __gmpn_sub_nc
With the fault address at 0x1000de806 falling partway through the instruction at 0x1000de803, this certainly would be a bad instruction. This code was assembled from https://gmplib.org/repo/gmp-6.3/file/62abbaeaab13/mpn/x86_64/core2/aors_n.asm; the bottom of that file has __gmpn_sub_nc jumping into (but not to the beginning of) __gmpn_sub_n. Duplicating that structure in a reduced testcase:
% cat ts_x86-64.s
.text

.globl _F
.p2align 4, 0x90
_F:
  movl $1, %eax
Lcommon:
  shll %eax
  retq

.globl _G
.p2align 4, 0x90
_G:
  movl $2, %eax
  jmp Lcommon
% cat tc.c
int F();
int G();

int main(int argc, char* argv[]) {
  return G();
}
The problem is easily reproduced:
% clang -dynamiclib -flat_namespace -o libt.dylib ts_x86-64.s
% clang -o t tc.c libt.dylib
% ./t
zsh: segmentation fault  ./t
This dylib is small enough to observe what’s going on inside directly:
% objdump -d libt.dylib

libt.dylib: file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000f80 <_F>:
     f80: b8 01 00 00 00            movl $1, %eax
     f85: d1 e0                     shll %eax
     f87: c3                        retq
     f88: 0f 1f 84 00 00 00 00 00   nopl (%rax,%rax)

0000000000000f90 <_G>:
     f90: b8 02 00 00 00            movl $2, %eax
     f95: e9 05 00 00 00            jmp 0xf9f

Disassembly of section __TEXT,__stubs:

0000000000000f9a <__stubs>:
     f9a: ff 25 60 00 00 00         jmpq *96(%rip) ## 0x1000
The jump at 0xf95 is bad: 0xf9f is a bad jump target. As before, that address lies within another instruction (in this case, the last byte of the instruction at 0xf9a). In fact, that’s the very last byte of the section:
% otool -l libt.dylib
[…]
Section
  sectname __stubs
   segname __TEXT
      addr 0x0000000000000f9a
      size 0x0000000000000006
[…]
Section
  sectname __unwind_info
   segname __TEXT
      addr 0x0000000000000fa0
      size 0x0000000000000058
[…]
The jump at 0xf95 should target 0xf85, or _F + 0x5. For some reason, the linker created a stub for this jump (which itself shouldn’t be necessary) and then, instead of arranging for the stub to resolve and jump to _F + 0x5, jumped to offset 0x5 within the stub.
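The arithmetic behind "0xf9f is a bad jump target" can be checked by hand. An x86-64 jmp with opcode e9 is 5 bytes long and carries a signed 32-bit little-endian displacement counted from the end of the instruction, so target = address + 5 + displacement. The sketch below is a hypothetical helper, not part of any tool mentioned in this ticket; it decodes the bytes e9 05 00 00 00 at 0xf95 from the objdump listing.

```shell
#!/bin/sh
# Hypothetical helper: compute the target of an x86-64 "jmp rel32" (opcode e9).
# Arguments: instruction address, then the 4 displacement bytes in hex as they
# appear in a disassembly (little-endian, lowest byte first).
jmp_rel32_target() {
  addr=$1
  disp=$(( 0x$2 | (0x$3 << 8) | (0x$4 << 16) | (0x$5 << 24) ))
  # sign-extend the 32-bit displacement so backward jumps work too
  if [ "$disp" -ge $(( 0x80000000 )) ]; then
    disp=$(( disp - 0x100000000 ))
  fi
  # the displacement is relative to the next instruction: addr + 5 bytes
  printf '%#x\n' $(( $addr + 5 + disp ))
}
```

jmp_rel32_target 0xf95 05 00 00 00 gives 0xf9f, which is 0xf9a + 5: offset 5 into the 6-byte __stubs entry, i.e. its last byte. A correct jump from 0xf95 to Lcommon at 0xf85 would need a displacement of -0x15, encoded as eb ff ff ff.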
This is a clear bug in the linker, and I’ll report it to Apple, but I don’t know that anyone could expect much traction.
That doesn’t need to be the end of the story. There’s another concern here: this bug only occurs with -flat_namespace. gmp shouldn’t need -flat_namespace, and in fact it’s undesirable to enable it. It’s coming into this build from configure, via aclocal.m4, having been included from libtool.m4. In libtool-2.4.6, which gmp-6.3.0 is using, that’s https://git.savannah.gnu.org/cgit/libtool.git/tree/m4/libtool.m4?h=v2.4.6#n1070. In particular, it intends to enable -flat_namespace only on very early Mac OS X versions (pre-10.4, in the PowerPC-only era). But the case that we’d like to hit, assuming MACOSX_DEPLOYMENT_TARGET is unset (as it normally would be), doesn’t match $host on a modern macOS system, because the Darwin version is now 20 or higher, while the pattern only contemplates versions up to 19.
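To make the pattern problem concrete, here is a simplified, non-verbatim paraphrase of the Darwin branch in libtool-2.4.6's libtool.m4 (the linked source is authoritative). m4 quoting such as [[91]] is unwrapped to the plain shell glob [91], the flag strings are shortened, and the function name is hypothetical. An unset MACOSX_DEPLOYMENT_TARGET is treated as 10.0, so on darwin20 and later the first pattern no longer matches and the value falls through to the branch meant for 10.0 to 10.2 targets.

```shell
#!/bin/sh
# Simplified, non-verbatim paraphrase of libtool-2.4.6's choice of
# "allow undefined" link flags on modern Darwin hosts.
# $1: MACOSX_DEPLOYMENT_TARGET (empty string = unset), $2: configure's $host
lt246_allow_undefined() {
  case ${1:-10.0},$2 in
    10.0,*86*-darwin8*|10.0,*-darwin[91]*)
      # unset target on x86 darwin8, darwin9, or darwin10-19: intended case
      printf '%s\n' "-undefined dynamic_lookup" ;;
    10.[012][,.]*)
      # deployment target 10.0-10.2, but also reached by an unset target
      # on darwin20+, because [91] only covers darwin9* and darwin1*
      printf '%s\n' "-flat_namespace -undefined suppress" ;;
    10.*)
      printf '%s\n' "-undefined dynamic_lookup" ;;
    *)
      # e.g. MACOSX_DEPLOYMENT_TARGET=14.0: no extra flags at all
      ;;
  esac
}
```

With this sketch, an unset target on x86_64-apple-darwin19.6.0 yields -undefined dynamic_lookup, the same unset target on x86_64-apple-darwin23.6.0 yields -flat_namespace, and an explicit target such as 14.0 yields no flags, consistent with the observation in this ticket that a set MACOSX_DEPLOYMENT_TARGET avoids -flat_namespace.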
https://git.savannah.gnu.org/cgit/libtool.git/commit/m4/libtool.m4?id=9e8c882517082fe5755f2524d23efb02f1522490, in libtool-2.4.7, modernizes this check in libtool, and with that in use, does not enable -flat_namespace in this situation. Upgrading libtool in gmp to that version will fix this problem. I ran autoreconf --install with autoconf-2.69, automake-1.15, and libtool-2.4.7, and observed a clean make check on macOS 14.7 x86_64 (nehalem-apple-darwin23.6.0)/Xcode 15.4 and macOS 15.0 x86_64 (nehalem-apple-darwin24.0.0)/Xcode 16.0. In both cases, the linker is ld-new/ld-prime (no -ld_classic).
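To check whether a particular libgmp.dylib was linked with -flat_namespace, one option is to look at its Mach-O header flags: two-level-namespace images should carry the TWOLEVEL flag in otool -hv output, and flat-namespace images should not. The sketch below assumes that flag name; the function is hypothetical and reads otool's output on stdin, so the check itself can be tried anywhere.

```shell
#!/bin/sh
# Hypothetical helper: read `otool -hv some.dylib` output on stdin and
# succeed (exit status 0) if the header flags lack TWOLEVEL, i.e. the image
# appears to use a flat namespace. Assumes otool names the flag "TWOLEVEL".
uses_flat_namespace() {
  ! grep -q 'TWOLEVEL'
}

# Example (macOS only):
#   otool -hv .libs/libgmp.dylib | uses_flat_namespace && echo "flat namespace"
```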
comment:55 Changed 6 weeks ago by haberg-1 (Hans Åberg)
It is now on the GMP Bugs list: https://gmplib.org/list-archives/gmp-bugs/2024-October/005539.html
Changed 6 weeks ago by markmentovai (Mark Mentovai)
Attachment: | gmp-6.3.0_libtool-2.4.7.patch added |
---|
comment:56 Changed 6 weeks ago by markmentovai (Mark Mentovai)
Applying gmp-6.3.0_libtool-2.4.7.patch is an “easy” way to update gmp-6.3.0 to use libtool-2.4.7 without having to fiddle with autotools.
comment:57 follow-up: 58 Changed 6 weeks ago by cjones051073 (Chris Jones)
Could you see whether adding
use_autoconf yes
helps, instead of that patch? If memory serves, that triggers a rerun of the autoconf utility before the build.
comment:58 Changed 6 weeks ago by markmentovai (Mark Mentovai)
Replying to cjones051073:
Could you see if adding
use_autoconf yes
helps, instead of that patch ? If memory serves that triggers a rerun of the autoconf utility before the build.
MacPorts’ gmp package is unaffected, because it always builds with MACOSX_DEPLOYMENT_TARGET set, which even under the older libtool doesn’t cause -flat_namespace to be used. (Incidentally, it also causes -Wl,-undefined,dynamic_lookup to not be specified, but gmp doesn’t actually require this, so it’s fine.)
This bug as I understand it is about building gmp outside of MacPorts:
GMP 'make check' fails when built with later versions of Clang, but passes when built with GCC, so that should be the build dependency. Tried on MacOS 14. See:
https://gmplib.org/list-archives/gmp-bugs/2024-June/005505.html
https://gmplib.org/list-archives/gmp-bugs/2024-July/005506.html
This refers to make check and not MacPorts’ port check, and seems to mean that when building outside of MacPorts, a failure was observed with clang but not with gcc. This would have been before 2453011ee18c and 771b2dab4689, so MacPorts gcc would have been using ld -ld_classic.
We now understand:
- The problem is in the linker, not the compiler. The bug can occur with any compiler, Xcode’s or MacPorts’. It can also be avoided with any compiler by forcing the use of ld -ld_classic, although that’s not the best solution.
- The linker bug occurs using ld-new/ld-prime targeting x86_64 and using -flat_namespace. It’s a bug in gmp’s build system that -flat_namespace is used at all, and this bug can be fixed by gmp picking up new build dependencies (in particular, libtool) that already fixed this bug a couple of years ago.
- MacPorts’ own build of gmp is not affected by the bug, because MACOSX_DEPLOYMENT_TARGET is set during its build. There is no reason to fear any particular compiler when MacPorts builds gmp.
Given the new understanding, in light of how this bug was originally filed, I think that it should be closed with no action.
I left the patch file here to “show work”, but given the above, don’t think MacPorts needs to take it. use_autoconf/use_autoreconf might not work in this instance anyway, since (as is typical for autotools-based projects) gmp seems tied to specific versions of autoconf and automake, and those are not the current versions in MacPorts. In order to regenerate these files, I had to use autoconf269 and automake115, and place symbolic links from ${prefix}/share/automake115/aclocal to ../../aclocal for a variety of libtool’s m4 files. That’s why I called it fiddly.
comment:59 Changed 6 weeks ago by kencu (Ken)
Resolution: → invalid
Status: assigned → closed
OK, this looks sorted.
We had established there is no need to force any particular compiler, and the problem was with the new linker.
Mark elegantly determined that -flat_namespace was the killer, and that because we set MACOSX_DEPLOYMENT_TARGET in MacPorts, -flat_namespace is not going to be added to our MacPorts builds.
I can certainly confirm that there is no -flat_namespace in the MacPorts builds:
/bin/sh ../libtool --tag=CC --mode=link clang -O2 -pedantic -fomit-frame-pointer -m64 -mtune=nehalem -march=nehalem -no-install -o libtests.la memory.lo misc.lo refmpf.lo refmpn.lo refmpq.lo refmpz.lo spinner.lo trace.lo amd64call.lo amd64check.lo ../libgmp.la
And upstream has already updated libtool, so users playing with this outside of MacPorts should see their "make check" working right with the next release.
Case closed!
Of the closing options, the only one that really fits is "invalid", as in the end this never affected MacPorts builds anyway.
If someone doesn't like "invalid", feel free to reclose it with whatever you want.
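The check Ken describes amounts to searching the build output for the flag. A minimal sketch of doing that mechanically, assuming the build output has been saved to a file (the helper name and the log file name are hypothetical). Because libtool adds -flat_namespace itself, the flag shows up in the link commands libtool echoes, not in the --mode=link invocation, so a plain text search of the whole log is used here.

```sh
#!/bin/sh
# Hypothetical helper: report whether a saved build log contains any
# -flat_namespace flag (bare, or passed as -Wl,-flat_namespace).
#   $1 = path to the log file
uses_flat_namespace() {
  # -e keeps grep from parsing the dash-prefixed pattern as an option.
  if grep -q -e '-flat_namespace' "$1"; then
    echo yes
  else
    echo no
  fi
}
```

For example, run `make 2>&1 | tee build.log` and then `uses_flat_namespace build.log`. A MacPorts build like the one quoted above should print "no", while an affected x86_64 build outside MacPorts would be expected to print "yes".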
comment:60 Changed 6 weeks ago by cjones051073 (Chris Jones)
Ok. I am a bit confused why this ticket was then created in the first place if there isn’t a problem with macports build of gmp. That certainly is not clear from the original submission or discussion.
Closing as invalid is perfectly reasonable in this case.
comment:61 Changed 6 weeks ago by kencu (Ken)
We didn't know that the macports builds were unaffected until the details about flat_namespace came forth.
Only once it was known that flat_namespace was the key piece of the puzzle, and also knowing that macports doesn't use flat_namespace due to -- well, essentially lucky reasons, basically -- we realized macports was unaffected and we had dodged this bullet.
comment:62 Changed 6 weeks ago by haberg-1 (Hans Åberg)
According to the GMP Bugs list, the issue can be fixed by using recent libtool, which has been done, but not pushed in a release: https://gmplib.org/list-archives/gmp-bugs/2024-October/005540.html
comment:63 Changed 6 weeks ago by haberg-1 (Hans Åberg)
You can use the latest snapshot for now in the original setup, compiling GMP with Clang and linking GCC to that, as all GMP 'make check' tests pass with both gcc14 and clang-18.
https://gmplib.org/download/snapshot/gmp-next/gmp-6.3.0-20240515185115.tar.zst
comment:64 Changed 6 weeks ago by kencu (Ken)
We don't need to worry about doing that, though, because the make check error does not show up in MacPorts builds even with the existing release, because of the way MacPorts builds have been configured.
You can see this for yourself, if you like, by doing this:
sudo port -v test gmp
comment:65 Changed 6 weeks ago by haberg-1 (Hans Åberg)
Anyway, tests pass without that special configuration.
comment:67 Changed 6 weeks ago by haberg-1 (Hans Åberg)
There is an informative description of the -flat_namespace option on the GMP Bugs list: https://gmplib.org/list-archives/gmp-bugs/2024-October/005543.html
If you download the current gmp software from here:
https://gmplib.org/download/gmp/gmp-6.3.0.tar.xz
decompress it, and just build it on a current arm64 Mac system, outside of MacPorts, with standard tools (Xcode and CLT installed, clang and standard tools used), you get a 100% pass:
I realize, looking at the above message and <https://gmplib.org/list-archives/gmp-bugs/2024-June/005494.html>, there is an alignment error with the x86_64 assembly code that now flags on MacOS as recent MacOS versions are intolerant of unaligned pointers. We have several MacPorts tickets about software trying to force unaligned pointers and Xcode clang / linker rejecting that.
The unaligned pointer issue seemed to be in one assembly file, bdiv_q_1.asm. I noticed that file was forcing a low alignment of 8: https://github.com/gmp-mirror/gmp/blob/14fe69d7f56e00917e9fd9ab616afc798a1af6c1/mpn/x86_64/bdiv_q_1.asm#L137
I wondered if that might be the problem. I haven't as yet tried to fix it though.
So it might be premature to say clang is broken here.