Opened 12 years ago
Closed 12 years ago
#35508 closed defect (fixed)
arpack port does not work on Lion with GFortran 4.6.2 due to Accelerate problem
Reported by: | gcrosswhite@… | Owned by: | mamoll (Mark Moll) |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | ports | Version: | 2.1.2 |
Keywords: | | Cc: | |
Port: | arpack | | |
Description
I have seen this problem before and thought it had been squashed in this port, but it has appeared again.
ARPACK has a problem in that it uses the BLAS routine ZDOTC, which has a different calling convention in Accelerate.framework than the one used by GFortran. This causes crashes that I have encountered in my code. I know that this was the source of the problem because when I downloaded arpack-ng and patched it manually, replacing
X = ZDOTC(....)
with
call ZDOTC(X,...)
then the problems went away.
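The manual replacement described above can be sketched as a small rewrite helper. This is only an illustration, not the patch attached to this ticket: the names `rewrite_dotc_call` and `_balanced` are hypothetical, the sketch assumes fixed-form Fortran where the whole call sits on one line, and it does not touch declarations of CDOTC/ZDOTC or calls split across continuation lines.

```python
import re

# Single-line assignment whose whole right-hand side is one CDOTC/ZDOTC call,
# e.g. "      rtemp = zdotc(n, x, 1, y, 1)". Continuation lines are not handled.
_DOTC_ASSIGN = re.compile(
    r"^(?P<indent>\s*)(?P<lhs>[A-Za-z]\w*(?:\s*\([^=]*?\))?)\s*=\s*"
    r"(?P<fn>[cz]dotc)\s*\((?P<args>.*)\)\s*$",
    re.IGNORECASE,
)


def _balanced(text):
    """Return True if parentheses in text are balanced and never close early."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def rewrite_dotc_call(line):
    """Rewrite 'X = ZDOTC(args)' as 'call ZDOTC(X, args)'; return other lines unchanged."""
    # Assumption: fixed-form source, where c, C, * or ! in column 1 marks a comment.
    if line.startswith(("c", "C", "*", "!")):
        return line
    m = _DOTC_ASSIGN.match(line)
    # Skip anything that is not exactly one call, e.g. "zdotc(a) + zdotc(b)".
    if m is None or not _balanced(m["args"]):
        return line
    return f"{m['indent']}call {m['fn']}({m['lhs']}, {m['args']})"


print(rewrite_dotc_call("      rtemp = zdotc(n, x, 1, y, 1)"))
```

Lines that the helper cannot confidently classify are returned unchanged, so anything it skips would still need to be edited by hand.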
I am not sure how people would prefer to see this problem solved, but I could submit a patch making the changes above if you all would like.
Attachments (2)
Change History (12)
comment:1 Changed 12 years ago by mf2k (Frank Schima)
Port: | arpack added |
---|
comment:2 Changed 12 years ago by mf2k (Frank Schima)
Owner: | changed from macports-tickets@… to mmoll@… |
---|
comment:3 Changed 12 years ago by mamoll (Mark Moll)
I have Mountain Lion installed and can't reproduce this. I just reinstalled arpack @3.1.1_2+accelerate+gcc46+openmpi. Can you attach your main.log file?
comment:4 Changed 12 years ago by gcrosswhite@…
I didn't see anything in /opt/local/var/macports/logs, but I wasn't expecting to, as the port builds just fine. The problem is that the resulting library segfaults at runtime because it uses the wrong calling convention for some BLAS routines such as zdotc.
To create a simple test case that illustrates the problem, I compiled the test program zndrv1.f in EXAMPLES/COMPLEX of the main ARPACK distribution and linked it against the MacPorts build of libarpack.a. The result was:
$ gfortran zndrv1.f /opt/local/lib/libarpack.a -framework Accelerate
$ ./a.out
zsh: segmentation fault ./a.out
We can see where the segmentation fault is coming from by using gdb:
$ gdb ./a.out
GNU gdb 6.3.50-20050815 (Apple version gdb-1752) (Sat Jan 28 03:02:46 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ..
(gdb) run
Starting program: /Users/gcross/Downloads/ARPACK/EXAMPLES/COMPLEX/a.out
Reading symbols for shared libraries +++++................................ done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000015
0x00007fff87f02d9a in zdotc_ ()
(gdb) backtrace
#0 0x00007fff87f02d9a in zdotc_ ()
#1 0x0000000100009f8f in zneupd_ ()
#2 0x0000000100002ac1 in MAIN__ ()
#3 0x0000000100003833 in main ()
So the crash occurs in zdotc, and when I linked the test program against my own build of libarpack.a, which includes the replacement I discussed earlier, the program ran just fine:
$ gfortran zndrv1.f /usr/local/lib/libarpack.a -framework Accelerate
$ ./a.out
Ritz values (Real, Imag) and relative residuals
-----------------------------------------------
Col 1 Col 2 Col 3
Row 1: 7.16197D+02 1.02958D+03 6.80426D-15
Row 2: 7.16197D+02 -1.02958D+03 9.03466D-15
Row 3: 6.87583D+02 1.02958D+03 1.11184D-14
Row 4: 6.87583D+02 -1.02958D+03 1.58575D-14
_NDRV1
======
Size of the matrix is 100
The number of Ritz values requested is 4
The number of Arnoldi vectors generated (NCV) is 20
What portion of the spectrum: LM
The number of converged Ritz values is 4
The number of Implicit Arnoldi update iterations taken is 25
The number of OP*x is 392
The convergence criterion is 1.11022302462515654E-016
So, this doesn't quite answer your question, but it is the closest answer I can think of at the moment that provides you with a log recording the problem, as well as an easily available test case that triggers it.
comment:5 Changed 12 years ago by mamoll (Mark Moll)
Ah, I get it now. If you could submit a patch, that'd be great.
Changed 12 years ago by gcrosswhite@…
Attachment: | patches.tar.gz added |
---|
Patches to change all CDOTC and ZDOTC calls to work with Accelerate.
comment:6 Changed 12 years ago by gcrosswhite@…
I did a grep through the sources and changed every call to either CDOTC or ZDOTC so that it is treated like a subroutine with the return value stored in the first argument, rather than like a function. I did some spot checks to make sure that the resulting library is good. The changes made the double-precision complex-valued tests work (e.g., zndrv* in EXAMPLES/COMPLEX), but for some reason lots of other tests, including the single-precision complex tests in COMPLEX/, fail both before and after making the changes. However, they fail with an error message rather than a segfault, so I don't think that their problem is related to this one, and in particular these changes don't seem to be making anything worse.
I have attached the patches for all of the files that I changed; there are 24 in total: 4 base files * 2 precisions * 3 modes (sequential, parallel MPI, parallel BLACS).
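For anyone who wants to repeat the search step on another ARPACK tree, a rough equivalent of the grep can be sketched as follows. The function name `find_dotc_calls` is hypothetical, and the sketch assumes fixed-form `.f` sources; it only reports lines where `=` is directly followed by CDOTC( or ZDOTC(, so calls split across continuation lines would be missed.

```python
import re
from pathlib import Path

# Function-style use: an "=" directly followed by CDOTC( or ZDOTC(.
_DOTC_CALL = re.compile(r"=\s*[cz]dotc\s*\(", re.IGNORECASE)


def find_dotc_calls(root):
    """Yield (path, line_number, text) for function-style CDOTC/ZDOTC calls under root."""
    for path in sorted(Path(root).rglob("*.f")):
        lines = path.read_text(errors="replace").splitlines()
        for number, text in enumerate(lines, start=1):
            # Assumption: c, C, * or ! in column 1 marks a fixed-form comment line.
            if text.startswith(("c", "C", "*", "!")):
                continue
            if _DOTC_CALL.search(text):
                yield path, number, text
```

Calling `list(find_dotc_calls("."))` from the top of an unpacked source tree gives the candidate lines to review; each hit still needs a manual look before it is changed.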
VERY IMPORTANT: You most likely were already going to do this, but just to be sure: make sure that this patch is only applied when using Accelerate! Only Accelerate has the weird ABI issue that requires this rather strange form of patch to work around the quirk, so if the patch is applied when using, say, atlas, then it will actually break things rather than fix them.
comment:7 Changed 12 years ago by mamoll (Mark Moll)
I committed a change in the Portfile that applies your patches in r96280. Please give it a try. One of the patches, patch-SRC-cneupd.f.diff, was 0 bytes. Is that correct?
Changed 12 years ago by gcrosswhite@…
Attachment: | patch-SRC-cneupd.f.diff added |
---|
Corrected patch for the file SRC/cneupd.f
comment:8 Changed 12 years ago by gcrosswhite@…
Ugh, indeed you caught that one of my patches got screwed up somehow; the corrected version has been attached above. As cneupd.f is not used by my own program, I will try out the new port now.
comment:10 Changed 12 years ago by mamoll (Mark Moll)
Resolution: | → fixed |
---|---|
Status: | new → closed |
Thanks for your patches. The last patch was added in r96338. Closing this issue.
In the future, please fill in the Port field and Cc the port maintainer(s).