Opened 2 years ago
Closed 14 months ago
#65569 closed defect (fixed)
emacs and legacy-support: `openat` broken?
Reported by: | lemzwerg (Werner Lemberg) | Owned by: | drkp (Dan Ports) |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | ports | Version: | |
Keywords: | lion mountainlion snowleopard mavericks | Cc: | mascguy (Christopher Nielsen), cooljeanius (Eric Gallager) |
Port: | emacs legacy-support |
Description
In https://lists.macports.org/pipermail/macports-users/2022-July/051283.html I wrote:
The 'emacs-devel' package of MacPorts is currently based on commit 4650ea9c25514831d925e5006ea0c3679344333b (from 2022-Jul-12).
On Mac OS X 10.7.5 (Lion), emacs 27.2 builds fine. However, the above commit fails with
./temacs --batch -l loadup --temacs=pbootstrap \ --bin-dest /opt/local/bin/ \ --eln-dest /opt/local/lib/emacs/29.0.50/ Warning: arch-independent data dir '/opt/local/share/emacs/29.0.50/etc/': Invalid argument Warning: Lisp directory '/opt/local/var/macports/build/_opt_bblocal_var_buildworker_ports_build_ports_editors_emacs/emacs-devel/work/emacs-20220712/lisp': Invalid argument Error: /opt/local/share/emacs/29.0.50/etc/charsets: Invalid argument Emacs will not function correctly without the character map files.
I think I found the problem: I took the Portfile for emacs 27.2 (which compiles successfully in MacPorts) and applied emacs commit
From b3ad638a60845f17938ff812efcf2b2edfbd8c57 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Mon, 20 Jan 2020 01:08:42 -0800 Subject: [PATCH] Work better if stat etc. are interrupted
to it (see attached Portfile
together with the patch file patch-fstat.diff
) – et voilà, port destroot emacs
fails in the same way as described above.
My guess is that there is a problem with the implementation of openat
in the 'macports-legacy-support' library from Snow Leopard to Mavericks, since building emacs 28.2 fails exactly the same way on those platforms.
Attachments (2)
Change History (14)
Changed 2 years ago by lemzwerg (Werner Lemberg)
Changed 2 years ago by lemzwerg (Werner Lemberg)
Attachment: | patch-fstat.diff added |
---|
comment:1 Changed 2 years ago by lemzwerg (Werner Lemberg)
Keywords: | lion mountainlion snowleopard mavericks added |
---|---|
Port: | emacs macports-legacy-support added |
comment:2 Changed 2 years ago by mascguy (Christopher Nielsen)
Cc: | mascguy added |
---|
comment:3 Changed 2 years ago by mascguy (Christopher Nielsen)
Owner: | set to drkp |
---|---|
Status: | new → assigned |
comment:4 Changed 2 years ago by cooljeanius (Eric Gallager)
Cc: | cooljeanius added |
---|
comment:5 Changed 2 years ago by lemzwerg (Werner Lemberg)
comment:7 Changed 19 months ago by lemzwerg (Werner Lemberg)
Port: | legacy-support added; macports-legacy-support removed |
---|---|
Summary: | emacs and macports-legacy-support → emacs and legacy-support: `openat` broken? |
comment:8 Changed 15 months ago by acjones8 (Alex Jones)
Out of curiosity, I've tried compiling Emacs 29.1 for 10.5 PPC, and I ran into this bug as well. Truth be told, I don't think I have enough experience to solve this on my own, but I tried debugging temacs
and legacy-support
with GDB, and I may have found a few clues.
The issue starts in src/callproc.c
, on line 1987. file_accessible_directory_p()
gets called, seemingly to check if the directory is valid. We step through some lines of code that initialize a few variables, and we end up at file_access_p()
. We then get to an if statement that checks whether faccessat (AT_FDCWD, file, amode, AT_EACCESS) == 0
. Stepping into faccessat()
brings us to rpl_faccessat()
, which calls orig_faccessat(fd, file, mode, flag)
, when then calls faccessat()
again - but this time, it seems to resolve to legacy-support's implementation.
The thing here is that the fd argument for faccessat()
gets set to -2, when AT_FDCWD is supposed to be the value of a file descriptor for the current location. Ordinarily, file descriptors are always positive, and negative values represent an error... strangely enough, according to the Linux man page, faccessat()
is apparently supposed to ignore the file descriptor if the path is absolute (which it is in this case), and should attempt to access the file regardless of where the fd points.
Confirming this, if I dig around in legacy-support's source code, I find that the implementation of faccessat()
calls getattrlistat()
, which calls ATCALL()
, which calls _ATCALL()
, which explicitly checks if the first character in the path is '/'.
I don't know enough about how all of this works under the hood to make full sense of it, but AT_FDCWD returning -2 seems very suspicious, as does the behavior of faccessat()
not changing whether or not the path is absolute. I'll continue digging and see if I can make any further progress, but hopefully this at least gives someone more knowledgeable than myself a starting point.
comment:9 Changed 14 months ago by acjones8 (Alex Jones)
I've made some further progress. So - it turns out, the -2 for the file descriptor is actually correct, it's defined in fcntl.h
and shows up when I tested it with a very simple C program that just calls it on its own source file. No issues here.
By rebuilding legacy-support with -g and -O0, I was able to get GDB to step into legacy-support's implementation of faccessat()
. The problem is on line 107:
ERR_ON(EINVAL, flags & ~AT_SYMLINK_NOFOLLOW);
ERR_ON()
expands into:
#define ERR_ON(code, what) if (what) { errno = (code); return -1; }
My understanding of what's going on here is that ERR_ON
checks if the code in the 2nd argument evaluates to anything other than 0. If so, is sets errno to EINVAL
and then returns -1. I'm able to confirm in orig_faccessat()
that the result is indeed -1, and the program crashes in GDB as soon as I try to step through this line, so I'm pretty confident this is the source of the issue.
The strange thing is, if I manually evaluate the code on the right (negating AT_SYMLINK_NOFOLLOW, which is defined as 32, and then anding it with the value of the flags argument, which is 16), I get 0, so this doesn't seem like it should be triggering.
comment:10 Changed 14 months ago by acjones8 (Alex Jones)
Alright! I've found the source of the bug, and I wrote a patch that fixes it. Emacs compiles and functions fine even on 10.5 PPC now, so this should restore compatibility with all newer versions of OS X as well.
The problem is that Emacs passes in AT_EACCESS
when it checks for the directories. That line above only leaves 0 if the flag is either set to 0 itself, or if it equals AT_SYMLINK_NOFOLLOW
. Because the flag isn't accepted, it errors out, and that makes Emacs think the argument is invalid, even though the path is perfectly fine and Emacs has permission to access it. If you insert this piece of code into src/atcalls.c
, inside faccessat()
on line 106:
// Ignore AT_EACCESS flag if (flags == AT_EACCESS) { flags = 0; // Zero out AT_EACCESS, use real user instead }
then it doesn't error out and Emacs compiles without any errors. In testing, it seems so far to function fine - I'm able to load up a C file, text syntax highlighting looks correct, and I can execute elisp code with no problems.
comment:11 follow-up: 12 Changed 14 months ago by acjones8 (Alex Jones)
With version 1.1.1, Legacy-Support implements the missing functionality required, so this ticket can probably be closed now. (#67406 is the sister ticket on Emacs itself)
comment:12 Changed 14 months ago by mascguy (Christopher Nielsen)
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I'm curious whether there is any progress or a new insight into this issue...