Opened 2 years ago

Closed 14 months ago

#65569 closed defect (fixed)

emacs and legacy-support: `openat` broken?

Reported by: lemzwerg (Werner Lemberg) Owned by: drkp (Dan Ports)
Priority: Normal Milestone:
Component: ports Version:
Keywords: lion mountainlion snowleopard mavericks Cc: mascguy (Christopher Nielsen), cooljeanius (Eric Gallager)
Port: emacs legacy-support

Description

In https://lists.macports.org/pipermail/macports-users/2022-July/051283.html I wrote:

The 'emacs-devel' package of MacPorts is currently based on commit 4650ea9c25514831d925e5006ea0c3679344333b (from 2022-Jul-12).

On Mac OS X 10.7.5 (Lion), emacs 27.2 builds fine. However, the above commit fails with

./temacs --batch  -l loadup --temacs=pbootstrap \
         --bin-dest /opt/local/bin/ \
         --eln-dest /opt/local/lib/emacs/29.0.50/
Warning: arch-independent data dir
  '/opt/local/share/emacs/29.0.50/etc/': Invalid argument
Warning: Lisp directory
  '/opt/local/var/macports/build/_opt_bblocal_var_buildworker_ports_build_ports_editors_emacs/emacs-devel/work/emacs-20220712/lisp':
  Invalid argument
Error: /opt/local/share/emacs/29.0.50/etc/charsets: Invalid argument
Emacs will not function correctly without the character map files.

I think I found the problem: I took the Portfile for emacs 27.2 (which compiles successfully in MacPorts) and applied emacs commit

From b3ad638a60845f17938ff812efcf2b2edfbd8c57 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 20 Jan 2020 01:08:42 -0800
Subject: [PATCH] Work better if stat etc. are interrupted

to it (see attached Portfile together with the patch file patch-fstat.diff) – et voilà, port destroot emacs fails in the same way as described above.

My guess is that there is a problem with the implementation of openat in the 'macports-legacy-support' library from Snow Leopard to Mavericks, since building emacs 28.2 fails exactly the same way on those platforms.

Attachments (2)

Portfile (9.5 KB) - added by lemzwerg (Werner Lemberg) 2 years ago.
patch-fstat.diff (11.9 KB) - added by lemzwerg (Werner Lemberg) 2 years ago.

Download all attachments as: .zip

Change History (14)

Changed 2 years ago by lemzwerg (Werner Lemberg)

Attachment: Portfile added

Changed 2 years ago by lemzwerg (Werner Lemberg)

Attachment: patch-fstat.diff added

comment:1 Changed 2 years ago by lemzwerg (Werner Lemberg)

Keywords: lion mountainlion snowleopard mavericks added
Port: emacs macports-legacy-support added

comment:2 Changed 2 years ago by mascguy (Christopher Nielsen)

Cc: mascguy added

comment:3 Changed 2 years ago by mascguy (Christopher Nielsen)

Owner: set to drkp
Status: newassigned

comment:4 Changed 2 years ago by cooljeanius (Eric Gallager)

Cc: cooljeanius added

comment:5 Changed 2 years ago by lemzwerg (Werner Lemberg)

I'm curious whether there is any progress or a new insight into this issue...

comment:6 Changed 19 months ago by lemzwerg (Werner Lemberg)

Ping! :-)

comment:7 Changed 19 months ago by lemzwerg (Werner Lemberg)

Port: legacy-support added; macports-legacy-support removed
Summary: emacs and macports-legacy-supportemacs and legacy-support: `openat` broken?

comment:8 Changed 15 months ago by acjones8 (Alex Jones)

Out of curiosity, I've tried compiling Emacs 29.1 for 10.5 PPC, and I ran into this bug as well. Truth be told, I don't think I have enough experience to solve this on my own, but I tried debugging temacs and legacy-support with GDB, and I may have found a few clues.

The issue starts in src/callproc.c, on line 1987. file_accessible_directory_p() gets called, seemingly to check if the directory is valid. We step through some lines of code that initialize a few variables, and we end up at file_access_p(). We then get to an if statement that checks whether faccessat (AT_FDCWD, file, amode, AT_EACCESS) == 0. Stepping into faccessat() brings us to rpl_faccessat(), which calls orig_faccessat(fd, file, mode, flag), when then calls faccessat() again - but this time, it seems to resolve to legacy-support's implementation.

The thing here is that the fd argument for faccessat() gets set to -2, when AT_FDCWD is supposed to be the value of a file descriptor for the current location. Ordinarily, file descriptors are always positive, and negative values represent an error... strangely enough, according to the Linux man page, faccessat() is apparently supposed to ignore the file descriptor if the path is absolute (which it is in this case), and should attempt to access the file regardless of where the fd points.

Confirming this, if I dig around in legacy-support's source code, I find that the implementation of faccessat() calls getattrlistat(), which calls ATCALL(), which calls _ATCALL(), which explicitly checks if the first character in the path is '/'.

I don't know enough about how all of this works under the hood to make full sense of it, but AT_FDCWD returning -2 seems very suspicious, as does the behavior of faccessat() not changing whether or not the path is absolute. I'll continue digging and see if I can make any further progress, but hopefully this at least gives someone more knowledgeable than myself a starting point.

comment:9 Changed 14 months ago by acjones8 (Alex Jones)

I've made some further progress. So - it turns out, the -2 for the file descriptor is actually correct, it's defined in fcntl.h and shows up when I tested it with a very simple C program that just calls it on its own source file. No issues here.

By rebuilding legacy-support with -g and -O0, I was able to get GDB to step into legacy-support's implementation of faccessat(). The problem is on line 107:

ERR_ON(EINVAL, flags & ~AT_SYMLINK_NOFOLLOW);

ERR_ON() expands into:

#define ERR_ON(code, what)   if (what) { errno = (code); return -1; }

My understanding of what's going on here is that ERR_ON checks if the code in the 2nd argument evaluates to anything other than 0. If so, is sets errno to EINVAL and then returns -1. I'm able to confirm in orig_faccessat() that the result is indeed -1, and the program crashes in GDB as soon as I try to step through this line, so I'm pretty confident this is the source of the issue.

The strange thing is, if I manually evaluate the code on the right (negating AT_SYMLINK_NOFOLLOW, which is defined as 32, and then anding it with the value of the flags argument, which is 16), I get 0, so this doesn't seem like it should be triggering.

comment:10 Changed 14 months ago by acjones8 (Alex Jones)

Alright! I've found the source of the bug, and I wrote a patch that fixes it. Emacs compiles and functions fine even on 10.5 PPC now, so this should restore compatibility with all newer versions of OS X as well.

The problem is that Emacs passes in AT_EACCESS when it checks for the directories. That line above only leaves 0 if the flag is either set to 0 itself, or if it equals AT_SYMLINK_NOFOLLOW. Because the flag isn't accepted, it errors out, and that makes Emacs think the argument is invalid, even though the path is perfectly fine and Emacs has permission to access it. If you insert this piece of code into src/atcalls.c, inside faccessat() on line 106:

// Ignore AT_EACCESS flag
if (flags == AT_EACCESS)
{
    flags = 0; // Zero out AT_EACCESS, use real user instead
}

then it doesn't error out and Emacs compiles without any errors. In testing, it seems so far to function fine - I'm able to load up a C file, text syntax highlighting looks correct, and I can execute elisp code with no problems.

comment:11 Changed 14 months ago by acjones8 (Alex Jones)

With version 1.1.1, Legacy-Support implements the missing functionality required, so this ticket can probably be closed now. (#67406 is the sister ticket on Emacs itself)

comment:12 in reply to:  11 Changed 14 months ago by mascguy (Christopher Nielsen)

Resolution: fixed
Status: assignedclosed

Replying to acjones8:

With version 1.1.1, Legacy-Support implements the missing functionality required, so this ticket can probably be closed now. (#67406 is the sister ticket on Emacs itself)

Sounds good!

Note: See TracTickets for help on using tickets.