Opened 3 years ago

Last modified 2 years ago

#64368 closed defect

cmake @3.22.1_0: "Error running link command: Invalid argument" when run under trace mode — at Initial Version

Reported by: chrstphrchvz (Christopher Chavez) Owned by: michaelld (Michael Dickens)
Priority: Normal Milestone:
Component: ports Version: 2.7.1
Keywords: tracemode Cc:
Port: cmake

Description

I observe that when building ports under trace mode, the file descriptors given to processes can reach/exceed the default value of FD_SETSIZE, which is 1024 when not manually defined before including <sys/select.h>. CMake—and possibly other programs—are compiled without manually defining FD_SETSIZE on macOS, and without defining _DARWIN_UNLIMITED_SELECT or _DARWIN_C_SOURCE. This leads to at least two problems using CMake under trace mode: Error running link command: Invalid argument being output; and out-of-bounds memory accesses, one effect being corruption of the Invalid error message.

More detailed description:

If a program is compiled without defining _DARWIN_UNLIMITED_SELECT (on macOS 10.5 Leopard and later, and before including <sys/time.h> for more recent macOS versions) or _DARWIN_C_SOURCE (unsure which macOS versions), then calling select() for more than 1024 file descriptors fails, and errno is set to EINVAL.

The select() call in kwsysProcessWaitForPipe() in ProcessUNIX.c (inherited from KWSys—Kitware System Library) is affected by this limitation: select() will fail when max+1 exceeds 1024. One use of the affected call is when running cmake -E cmake_link_script. If a trace mode build uses enough file descriptors to cause the select() call to fail, then strerror(errno) is used to get the error message (for EINVAL, it is "Invalid argument"), which CMake eventually prints out as:

Error running link command: Invalid argument

Using the FD_CLR()/FD_SET()/FD_ISSET() macros with too high of a file descriptor is known to cause out-of-bounds accesses (on various platforms—not just macOS), as sizeof(fd_set) is directly dependent on FD_SETSIZE. On macOS, these macros implement bounds checking, but only for an fd_set residing on the stack, and only on very recent macOS versions. The relevant fd_set in CMake is a member of a structure allocated on the heap, and so corruption/bogus reads do occur for file descriptors which are too high: one example is corruption to the error message buffer located only a few slots after the fd_set in the same structure, causing CMake to instead output e.g.

Error running link command: Invalid `rgument

or

Error running link command: In^Valid argument

where ^V is the ASCII SYN character.

Some programs (e.g. Tcl, in tclUnixPort.h) manually define FD_SETSIZE to OPEN_MAX (from <sys/syslimits.h>). Currently, CMake only defines FD_SETSIZE manually (in ProcessUNIX.c) for Cygwin.

I observe that defining FD_SETSIZE to OPEN_MAX and defining _DARWIN_UNLIMITED_SELECT in ProcessUNIX.c prevents select() from failing with EINVAL when called for more than 1024 file descriptors.

  • Source/kwsys/ProcessUNIX.c

    old new  
    4040#if defined(__CYGWIN__)
    4141/* Increase the file descriptor limit for select() before including
    4242   related system headers. (Default: 64) */
    4343#  define FD_SETSIZE 16384
     44#elif defined(__APPLE__)
     45/* Increase the file descriptor limit for select() before including
     46   related system headers. (Default: 1024) */
     47#  define _DARWIN_UNLIMITED_SELECT
     48#  include <sys/syslimits.h> /* OPEN_MAX */
     49#  define FD_SETSIZE OPEN_MAX
    4450#endif
    4551
    4652#include <assert.h>    /* assert */

I believe that both of these definitions are necessary to avoid issues. I observe that if _DARWIN_UNLIMITED_SELECT is defined but FD_SETSIZE is left at its default value, the fd_set structure is still not large enough to avoid out-of-bounds accesses by FD_SET()/FD_CLR()/FD_ISSET(); and the select() call, which previously would fail, now erroneously returns early (because fd_set is not large enough to indicate which file to block on), causing CMake to poll/busy-wait rather than block as intended, but without causing the build to fail. (I wonder if it is possible for the opposite to happen, where select() erroneously blocks on a file descriptor that fd_set is too small to describe, and cause the build to stall indefinitely; this would resemble unresolved trace mode issues I have noticed and reported for other ports.)

The port I used to investigate this issue is mysql8, but I hesitate to claim that making these changes to CMake would fix #63455, at least for anyone other than myself, since I’ve already led others who are much more knowledgable to believe that that issue is due to outdated libtool. I also don’t know if this fixes #55086.

I don’t know whether these changes are something upstream CMake/KWSys should incorporate, because I don’t know whether this is necessarily a bug in CMake, i.e. whether CMake (particularly when running cmake_link_script or other command needing select()) is normally able to get a file descriptor 1024 or higher, and encounter this issue outside of trace mode. But I do find that it is possible, at least on certain recent macOS versions, to naïvely compile a simple program which successfully opens more than 1024 files without defining FD_SETSIZE or _DARWIN_UNLIMITED_SELECT.

I don’t know if file descriptors growing to unusually large values under trace mode is expected behavior. If it isn’t—e.g. due to file descriptor leak in trace mode itself, or not cleaning up after programs with file descriptor leaks—then I don’t know whether it is causing “Too many open files” errors I sometimes observe under trace mode, but have neglected to report. I also don’t know if this issue should instead indicate something missing from the design of trace mode, like abstraction of file descriptor values. And I don’t know if this issue should be considered grave enough to condemn previous trace mode builds because FD_SET()/FD_CLR()/FD_ISSET() may have been silently causing programs to misbehave due to invalid memory accesses.

One idea for making this issue easier to reproduce and find affected programs would be to reserve file descriptors under 1024 at the beginning of trace mode, forcing programs to use file descriptors 1024 and higher. Also note that parallel building—and the number of processors if enabled—are factors in triggering this issue, as these affects the order in which steps in a build are done, and in turn how large the file descriptor values are at a certain point in the build.

Change History (0)

Note: See TracTickets for help on using tickets.