Opened 3 years ago

Closed 3 years ago

Last modified 2 years ago

#62977 closed enhancement (fixed)

Build noarch ports on 11_arm64 builder

Reported by: jmroot (Joshua Root)
Owned by: admin@…
Priority: Normal
Milestone:
Component: buildbot/mpbb
Version:
Keywords:
Cc: mascguy (Christopher Nielsen)
Port:

Description

Currently noarch ports are built on 11_x86_64 and skipped on 11_arm64. The former is a VM while the latter is a dedicated system, so building on the arm64 machine instead would likely result in more balanced load and better overall performance.

Change History (17)

comment:1 Changed 3 years ago by ryandesign (Ryan Carsten Schmidt)

My assumption is that at this time it is still more likely for a port to succeed on x86_64 than on arm64. Therefore, I intended to keep the noarch ports on the x86_64 VM.

comment:2 Changed 3 years ago by ryandesign (Ryan Carsten Schmidt)

Resolution: fixed
Status: new → closed

In 66c4a14947be745c55bdd7c550686fdcbc3b6891/mpbb (master):

Build noarch ports on arm64 not x86_64

Closes: #62977

comment:3 Changed 3 years ago by ryandesign (Ryan Carsten Schmidt)

I've made this change now despite my earlier objection, for three reasons. The performance observation is certainly valid. I'm setting up the macOS 12 arm64 buildbot worker before the x86_64 one. And nearly a year in, most of the arm64 issues with noarch ports and their dependencies have hopefully been resolved; if not, this is an added incentive to fix them.

After committing this it occurred to me that this might adversely affect the macOS 11 and later CI workers: they might skip noarch builds even though there is no arm64 CI machine that will do those builds. If so, we'll need a way in this Tcl script to differentiate the CI machines from the Buildbot machines. Not sure if we already have such a method.

comment:4 Changed 3 years ago by mascguy (Christopher Nielsen)

Cc: mascguy added

comment:5 Changed 3 years ago by jmroot (Joshua Root)

There is another somewhat related issue with this. Ports with dependencies that set known_fail are skipped. But some of those dependencies may only be known_fail on one arch. So some ports that could be built are skipped on both builders. Example: all noarch dependents of python36.
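To make the gap concrete, here is a minimal shell sketch of the two skip rules interacting. It is an illustration only: the real decision is made in mpbb's Tcl code, which is not reproduced here, and the function names `known_fail_on` and `builder_decision` as well as the hard-coded assumption that python36 is known_fail only on arm64 are hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration only; not mpbb's actual logic.

# known_fail_on PORT ARCH: succeed if PORT is marked known_fail on ARCH.
# Assumption for this example: python36 is known_fail only on arm64.
known_fail_on() {
    [ "$1" = "python36" ] && [ "$2" = "arm64" ]
}

# builder_decision ARCH NOARCH_BUILDER DEP...
# Prints "build" or "skip" for a noarch port with the given dependencies.
# NOARCH_BUILDER is the one arch allowed to build noarch ports,
# or "any" when there is no restriction.
builder_decision() {
    arch=$1; noarch_builder=$2; shift 2
    # Rule 1: noarch ports are built only on the designated builder.
    if [ "$noarch_builder" != "any" ] && [ "$arch" != "$noarch_builder" ]; then
        echo skip; return
    fi
    # Rule 2: skip when any dependency is known_fail on this builder's arch.
    for dep in "$@"; do
        if known_fail_on "$dep" "$arch"; then
            echo skip; return
        fi
    done
    echo build
}

# With noarch restricted to arm64, a noarch dependent of python36
# is skipped by both builders.
echo "arm64:  $(builder_decision arm64 arm64 python36)"
echo "x86_64: $(builder_decision x86_64 arm64 python36)"
```

Under these assumptions both calls print `skip`. Passing `any` as the second argument models a setup with no noarch restriction, in which case the x86_64 builder would report `build` for the same port.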

comment:6 Changed 3 years ago by ryandesign (Ryan Carsten Schmidt)

Aside from fixing python36 so that it works on arm64, what shall we do about it? We could remove the noarch check and allow both the arm64 and x86_64 builders to build the port.

comment:7 Changed 3 years ago by jmroot (Joshua Root)

The excluded ports are built anyway if needed as a dependency of something else, so that might be fine. The only tricky part would be avoiding a race condition if two builders are deploying the same-named files (which are possibly not quite identical due to things like timestamps).

comment:8 Changed 3 years ago by ryandesign (Ryan Carsten Schmidt)

Has duplicate #63815.

> The only tricky part would be avoiding a race condition if two builders are deploying the same-named files (which are possibly not quite identical due to things like timestamps).

Right. But I think the worst that could happen is that one worker uploads its archive to master, which generates a signature for it and deploys both, and a short time later the second worker uploads its archive to master which generates a new signature and deploys both of those, wiping out the first one. That's probably fine.

comment:9 Changed 3 years ago by mascguy (Christopher Nielsen)

Ah, missed the part about the CI jobs in comment:3. Sorry for the duplicate ticket, folks!

comment:10 Changed 3 years ago by ryandesign (Ryan Carsten Schmidt)

In f97db48d2b71523aeb70f63ecbbd697cc3ee1aba/mpbb (master):

Do not restrict which builders build noarch ports

The arm64 Buildbot builder is faster so we want to build most noarch
ports there. Similarly, the 10.6 i386 builder is idle more often so we
want to build ports there when we can. But some noarch ports or their
dependencies might only be buildable on the x86_64 builders. We also
want ports to be built on the GitHub Actions CI infrastructure and right
now those are x86_64 only.

In the situation where both Buildbot builders are available at the same
time and both try to build it, I assume that buildbot master is
single-threaded and will only process one builder's upload at a time so
that only one builder's archive and its signature will end up on the
package server. Which one is undefined and I assume that is not
important.

Closes: #62977

comment:11 in reply to: 8 Changed 3 years ago by jmroot (Joshua Root)

Replying to ryandesign:

> Right. But I think the worst that could happen is that one worker uploads its archive to master, which generates a signature for it and deploys both, and a short time later the second worker uploads its archive to master which generates a new signature and deploys both of those, wiping out the first one. That's probably fine.

There are some comments in deploy_archives.sh that indicate that multiple build steps running on the buildmaster can run simultaneously, which is why a different upload directory name per worker is used. The worst that could happen is that worker 1 uploads an archive, the master signs it and starts deploying the two files, one of them is deployed, execution switches to worker 2's job which signs the archive and deploys both files, and then worker 1's job resumes and deploys its other file. There's then an archive from one worker and a signature from the other on the packages server. Unlikely perhaps, but I find that unlikely race conditions have a bad habit of happening in practice.

There is some locking in deploy_archives.sh that is used in the case of a single shared upload directory. I think it would be safest to extend that to the multi-directory case as well.
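The interleaving described above can be replayed deterministically with a small shell simulation. This is a sketch under stated assumptions, not the behaviour of deploy_archives.sh itself: the `deploy_file` helper, the file names, and the choice that worker 1's archive (rather than its signature) is the file deployed first are all hypothetical.

```shell
#!/bin/sh
# Hypothetical simulation; file names and layout are invented for illustration.
pkgdir=$(mktemp -d)   # stands in for the packages server directory

# deploy_file WORKER KIND: stand-in for copying one file to the packages server.
deploy_file() {
    echo "from-$1" > "$pkgdir/port.$2"
}

# Replay the unlucky schedule step by step:
deploy_file worker1 archive      # worker 1's job deploys its archive
deploy_file worker2 archive      # execution switches: worker 2's job
deploy_file worker2 signature    #   deploys both of its files
deploy_file worker1 signature    # worker 1's job resumes with its signature

archive_owner=$(cat "$pkgdir/port.archive")
signature_owner=$(cat "$pkgdir/port.signature")
echo "archive:   $archive_owner"
echo "signature: $signature_owner"
rm -rf "$pkgdir"
```

The two files end up coming from different workers, which is the mismatch a lock around the sign-and-deploy step is meant to rule out.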

comment:12 Changed 3 years ago by ryandesign (Ryan Carsten Schmidt)

So you're suggesting:

  • buildbot/deploy_archives.sh

    diff --git a/buildbot/deploy_archives.sh b/buildbot/deploy_archives.sh
    index e406744..19b3a51 100755
    --- a/buildbot/deploy_archives.sh
    +++ b/buildbot/deploy_archives.sh
    @@ -17,7 +17,7 @@ if [[ -z "$ULPATH" ]]; then
         # workaround for buildbot not accepting WithProperties in env
         if [[ -n "$1" ]]; then
             ULPATH="$1"
    -        # assume a unique path is used per builder so no locking is needed
    +        NEED_LOCK=1
         else
             ULPATH="./archive_staging"
             NEED_LOCK=1

But, uh:

    echo Acquiring lock...
    lockfile $LOCKFILE -r -1

lockfile is a Linux command that is not present on macOS.

comment:13 Changed 3 years ago by jmroot (Joshua Root)

Yes, that dates from the old Apple-hosted buildbot. It would need to use shlock instead on Mac.
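For reference, a retry-forever lock built on shlock might look roughly like the sketch below. It is not the code that was later committed: the function names, the lock path, and the one-second retry interval are assumptions, and the noclobber fallback exists only so the example also runs on systems without shlock.

```shell
#!/bin/sh
# Sketch only; names, lock path, and retry interval are assumptions.
# A real script would use a fixed path shared by all jobs; a temporary
# directory keeps this demo self-contained.
LOCKFILE="${LOCKFILE:-$(mktemp -d)/deploy_archives.lock}"

# try_lock: attempt once to take the lock; return 0 on success.
try_lock() {
    if command -v shlock >/dev/null 2>&1; then
        # shlock records the given PID in the lock file and fails if the
        # file already names a process that is still running.
        shlock -f "$LOCKFILE" -p $$
    else
        # Fallback without shlock: with noclobber set, the redirection
        # fails if the lock file already exists.
        ( set -C; echo $$ > "$LOCKFILE" ) 2>/dev/null
    fi
}

# acquire_lock: retry until the lock is obtained.
acquire_lock() {
    echo "Acquiring lock..."
    until try_lock; do
        sleep 1
    done
}

release_lock() {
    rm -f "$LOCKFILE"
}

acquire_lock
# ... sign the archive, then deploy the archive and its signature ...
release_lock
```

Holding the lock across both the signing and the deployment of the two files is what keeps one worker's archive from being paired with another worker's signature.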

comment:14 Changed 3 years ago by jmroot (Joshua Root)

In 626ccf184aae0f39f290b7f41a453ee744b7abda/macports-infrastructure (master):

re-enable shlock locking in deploy_archives.sh

See: #62977

comment:15 Changed 3 years ago by jmroot (Joshua Root)

Note that I was not able to test the above, so proceed with caution.

comment:16 Changed 3 years ago by ryandesign (Ryan Carsten Schmidt)

Thanks. Deploying that to buildmaster will have to wait until it's idle.

comment:17 Changed 2 years ago by ryandesign (Ryan Carsten Schmidt)

This change is finally deployed to the buildmaster.
