#60098 closed defect (fixed)
10.7 buildbot worker is down
Reported by: | ryandesign (Ryan Carsten Schmidt) | Owned by: | admin@… |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | server/hosting | Version: | |
Keywords: | Cc: | ||
Port: |
Description
The 10.7 buildbot worker experienced a kernel panic, which is unusual for that worker. Since then it will not boot into safe mode; it gets stuck before the progress bar fills in. I have not tried booting into regular mode because I want to clean up whatever it was working on before letting it start a new build.
Change History (11)
comment:1 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
comment:2 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
When booted to an OS X installer ISO, Disk Utility's First Aid function reported no errors on the startup disk, but it still would not boot to Safe Mode. I disabled the network interface and booted to regular mode, which worked, though there was a long delay after the gray screen with the Apple logo disappeared, during which the screen was just white with the spinning beach ball of death cursor. After a while, this was replaced with the usual desktop, though the menu bar was missing all menus on the left except the Apple menu. The Spotlight menu on the right indicated it was indexing the drive. Eventually the normal menus appeared. Console said the system had detected an inconsistency in the /.fseventsd database so it had deleted it. I rebooted into normal mode, which booted up with normal speed. However, rebooting to Safe Mode still does not work, so there is still something not quite right.
comment:3 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Resolution: | → fixed |
---|---|
Status: | new → closed |
I enabled safe verbose mode with
sudo nvram boot-args="-v -x"
and now I can see where it's getting stuck:
Running safe fsck on the boot volume...
** /dev/rdisk0s2
** Root file system
Executing fsck_hfs (version diskdev_cmds-540.1~25).
** Checking Journaled HFS Plus volume.
** Detected a case-sensitive volume.
The volume name is SSD88
** Checking extents overflow file.
** Checking catalog file.
I used DiskWarrior to rebuild the disk directory, and now safe and normal booting both work fine.
This worker is back online, but it'll take days to work through the backlog of builds.
comment:4 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Another kernel panic today (in bsdtar) but booting into safe mode worked fine and Disk Utility shows no errors. I manually cleaned up the debris and rebooted and the worker is back online again.
comment:5 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Kernel panic again, in bsdtar again. Safe boot works fine and Disk Utility shows no problem. The problem seems to happen when extracting the python38 archive; I see three failed mpextract* directories in /opt/local/var/macports/software/python38 (from Feb 16, Mar 20, and Mar 21). Extracting the tbz2 on the command line works fine. Nevertheless I've uninstalled python38 on that worker; we'll see if the problem recurs when it reinstalls it later from the archive on the server. If that fails too, I can also delete the archive from the server so that the builder rebuilds it.
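For anyone wanting to check for the same kind of debris, here is a minimal Python sketch that lists leftover mpextract* directories with their modification times. The function name `find_stale_extract_dirs` is hypothetical and not part of MacPorts; the path is the one mentioned above. It only reports what it finds and deletes nothing.

```python
from datetime import datetime
from pathlib import Path


def find_stale_extract_dirs(software_dir):
    """Return (path, modification time) pairs for leftover mpextract* directories.

    Hypothetical helper for illustration; it only reads the directory listing.
    """
    root = Path(software_dir)
    stale = []
    # Only direct children are checked, and plain files are skipped.
    for entry in sorted(root.glob("mpextract*")):
        if entry.is_dir():
            mtime = datetime.fromtimestamp(entry.stat().st_mtime)
            stale.append((entry, mtime))
    return stale


if __name__ == "__main__":
    # Path taken from the comment above; adjust it for other ports.
    port_dir = "/opt/local/var/macports/software/python38"
    for path, mtime in find_stale_extract_dirs(port_dir):
        print(f"{mtime:%Y-%m-%d %H:%M}  {path}")
```

The dates it prints can then be compared against the dates of the panics.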
comment:6 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Another kernel panic in bsdtar, this time while extracting llvm-9.0, so I uninstalled llvm-9.0.
comment:7 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Another kernel panic in bsdtar, this time while extracting openssl, so I uninstalled openssl.
comment:8 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Another kernel panic but this time in mtree. Weird...
comment:9 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Another kernel panic, this time in tclsh8.5. I'm restarting the entire Xserve now.
comment:10 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Another kernel panic, this time in gsed.
comment:11 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Another kernel panic in tclsh8.5.
I should clone the disk to a new disk, in case there is some disk problem that neither Disk Utility nor DiskWarrior could see.
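Since the panicking process differed from comment to comment (bsdtar, mtree, tclsh8.5, gsed), a tally can make the pattern easier to see. The following is a rough Python sketch that counts process names across the text of saved panic reports. It assumes each report contains the usual "BSD process name corresponding to current thread:" line; the function name `tally_panic_processes` is hypothetical, and reading the report files from disk is left to the caller.

```python
import re
from collections import Counter

# Line in a kernel panic report that names the process on the panicking thread.
# Assumption: the reports on this worker use this wording.
PROCESS_LINE = re.compile(
    r"^BSD process name corresponding to current thread:\s*(\S+)",
    re.MULTILINE,
)


def tally_panic_processes(report_texts):
    """Count how often each process name appears across panic report texts.

    Hypothetical helper for illustration; reports without the line are ignored.
    """
    counts = Counter()
    for text in report_texts:
        match = PROCESS_LINE.search(text)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `tally_panic_processes(texts).most_common()` on the collected reports gives the process names ordered by how often they were involved.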
What happened after the kernel panic looks strange at first: the dozens of queued builds failed with:
This happened because of an error I made while restoring the High Sierra buildbot worker. I had installed a fresh copy of High Sierra and then intended to restore the High Sierra worker's Time Machine backup to it, but somehow I selected the Lion worker's backup instead. I didn't realize this until after the restoration completed and I booted up the new worker, which started the buildbot launchd plists, which tried to compile MacPorts with clang from Xcode 4.6.3, which is not compatible with High Sierra. Once I realized the error I shut down that VM, reinstalled High Sierra fresh again and restored the correct backup, but the pending Lion builds had already been consumed. I'll reschedule them.