Opened 5 years ago

Last modified 2 years ago

#58800 new enhancement

Buildbot should wait until the clock catches up to the Portfiles

Reported by: ryandesign (Ryan Carsten Schmidt) Owned by: admin@…
Priority: Normal Milestone:
Component: buildbot/mpbb Version:
Keywords: Cc: chrstphrchvz (Christopher Chavez)
Port:

Description

Apparently some of the buildbot workers' clocks run a little fast. This has happened a few times: a buildbot worker goes down due to a power outage or for some other reason needs to be restarted, and after it restarts its clock re-syncs with the time server and now time has moved a few minutes backwards. If builds were queued up and start immediately, the builds fail with a message:

Error: Portfile is from the future - check date and time of your system

It might be nice if the buildbot would wait until the clock catches up with the timestamp of the Portfile before trying to build, though it needs to check the timestamps of all dependencies to be truly effective. It's probably simpler if we record the timestamp when we sync the ports tree (by touching some file, or maybe there is already some file that gets touched whose timestamp we can use), and then wait until the clock is equal to or greater than that timestamp before allowing the build to proceed.

Change History (10)

comment:1 Changed 5 years ago by mojca (Mojca Miklavec)

This problem very often appears on Travis as well.

comment:2 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)

Oh. Can you show a log example of that?

For buildbot I was thinking it would be enough to record the timestamp during one run, so that the next run (which might be after a reboot with fixed clock) could check it. But on Travis, where the environment is thrown away after every build, we might need to actually inspect files in the ports tree.

I do have a script somewhere that finds the newest modification date of any file in a directory. Maybe that's what we would need to do.

Alternately, since time on Travis is precious, maybe our strategy should be to identify Portfiles newer than now and touch them to bring their modification time back to now. Then we can proceed with a build immediately.

Or maybe we need to revisit why MacPorts checks Portfile timestamps in the first place. If this affects buildbot and Travis maybe it affects some users too.

comment:3 Changed 5 years ago by jmroot (Joshua Root)

The PortIndex's mtime will always be that of the newest Portfile. Proceeding despite evidence of an incorrect system clock is a bad idea because many build systems also rely on timestamps to work correctly.

comment:4 Changed 5 years ago by chrstphrchvz (Christopher Chavez)

Recent example log from one of my PRs: https://paste.macports.org/246941cbb5ff

Would it be safe to synchronize the clock at the beginning of a job?

comment:5 Changed 5 years ago by chrstphrchvz (Christopher Chavez)

Cc: chrstphrchvz added

comment:6 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)

By "synchronize the clock" do you mean do whatever the operating system is supposed to be doing on its own, but is apparently not doing often enough, to get the current time from ntp and set the system clock to that value, or do you mean what I proposed above? If the former, do you know how we could do that?

comment:7 Changed 5 years ago by mojca (Mojca Miklavec)

I might be wrong, but I can imagine that once you fire up a "virtual machine", the OS might not have yet successfully ran the clock syncronisation via NTP before the build starts. Given how often we see random network issue (on more or less the same builder than the one experiencing issues with time; the one running xcode 7), it could even be that the initial NTP sync fails because of network issues.

Yes, I would retry to sync via NTP. That said, if the network is having issues, this could fail as well.

comment:8 Changed 5 years ago by jmroot (Joshua Root)

AFAICT macOS doesn't actually run ntpd, which would be why the clock drift happens in the first place. Pre-Mojave you could run ntpdate to do a one-off sync, but that's gone in the latest ntp, with better alternatives now existing: https://support.ntp.org/bin/view/Dev/DeprecatingNtpdate#Functionality_List

comment:9 Changed 4 years ago by chrstphrchvz (Christopher Chavez)

In f9d70325451b35a34e62d8525c721d1c60c12d92/macports-ports (master):

bootstrap.sh: force VM clock to sync over NTP

See: #58800

comment:10 Changed 2 years ago by ryandesign (Ryan Carsten Schmidt)

Here's a recent example: See first and last lines of excerpt: The fetch phase started at 7:28:35 and, before it could complete, macOS clock synchronization had set the clock back to 7:14:24.

DEBUG: fetch phase started at Tue Nov  1 07:28:35 UTC 2022
--->  Fetching distfiles for p5.34-alien-build-plugin-download-github
DEBUG: Executing org.macports.fetch (p5.34-alien-build-plugin-download-github)
DEBUG: Changing to port directory: /opt/bblocal/var/buildworker/ports/build/ports/perl/p5-alien-build-plugin-download-github
DEBUG: OS darwin/10.8.0 (macOS 10.6.8) arch i386
DEBUG: Re-registering default for configure.universal_args
DEBUG: Sourcing PortGroup perl5 1.0 from /opt/bblocal/var/buildworker/ports/build/ports/_resources/port1.0/group/perl5-1.0.tcl
DEBUG: Re-registering default for livecheck.version
DEBUG: only one arch supported, so not adding the default universal variant
DEBUG: Running callback portconfigure::add_automatic_compiler_dependencies
DEBUG: Finished running callback portconfigure::add_automatic_compiler_dependencies
DEBUG: Running callback portbuild::add_automatic_buildsystem_dependencies
DEBUG: Finished running callback portbuild::add_automatic_buildsystem_dependencies
DEBUG: Running callback portstartupitem::add_notes
DEBUG: Finished running callback portstartupitem::add_notes
DEBUG: dropping privileges: euid changed to 502, egid changed to 505.
DEBUG: Portfile is from the future - check date and time of your system
    while executing
"open_statefile"
    (procedure "check_variants" line 28)
    invoked from within
"check_variants checksum"
    invoked from within
"$workername eval "check_variants $target""
    (procedure "mportexec" line 7)
    invoked from within
"mportexec $workername $target"
Error: Unable to execute port: Portfile is from the future - check date and time of your system
Checksum of 'p5.34-alien-build-plugin-download-github' failed.
DEBUG: Changing to port directory: /opt/bblocal/var/buildworker/ports/build/ports/perl/p5-alien-build-plugin-download-github
DEBUG: OS darwin/10.8.0 (macOS 10.6.8) arch i386
DEBUG: Re-registering default for configure.universal_args
DEBUG: Sourcing PortGroup perl5 1.0 from /opt/bblocal/var/buildworker/ports/build/ports/_resources/port1.0/group/perl5-1.0.tcl
DEBUG: Re-registering default for livecheck.version
DEBUG: only one arch supported, so not adding the default universal variant
DEBUG: Running callback portconfigure::add_automatic_compiler_dependencies
DEBUG: Finished running callback portconfigure::add_automatic_compiler_dependencies
DEBUG: Running callback portbuild::add_automatic_buildsystem_dependencies
DEBUG: Finished running callback portbuild::add_automatic_buildsystem_dependencies
DEBUG: Running callback portstartupitem::add_notes
DEBUG: Finished running callback portstartupitem::add_notes
DEBUG: Executing org.macports.main (p5.34-alien-build-plugin-download-github)
DEBUG: clean phase started at Tue Nov  1 07:14:24 UTC 2022

So the problem is not necessarily or not only that the VM clock runs fast. The problem is that the ESXi host machine's clock is fast. When a VM starts it inherits the clock value from the host, and then some time later the guest OS does its own time synchronization which can move the clock back.

I appear to have forgotten to set up NTP on this ESXi host. I found (in the web UI > Manage > System > Time & date) that "NTP Service Status" was "Stopped" and "NTP Servers" was not filled in. I clicked "Edit Settings", filled in the NTP host, and changed "Start and stop manually" to "Start and stop with port usage" to match the other hosts. I still needed to enable the NTP port in the firewall, which I couldn't figure out how to do in the web UI, so I used the vSphere Client application (Host > Configuration > Security Profile > Firewall: Properties... > NTP Client). After a few minutes, the host's clock did synchronize. Let's hope that fixes it.

Note: See TracTickets for help on using tickets.