#58696 closed defect (fixed)
Don't force the buildbot to build from source
Reported by: | ryandesign (Ryan Carsten Schmidt) | Owned by: | admin@… |
---|---|---|---|
Priority: | High | Milestone: | |
Component: | buildbot/mpbb | Version: | |
Keywords: | Cc: | l2dy (Zero King) | |
Port: |
Description
Please revert [9dfffb1f00c3b9d56e6d83c33a8b3310138758fc/mpbb] or implement the suggestions in https://github.com/macports/mpbb/pull/12#issuecomment-452310520 to allow it to be configured.
Now that we uninstall unneeded ports on the buildbot workers, it is entirely possible for builds to get scheduled for ports whose archives have already been uploaded but which are not installed on the worker. We don't want to waste time rebuilding them on the worker.
Change History (20)
comment:1 follow-up: 6 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Priority: | Normal → High |
---|
comment:2 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Though I don't understand why the build got scheduled in the first place if the archive is already on the server.
comment:3 Changed 5 years ago by raimue (Rainer Müller)
Could it be that it happened because of the default variant +mkl
, but the check for the archive does not take variants into account. Although it is not really clear to me why get_portimage_path
would not respect default variants as that should be the same function that also determines where to store the image.
Could you try to run tools/archive-path.tcl py37-tensorflow
by hand to see what path it actually returns and whether it has the variant? (Note py27-tensorflow no longer exists.)
Just to be sure it was not a networking problem, maybe we should also log the return code from libcurl?
comment:4 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Thanks, I'll check the rest you mentioned, but I'm very doubtful that there is a network problem on the buildbot's local network. (All of this happens locally; there is no Internet traffic involved.) I also don't think nginx would have timed out or otherwise returned an unexpected error.
comment:5 Changed 5 years ago by jmroot (Joshua Root)
The comment about variants not being taken into account only applies to the planned support for building non-default variants. The only reason a port could not be excluded when the archive was already uploaded is if the distributability status changed or curl failed to check whether it existed. Logging should indeed be added for both of those cases.
As for the main request of this ticket, CI always needs to build from source, so the change can't simply be reverted. There needs to be an option as suggested in the PR.
comment:6 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to ryandesign:
The buildbot just wasted 6 hours 25 minutes building just one of py-tensorflow's 4 subports even though the archive is already on the server.
Looking at this build specifically, the gathering step shows the name of the archive:
"py27-tensorflow" is distributable Already uploaded public archive: py27-tensorflow-2.0.0_0+mkl.darwin_16.x86_64.tbz2
The portwatcher that triggered this build started at Tue Nov 5 19:36:47 2019 UTC. My web server logs show this hit to the nondistributable packages server at that time:
192.168.7.41 - - [05/Nov/2019:19:37:43 +0000] "HEAD /py27-tensorflow/py27-tensorflow-2.0.0_0+mkl.darwin_16.x86_64.tbz2 HTTP/1.1" 404 0 "-" "curl/7.66.0" "-"
And 6 hours and change later, the gathering step produced this hit on the distributable packages server:
192.168.7.41 - - [06/Nov/2019:02:20:27 +0000] "HEAD /py27-tensorflow/py27-tensorflow-2.0.0_0+mkl.darwin_16.x86_64.tbz2 HTTP/1.1" 200 0 "-" "curl/7.66.0" "-"
So mpbb list-subports
thought the package was not distributable, but mpbb gather-archives
thought it was.
comment:7 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Possibly relevant: manually running port_binary_distributable.tcl on py37-tensorflow on the High Sierra worker today says it is not distributable:
$ PORTSRC=/opt/bblocal/var/buildworker/ports/build/macports.conf /opt/local/bin/port-tclsh /opt/bblocal/var/buildworker/ports/build/infrastructure/jobs/port_binary_distributable.tcl -v py37-tensorflow "py37-tensorflow" is not distributable because its license "apache" conflicts with license "GPL-2" of dependency "openjdk13"
But there is an archive for it uploaded to the distributable packages server a month ago.
comment:8 follow-up: 11 Changed 5 years ago by mojca (Mojca Miklavec)
We probably made sure that if a build is started, stuff gets built from source (else we didn't spot any problems with Travis/Azure).
And you mentioned at some point that you want to run the builds in certain cases, for example when distributability changes, to ensure that something gets uploaded to the mirror.
comment:9 Changed 5 years ago by l2dy (Zero King)
We could revert https://github.com/macports/mpbb/pull/12 and apply https://github.com/macports/macports-ports/pull/3353.
comment:10 Changed 5 years ago by jmroot (Joshua Root)
Let's try to minimise patching in the CI scripts. Different behaviour is legitimately needed there, so make it an option as Mojca suggested in the comment linked from the ticket description.
comment:11 Changed 5 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to mojca:
We probably made sure that if a build is started, stuff gets built from source (else we didn't spot any problems with Travis/Azure).
I know that that is why the change was made. But it wastes CPU time on the buildbot.
And you mentioned at some point that you want to run the builds in certain cases, for example when distributability changes, to ensure that something gets uploaded to the mirror.
Forcing it to build from source again is not necessary to accomplish that. Allowing it to use the existing binary would be fine.
comment:12 Changed 5 years ago by mojca (Mojca Miklavec)
At the time when we changed this, I wasn't aware of the necessity to "rebuild already built ports" on our buildbot. Now that I know, it clearly makes more sense that we use different behaviour on buildbot and Travis/Azure, so let's use whatever way that's most straightforward to achieve what we want: still build everything from source on Travis, not forcing rebuild from source on buildbot, making sure that we don't skip any ports on the way, ...
It might be best if l2dy takes care of the fix, as it's needed on both mpbb and go side?
comment:13 Changed 5 years ago by l2dy (Zero King)
I could fix the Go side of it, but I don't have enough time to implement the mpbb part.
comment:14 Changed 5 years ago by jmroot (Joshua Root)
In fbe5757754dd9d9eec82ccc736595342d600641b/mpbb (master):
comment:15 follow-up: 16 Changed 5 years ago by jmroot (Joshua Root)
There you go. The Go side should literally just be adding "--source"
to this line: https://github.com/macports/mpbot-github/blob/09bd9654a9ed17577564dede65dfb642ed647def/ci/buildWorker.go#L70
comment:16 Changed 5 years ago by l2dy (Zero King)
Replying to jmroot:
There you go. The Go side should literally just be adding
"--source"
to this line: https://github.com/macports/mpbot-github/blob/09bd9654a9ed17577564dede65dfb642ed647def/ci/buildWorker.go#L70
Thanks! Done in [04dac5f8501f2bddf2ef8c82d57990631445873b/mpbot-github].
I have uploaded binaries compiled with Go 1.13, so it would only run on macOS 10.11 or later. But it's fine because that's the earliest version we have in CI.
comment:17 Changed 5 years ago by jmroot (Joshua Root)
Resolution: | → fixed |
---|---|
Status: | new → closed |
Thanks. Would you mind reviewing this PR while you're at it? https://github.com/macports/mpbot-github/pull/7
comment:18 Changed 4 years ago by l2dy (Zero King)
comment:19 Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)
So what else do we need to do to close this ticket?
comment:20 Changed 4 years ago by neverpanic (Clemens Lang)
I think we still need to pack the binary and push it to the releases section of mpbot-github, which I haven't done yet.
What do we need to do to get this issue resolved?
The buildbot just wasted 6 hours 25 minutes building just one of py-tensorflow's 4 subports even though the archive is already on the server. This delays other builds, and also delays mirroring, which can cause builds on the < 10.9 builders to fail.