#49812 closed defect (duplicate)
buildbots: compile failures due to stalled bzr fetches
Reported by: | dbevans (David B. Evans) | Owned by: | admin@… |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | server/hosting | Version: | |
Keywords: | Cc: | ||
Port: |
Description
Recent attempts to build inkscape-devel are causing the buildbots to abort with a compile failure. Examination of the stdio log file shows that builds stall waiting for the bzr fetch (which is admittedly slow) to complete and eventually throw the compile failure.
This is new behavior which has only begun recently. The first svn commit to fail was r142782 dated 2015-11-22 23:34:24 -0800. Subsequent svn commit r142868 also failed. Last successful commit was r142636 dated 2015-11-17 18:36:45 -0800. Builds on my local machine fetch successfully within 2 to 3 minutes in all cases.
Log from the latest failed attempt is https://build.macports.org/builders/buildports-yosemite-x86_64/builds/6367/steps/compile/logs/stdio/text which ends abruptly with
CC_PRINT_OPTIONS='YES' CC_PRINT_OPTIONS_FILE='/opt/local/var/macports/build/_opt_mports_dports_graphics_inkscape-devel/inkscape-devel/work/.CC_PRINT_OPTIONS' CPATH='/opt/local/include' LIBRARY_PATH='/opt/local/lib' MACOSX_DEPLOYMENT_TARGET='10.10' DEBUG: Assembled command: 'cd "/opt/local/var/macports/build/_opt_mports_dports_graphics_inkscape-devel/inkscape-devel/work" && /opt/local/bin/bzr --builtin --no-aliases checkout --lightweight -r 14489 lp:inkscape inkscape-devel-0.91.99.14489' DEBUG: Executing command line: cd "/opt/local/var/macports/build/_opt_mports_dports_graphics_inkscape-devel/inkscape-devel/work" && /opt/local/bin/bzr --builtin --no-aliases checkout --lightweight -r 14489 lp:inkscape inkscape-devel-0.91.99.14489 2>&1 Not checking SSL certificate for xmlrpc.launchpad.net. You have not informed bzr of your Launchpad ID, and you must do this to write to Launchpad or access private data. See "bzr help launchpad-login".
as do the others. Initial warnings from bzr are normal and innocuous.
I'm wondering if you have access to more detailed network logs that would shed some light on what's happening (or not). Thanks.
Change History (24)
comment:1 Changed 9 years ago by dbevans (David B. Evans)
comment:2 Changed 9 years ago by ryandesign (Ryan Carsten Schmidt)
I'm not aware of the existence of any network logs.
See also #49759 for a similar inexplicable git clone on the buildbot builders.
I propose we fix the problem as explained there: slow network fetches should be permitted. MacPorts should print periodic output during slow fetches so that it is apparent something is happening and things are not stalled.
comment:3 follow-up: 6 Changed 9 years ago by dbevans (David B. Evans)
Agreed. In the bzr case, it should be possible to enable a more verbose transfer mode. Currently, it is completely silent until the transfer is complete so it's hard to tell if something is happening or not. Do the buildbots have a set timeout period when waiting for output on stdio and if so what is it? Are there specific network protocols that are blocked at Apple?
comment:5 follow-up: 12 Changed 9 years ago by dbevans (David B. Evans)
Replying to jmr@…:
The timeout is 20 minutes.
This should be sufficient time for the bzr fetches to complete. Maximum fetch time on local machines is 5 minutes or less.
comment:6 follow-up: 10 Changed 9 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to devans@…:
Are there specific network protocols that are blocked at Apple?
All ports that are not explicitly allowed are blocked. I understand bzr uses port 4155. That is not on the list of ports that are documented to be allowed, but I don't know if the documentation is up to date. If you give me the hostname or IP address of a bzr server I can check if it can be accessed from the servers.
comment:7 Changed 9 years ago by dbevans (David B. Evans)
Based on using tcpdump to watch the network traffic during the fetch, the Launchpad bzr host name is taotie.canonical.com. I saw no traffic on port 4155. All transfers we strictly http (port 80).
comment:9 Changed 9 years ago by ryandesign (Ryan Carsten Schmidt)
Owner: | changed from ryandesign@… to admin@… |
---|
comment:10 follow-up: 11 Changed 9 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to ryandesign@…:
All ports that are not explicitly allowed are blocked. I understand bzr uses port 4155. That is not on the list of ports that are documented to be allowed, but I don't know if the documentation is up to date.
I think the documentation was out of date. Based on prior internal tickets filed, all buildbot builders should have ports 80 (http), 443 (https), 2401 (cvs), 3690 (svn), 4155 (bzr), 9418 (git) open.
Replying to devans@…:
BTW, I've tried doing an http tarball fetch from Launchpad but for a given bzr revision each fetch yields a different checksum.
This sounds like a launchpad bug. Can you report it to them?
comment:11 Changed 9 years ago by dbevans (David B. Evans)
Replying to devans@…:
BTW, I've tried doing an http tarball fetch from Launchpad but for a given bzr revision each fetch yields a different checksum.
This sounds like a launchpad bug. Can you report it to them?
Will do.
comment:12 Changed 9 years ago by dbevans (David B. Evans)
Replying to devans@…:
Replying to jmr@…:
The timeout is 20 minutes.
This should be sufficient time for the bzr fetches to complete. Maximum fetch time on local machines is 5 minutes or less.
Actually, I need to update this statement as well. I recently ran a sequence of 10 consecutive inkscape-devel bzr fetches and the time required varied all over the map. Minimum was about 3 minutes with a maximum over 19 minutes. In addition, when running with tcpdump as mentioned above the fetch time was also very near 20 minutes.
Given these results, it's quite possible that a fetch on a buildbot VM might well exceed the 20 minute limit. Is it possible to run one or more inkscape-devel fetch by hand on one of the builbots and see how much time it really takes?
comment:13 follow-ups: 15 16 Changed 9 years ago by ryandesign (Ryan Carsten Schmidt)
I tested on a gigabit Internet connection in Austin and it fetched 198MB in under 2 minutes. I've taken the Snow Leopard builder offline to test fetching there with the --verbose
flag, and so far it's fetching at around 50KB/s; at that rate, it should take around an hour to fetch it all.
So as I said in #49759 for git, we should modify MacPorts base so that bzr fetches print something while fetching. It looks like the --verbose
flag only works in a terminal, so I'm not sure how to make bzr print progress information when called by MacPorts.
comment:14 follow-up: 17 Changed 9 years ago by ryandesign (Ryan Carsten Schmidt)
If we could hook into the MacPorts progress bar API for all revision control system fetches (bzr, cvs, git, hg, svn) that would be super.
comment:15 Changed 9 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to ryandesign@…:
I've taken the Snow Leopard builder offline to test fetching there with the
--verbose
flag, and so far it's fetching at around 50KB/s; at that rate, it should take around an hour to fetch it all.
It finished in about 32 minutes.
comment:16 follow-up: 18 Changed 9 years ago by dbevans (David B. Evans)
Replying to ryandesign@…:
I tested on a gigabit Internet connection in Austin and it fetched 198MB in under 2 minutes. I've taken the Snow Leopard builder offline to test fetching there with the
--verbose
flag, and so far it's fetching at around 50KB/s; at that rate, it should take around an hour to fetch it all.
I just did another fetch here and the download rate varied from a low of 25KB/s to a max around 256KB/s -- nothing more. So I'm assuming Launchpad has a fairly slow connection or their host is really loaded down. In addition, it looks like the bzr protocol is fairly inefficent -- lots of small packets with handshakes in between which doesn't help either.
So as I said in #49759 for git, we should modify MacPorts base so that bzr fetches print something while fetching. It looks like the
--verbose
flag only works in a terminal, so I'm not sure how to make bzr print progress information when called by MacPorts.
Worst yet the bzr checkout command doesn't accept --verbose according to the docs. Although it does show a progress bar when connected to a terminal.
comment:17 Changed 9 years ago by dbevans (David B. Evans)
Replying to ryandesign@…:
If we could hook into the MacPorts progress bar API for all revision control system fetches (bzr, cvs, git, hg, svn) that would be super.
Agreed.
comment:18 Changed 9 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to devans@…:
Worst yet the bzr checkout command doesn't accept --verbose according to the docs.
The manpage says it does:
bzr checkout [BRANCH_LOCATION] [TO_LOCATION] Options: --files-from ARG Get file contents from this tree. --hardlink Hard-link working tree files where possible. --help, -h Show help message. --lightweight Perform a lightweight checkout. Lightweight checkouts depend on access to the branch for every operation. Normal checkouts can perform common operations like diff and status without such access, and also support local commits. --quiet, -q Only display errors and warnings. --revision ARG, -r See "help revisionspec" for details. --usage Show usage message and options. --verbose, -v Display more information.
Although it does show a progress bar when connected to a terminal.
Can we trick it into thinking it has a terminal?
comment:19 Changed 9 years ago by dbevans (David B. Evans)
Well, I don't know what documentation I was looking at but you're right. However, running the checkout command with --verbose doesn't give any more than the normal.
Perhaps we could patch bzr to output not the running progress bar but a periodic status message if its not connected to a terminal. Depends on how the code works, I guess.
comment:20 Changed 9 years ago by dbevans (David B. Evans)
And now inkscape-devel r143503 just fetched successfully on all active buildbots.
comment:21 Changed 9 years ago by dbevans (David B. Evans)
At this point inkscape-devel bzr fetches seems to be working without problems and have been for several months. So perhaps this is a moot issue? Shall we close this ticket or do you want to keep it open.
comment:22 Changed 8 years ago by dbevans (David B. Evans)
Resolution: | → worksforme |
---|---|
Status: | new → closed |
No response. End to end network performance (root cause of this issue) seems to have improved significantly and is much less variable since this ticket was filed. Bzr fetches of inkscape-devel are no longer timing out and complete within a reasonable period of time. Closing as works for me.
comment:23 Changed 7 years ago by ryandesign (Ryan Carsten Schmidt)
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
Since this ticket was closed, we've moved the buildbot to new hosting that has a slower network connection than what we enjoyed at macOS forge. inkscape-devel has switched from bzr to git now, but the problem can happen with git too:
I've reopened #49759 to address it.
comment:24 Changed 7 years ago by ryandesign (Ryan Carsten Schmidt)
Resolution: | → duplicate |
---|---|
Status: | reopened → closed |
BTW, I've tried doing an http tarball fetch from Launchpad but for a given bzr revision each fetch yields a different checksum.