#54193 closed defect (fixed)
squid3+launchd causes multiple runaway processes when run in daemon mode
Reported by: | essandess (Steve Smith) | Owned by: | jmroot (Joshua Root) |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | ports | Version: | 2.4.1 |
Keywords: | Cc: | ||
Port: | squid3 |
Description
sudo port load squid3
regularly and consistently causes multiple runaway squid process with 100 percent cpu. The problem is severe enough to crash a squid server. I observe this problem, as do others here: https://github.com/essandess/osxfortress/issues/5#issuecomment-292507167 and elsewhere.
This happens after a network interface is changed, or some other change in dis or the network. The only fix is to kill everything:
sudo port unload squid3 sudo killall -9 squid sudo killall -9 '(squid-1)'
This problem is caused because the file /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper
specifies that squid is launched in daemon mode, which confuses launchd.
This is a long standing issue between launchd and Macports squid. It is
The confirmed fix is to launch squid in non-daemon mode by adding the -N
flag to the wrapper's Start()
method.
Here is the corrected, working Squid.wrapper
file:
#!/bin/sh # # MacPorts generated daemondo support script # # # Init # prefix=/opt/local # # Start # Start() { cd /opt/local/var/squid if [ ! -d "/opt/local/var/squid/cache/00" ]; then /opt/local/sbin/squid -s -z fi /opt/local/sbin/squid -s -N } # # Stop # Stop() { cd /opt/local/var/squid /opt/local/sbin/squid -k shutdown while /opt/local/sbin/squid -k check; do sleep 1 done } # # Restart # Restart() { Stop Start } # # Run # Run() { case $1 in start ) Start ;; stop ) Stop ;; restart) Restart ;; * ) echo "$0: unknown argument: $1";; esac } # # Run a phase based on the selector # Run $1
Attachments (1)
Change History (19)
comment:1 Changed 7 years ago by jmroot (Joshua Root)
comment:2 Changed 7 years ago by jmroot (Joshua Root)
Priority: | High → Normal |
---|
comment:3 follow-up: 5 Changed 7 years ago by essandess (Steve Smith)
I see this issue regularly on a variety of boxes and networks: Mac Pros, Minis, MBPs. One near-sure way to trigger it is to sleep a box through a VPN expiration.
Here's squid.conf
: https://raw.githubusercontent.com/essandess/osxfortress/master/squid.conf
And specifying -N
definitely fixes the issue.
What's your setting for http_port
? Do you bind to a fixed IP address?
comment:4 Changed 7 years ago by mf2k (Frank Schima)
Cc: | jmroot removed |
---|---|
Owner: | set to jmroot |
Status: | new → assigned |
comment:5 Changed 7 years ago by jmroot (Joshua Root)
Replying to essandess:
And specifying
-N
definitely fixes the issue.
Works around it at least. Using -N has undesirable effects on privilege separation and SMP scalability. I'd like to understand and fix the actual bug.
What's your setting for
http_port
? Do you bind to a fixed IP address?
http_port 3128
comment:6 Changed 7 years ago by essandess (Steve Smith)
Please let me know if I can help with any diagnostics. This issue is both annoyingly consistent in that it would trigger every few days if there lots of network changes, but also annoyingly unpredictable. I was never able to nail down the precise issue or conditions. At some point I was speculating about loss of DNS service with mDNSRespnder as a trigger, and haven't ruled out that hypothesis.
comment:7 Changed 7 years ago by jmroot (Joshua Root)
When the issue occurs, does squid.pid contain a valid pid?
I'll try to find some time to do a stress test with many net changes in a VM or something. Very hard to diagnose what I can't reproduce.
comment:8 Changed 7 years ago by essandess (Steve Smith)
I believe I have an instance running now that exhibits the underlying issue.
I have set the -N
flag in Squid.wrapper
and do not see the runaway squid processes issue this is intended to fix; however, other problems exist that I believe are related.
Problem 1: The squid.pid
file doesn't match the squid pid.
Problem 2: The squid process is not responding to -k
signals.
Good thing: The running squid process is working fine as a proxy.
Here's what I see:
$ ps -el | grep squid | grep -v grep 0 3621 1 4004 0 20 0 2506820 912 - Ss 0 ?? 0:00.02 /opt/local/bin/daemondo --label=Squid --start-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper restart ; --restart-netchange --pid=fileauto --pidfile /opt/local/var/run/squid/squid.pid 504 3626 3622 4104 0 20 0 2970120 473076 - S 0 ?? 10:58.75 /opt/local/sbin/squid -s -N 504 3639 3626 4004 0 20 0 2432784 212 - Ss 0 ?? 0:05.76 (logfile-daemon) /opt/local/var/squid/logs/access.log
$ cat /opt/local/var/run/squid/squid.pid 74809
Error sending a signal to squid:
$ sudo squid -k rotate squid: ERROR: Could not send signal 30 to process 74809: (3) No such process
Permissions check:
$ ls -l /opt/local/var/run/squid/squid.pid -rw-r--r-- 1 root squid 6 May 20 21:21 /opt/local/var/run/squid/squid.pid
$ ls -ld /opt/local/var/run/squid drwxr-xr-x 4 squid squid 136 May 19 06:48 /opt/local/var/run/squid
(Is this correct? There's a difference between the permissions on squid.pid
file and the directory in which it resides.)
Background info:
- Squid is loaded using the command
sudo port load squid
- I have verified that the launchd plist is loaded:
$ sudo launchctl list | grep -i squid - 0 org.squid-cache.squid-rotate.plist 3621 0 org.macports.Squid
- The state of the current problem arose some time after a fresh reboot several days ago. The launchd plist was loaded at boot time.
Because this issue is running in a stable state (and I have a working proxy), I would be able to query any additional logs that you might need.
I've also launched squid without the -N
flag on a separate box and am waiting for it to fail out with a bunch of runaway processes.
comment:9 Changed 7 years ago by essandess (Steve Smith)
Here's the runaway 100 percent multiple squid processes failure mode when the -N
flag is not used. Again, the pid file /opt/local/var/run/squid/squid.pid
has an inconsistent pid
This is from a different box. It took somewhere between 24 and 36 hours for this issue to appear:
$ ps -el | grep squid | grep -v grep 0 48655 1 104 0 20 0 2465472 628 - Ss 0 ?? 0:00.00 /opt/local/sbin/squid -s 503 48657 48655 4104 0 20 0 2633220 140332 - S 0 ?? 0:12.53 (squid-1) -s 0 48662 1 4004 0 20 0 2515008 2444 - Ss 0 ?? 0:00.02 /opt/local/bin/daemondo --label=Squid --start-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper restart ; --restart-netchange --pid=fileauto --pidfile /opt/local/var/run/squid/squid.pid 0 48665 1 104 0 20 0 2491072 676 - Ss 0 ?? 0:00.04 /opt/local/sbin/squid -s 503 48668 48657 4004 0 20 0 2440976 688 - S 0 ?? 0:00.00 (logfile-daemon) /opt/local/var/squid/logs/access.log 0 48699 1 104 0 20 0 2491072 736 - Ss 0 ?? 0:00.03 /opt/local/sbin/squid -s 0 48744 1 104 0 20 0 2473664 672 - Ss 0 ?? 0:00.03 /opt/local/sbin/squid -s 0 48792 1 104 0 20 0 2465472 668 - Ss 0 ?? 0:00.02 /opt/local/sbin/squid -s 0 48850 1 104 0 20 0 2474688 660 - Ss 0 ?? 0:00.02 /opt/local/sbin/squid -s 0 48913 1 104 0 20 0 2507456 696 - Ss 0 ?? 0:00.02 /opt/local/sbin/squid -s 0 48980 1 104 0 20 0 2473664 628 - Ss 0 ?? 0:00.01 /opt/local/sbin/squid -s 0 49041 1 104 0 20 0 2481856 620 - Ss 0 ?? 0:00.01 /opt/local/sbin/squid -s 0 49171 1 104 0 20 0 2491072 596 - Ss 0 ?? 0:00.00 /opt/local/sbin/squid -s 503 49252 48980 4104 0 0 0 2598404 132680 - R 0 ?? 0:12.28 (squid-1) -s 503 49254 48850 4104 0 0 0 2590212 123884 - R 0 ?? 0:11.30 (squid-1) -s 503 49256 49171 4104 0 0 0 2590212 124064 - R 0 ?? 0:11.26 (squid-1) -s 503 49262 48792 4104 0 0 0 2607620 116032 - R 0 ?? 0:09.88 (squid-1) -s 503 49264 48744 4104 0 0 0 2585092 84676 - R 0 ?? 0:07.04 (squid-1) -s 503 49266 48913 4104 0 0 0 2583044 83692 - R 0 ?? 0:06.96 (squid-1) -s 503 49268 48699 4104 0 0 0 2560516 83476 - R 0 ?? 0:06.95 (squid-1) -s 503 49270 48665 4104 0 0 0 2575876 83348 - R 0 ?? 0:06.91 (squid-1) -s $ cat /opt/local/var/run/squid/squid.pid 49292 $ ls -l /opt/local/var/run/squid/squid.pid -rw-r--r-- 1 root squid 6 May 21 18:08 /opt/local/var/run/squid/squid.pid $ ls -ld /opt/local/var/run/squid/ drwxr-xr-x 4 squid squid 136 May 21 09:36 /opt/local/var/run/squid/ $ sudo squid -k rotate squid: ERROR: Could not send signal 30 to process 49377: (3) No such process
Changed 7 years ago by essandess (Steve Smith)
Attachment: | Screen Shot 2017-05-21 at 6.07.42 PM.png added |
---|
Activity Monitor for Runaway Squid Processes
comment:10 Changed 7 years ago by essandess (Steve Smith)
I've relaunched squid on both boxes (sudo port unload squid; sudo port load squid
) and the PID file /opt/local/var/run/squid/squid.pid
shows the correct pid.
The error must occur over the daemondo relaunch transition. If there's way of tracking or logging this please let me know and I'll keep an eye out for the event.
comment:11 follow-up: 12 Changed 7 years ago by essandess (Steve Smith)
I also see the incorrect pid
in the PID file after a fresh reboot.
Is this an issue that should be upstreamed to daemondo?
comment:12 Changed 7 years ago by jmroot (Joshua Root)
Replying to essandess:
Is this an issue that should be upstreamed to daemondo?
Possibly. However the upstream is still MacPorts.
I still haven't been able to reproduce this, unfortunately.
comment:13 Changed 7 years ago by jmroot (Joshua Root)
Could you try this and see if it helps?
-
net/squid3/Portfile
diff --git a/net/squid3/Portfile b/net/squid3/Portfile index a670aab57f..dd601a991a 100644
a b post-configure { 85 85 startupitem.create yes 86 86 startupitem.name Squid 87 87 startupitem.netchange yes 88 startupitem.pidfile auto${pidfile}88 startupitem.pidfile clean ${pidfile} 89 89 startupitem.start \ 90 90 "cd ${prefix}/var/squid" \ 91 91 "if \[ ! -d \"${prefix}/var/squid/cache/00\" \]; then" \
comment:14 Changed 7 years ago by essandess (Steve Smith)
Sorry, using clean
didn't work.
The issue persists after a full uninstall and rebuild with these steps:
port unload squid3 vi `port file squid3` # auto to clean ${pidfile} port uninstall squid3 port clean --dist squid3 port install squid3 port load squid3
It took less than a day for the problem to appear, and after changing the System Preferences>Network>Location.
Several runaway squid process were being launched, and the pid file had the wrong pid. This is what ps -ef | grep squid | grep -v grep
looks like write after the runaway problem starts:
ps -ef | grep squid | grep -v grep 0 16935 1 0 9:02PM ?? 0:00.00 /opt/local/sbin/squid -s 0 16956 1 0 9:02PM ?? 0:00.00 /opt/local/sbin/squid -s 504 17000 16935 0 9:02PM ?? 0:06.86 (squid-1) -s 504 17003 16956 0 9:03PM ?? 0:00.12 (squid-1) -s 0 83825 1 0 9:23AM ?? 0:00.03 /opt/local/bin/daemondo --label=Squid --start-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper restart ; --restart-netchange --pid=fileclean --pidfile /opt/local/var/run/squid/squid.pid 0 83832 1 0 9:23AM ?? 0:00.00 /opt/local/sbin/squid -s 504 83834 83832 0 9:23AM ?? 1:59.40 (squid-1) -s 504 83841 83834 0 9:23AM ?? 0:01.07 (logfile-daemon) /opt/local/var/squid/logs/access.log The only thing that works to stop the runaway squid processes is to add the {{{-N}}} flag as described above. The PID file eventually is inconsistent with the running squid process, which I believe causes all the problems.
comment:15 Changed 7 years ago by essandess (Steve Smith)
By the way, I have noticed a lag of a few seconds for squid to write the PID file:
$ sudo port load squid3 $ ps -ef | grep squid | grep -v grep 0 2971 1 0 9:25PM ?? 0:00.01 /opt/local/bin/daemondo --label=Squid --start-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper restart ; --restart-netchange --pid=fileclean --pidfile /opt/local/var/run/squid/squid.pid 504 2976 2972 0 9:25PM ?? 0:04.10 /opt/local/sbin/squid -s -N $ cat /opt/local/var/run/squid/squid.pid cat: /opt/local/var/run/squid/squid.pid: No such file or directory $ cat /opt/local/var/run/squid/squid.pid 2976 # the correct PID file appears a few seconds later
Could this PID file lagginess be related to the underlying problem?
comment:16 Changed 7 years ago by essandess (Steve Smith)
Re: https://github.com/essandess/macOS-Fortress/issues/5#issuecomment-300650444 [Squid.wrapper
with a -N
flag.]
I’ve made a few changes that I believe mitigate the problem of squid running multiple processes at boot, but do not address the underlying issue.
- Changed
squid.conf
’sshutdown_lifetime 5 seconds
to be much less thandaemondo
’s 20 second timeout to avoiddaemondo
launching multiple redundant squid processes at boot or on a network change. See https://trac.macports.org/browser/branches/release_2_3/base/src/programs/daemondo/main.c:
const CFTimeInterval kChildDeathTimeout = 20.0;
- Use a robust script https://github.com/essandess/macOS-Fortress/blob/master/macosfortress_boot_check to run by hand at boot time that checks (1) if squid has bound port 3128 and (2) that there is only a single squid process. If not, it kills all running squid and relaunches a controlled squid daemon. 🦑
# commands SUDO=/usr/bin/sudo PORT=/opt/local/bin/port TAIL=/usr/bin/tail LSOF=/usr/sbin/lsof KILLALL=/usr/bin/killall # squid # make sure there aren't multiple squid jobs running if ! [[ `$SUDO $LSOF -i ':3128' | $TAIL -1` && `ps -ef | grep "/opt/local/sbin/squid -s" | grep -v grep | wc -l` -eq 1 ]]; then $SUDO $PORT unload squid3 $SUDO $KILLALL 'squid' $SUDO $KILLALL '(squid-1)' $SUDO $PORT load squid3 fi
- Deprecated the modified
Squid.wrapper
file and use the default Macports daemondo settings.
This has helped on the boxes I run, but doesn’t address the underlying issue of daemondo
launching multiple squid processes at boot or on a network change.
comment:17 follow-up: 18 Changed 6 years ago by jmroot (Joshua Root)
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:18 Changed 6 years ago by essandess (Steve Smith)
@jmroot,
I just wanted to thank you and confirm that the new port file for squid4
along with how squid4 handles its parent and child processes fixes this issue.
Replying to jmroot:
In 2176e957f8389cc830b9605da73ace4e2c3f8071/macports-ports (master):
We're not running squid directly from launchd, it's going via daemondo, the whole purpose of which is to interface between launchd and things running in daemon mode. So I'm not really sold on this analysis of what is going on, or convinced that this is the correct fix.
I don't doubt that you are seeing this issue, but I run squid myself (on a machine that undergoes netchange on occasion) and have never seen it. So it's not as simple as the current config being just plain wrong. Could be a bug in daemondo, maybe a race condition with the pidfile?