Bug #988
closedvm158 (libreoffice.org) crashes irregularly
Added by Florian Effenberger about 9 years ago. Updated about 9 years ago.
0%
Description
vm158 (libreoffice.org) crashes irregularly
Updated by Florian Effenberger about 9 years ago
Some ideas to debug:
investigate top.log after crash to see if there are performance issues-> update: done- do a daily reboot via cron to see if it's a memory leak
have monitoring in place to at least notify, ideally act when something happens-> update: done, nothing suspicious it seems
Updated by Florian Effenberger about 9 years ago
Some more ideas from the admin meeting:
apply the setterm script to ensure screen gets not blanked-> update: doneset the kernel panic reboot timer via sysctl-> update: done
Updated by Florian Effenberger about 9 years ago
top.log shortly after crash has been saved to ~floeff/top.log and ~floeff/top.log2, here are the extracts shortly before the crash:
top - 21:43:02 up 3 days, 1:52, 0 users, load average: 6.75, 7.79, 11.17 Tasks: 264 total, 8 running, 256 sleeping, 0 stopped, 0 zombie %Cpu(s): 33.9 us, 3.6 sy, 0.0 ni, 61.1 id, 1.1 wa, 0.0 hi, 0.2 si, 0.0 st KiB Mem: 4046684 total, 3122056 used, 924628 free, 39120 buffers KiB Swap: 4190204 total, 234668 used, 3955536 free. 1341388 cached Mem
top - 18:52:01 up 6 days, 19:26, 0 users, load average: 9.04, 9.19, 9.31 Tasks: 244 total, 9 running, 235 sleeping, 0 stopped, 0 zombie %Cpu(s): 39.2 us, 3.2 sy, 0.0 ni, 55.2 id, 2.2 wa, 0.0 hi, 0.2 si, 0.0 st KiB Mem: 4046684 total, 3603176 used, 443508 free, 67532 buffers KiB Swap: 4190204 total, 238584 used, 3951620 free. 1656852 cached Mem
Maybe a memory leak? However, 20 minutes before the crash it looks similar:
top - 18:27:01 up 6 days, 19:01, 0 users, load average: 7.93, 9.09, 9.98 Tasks: 245 total, 14 running, 231 sleeping, 0 stopped, 0 zombie %Cpu(s): 39.1 us, 3.2 sy, 0.0 ni, 55.2 id, 2.2 wa, 0.0 hi, 0.2 si, 0.0 st KiB Mem: 4046684 total, 3563504 used, 483180 free, 65836 buffers KiB Swap: 4190204 total, 238644 used, 3951560 free. 1625924 cached Mem
Updated by Florian Effenberger about 9 years ago
The syslog also looks rather unsusipcious:
Jan 13 21:23:01 vm158 CRON[6764]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:21:06 vm158 snmpd[5675]: message repeated 14 times: [ Connection from UDP: [redacted]:49320->[redacted]:161] Jan 13 21:23:42 vm158 snmpd[5675]: Connection from UDP: [redacted]:54393->[redacted]:161 Jan 13 21:25:22 vm158 snmpd[5675]: Connection from UDP: [redacted]:35929->[redacted]:161 Jan 13 21:25:36 vm158 kernel: [264928.814514] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=40 TOS=0x00 PREC=0x00 TTL=240 ID=54321 PROTO=TCP SPT=53624 DPT=49154 WINDOW=65535 RES=0x00 SYN URGP=0 Jan 13 21:26:03 vm158 CRON[7079]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:26:57 vm158 kernel: [265009.811573] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=52 TOS=0x00 PREC=0x00 TTL=50 ID=61676 DF PROTO=TCP SPT=61808 DPT=25 WINDOW=65535 RES=0x00 SYN URGP=0 Jan 13 21:27:00 vm158 kernel: [265012.778292] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=52 TOS=0x00 PREC=0x00 TTL=50 ID=61956 DF PROTO=TCP SPT=61808 DPT=25 WINDOW=65535 RES=0x00 SYN URGP=0 Jan 13 21:27:01 vm158 CRON[7164]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:27:06 vm158 kernel: [265018.806769] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=52 TOS=0x00 PREC=0x00 TTL=50 ID=62510 DF PROTO=TCP SPT=61808 DPT=25 WINDOW=65535 RES=0x00 SYN URGP=0 Jan 13 21:28:01 vm158 CRON[7327]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:30:01 vm158 CRON[7560]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:30:16 vm158 kernel: [265208.860042] Peer redacted:49503/80 unexpectedly shrunk window 3219335626:3219373738 (repaired) Jan 13 21:30:16 vm158 kernel: [265209.060112] Peer redacted:49491/80 unexpectedly shrunk window 4084028138:4084040474 (repaired) Jan 13 21:30:17 vm158 kernel: [265209.372073] Peer redacted:49506/80 unexpectedly shrunk window 2495513151:2495537808 (repaired) Jan 13 21:31:01 vm158 CRON[7642]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:25:22 vm158 snmpd[5675]: message repeated 24 times: [ Connection from UDP: [redacted]:35929->[redacted]:161] Jan 13 21:31:06 vm158 snmpd[5675]: Connection from UDP: [redacted]:43382->[redacted]:161 Jan 13 21:32:01 vm158 CRON[7696]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:32:32 vm158 kernel: [265344.867580] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=42483 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0 Jan 13 21:32:35 vm158 kernel: [265347.866534] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=42484 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0 Jan 13 21:32:41 vm158 kernel: [265353.866495] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=42485 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0 Jan 13 21:33:01 vm158 CRON[7855]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:33:15 vm158 kernel: [265387.820707] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=58 ID=8368 DF PROTO=TCP SPT=39069 DPT=10000 WINDOW=5840 RES=0x00 SYN URGP=0 Jan 13 21:31:06 vm158 snmpd[5675]: message repeated 14 times: [ Connection from UDP: [redacted]:43382->[redacted]:161] Jan 13 21:33:42 vm158 snmpd[5675]: Connection from UDP: [redacted]:48723->[redacted]:161 Jan 13 21:34:35 vm158 kernel: [265468.320399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=8188 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0 Jan 13 21:34:38 vm158 kernel: [265471.328399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=8195 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0 Jan 13 21:34:44 vm158 kernel: [265477.328704] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=8227 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0 Jan 13 21:35:01 vm158 CRON[8078]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:35:22 vm158 snmpd[5675]: Connection from UDP: [redacted]:34822->[redacted]:161 Jan 13 21:36:01 vm158 CRON[8167]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:37:01 vm158 CRON[8286]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:38:01 vm158 CRON[8365]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:39:01 vm158 CRON[8426]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime)) Jan 13 21:39:01 vm158 CRON[8428]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:40:01 vm158 CRON[8561]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:41:01 vm158 CRON[8673]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:35:22 vm158 snmpd[5675]: message repeated 24 times: [ Connection from UDP: [redacted]:34822->[redacted]:161] Jan 13 21:41:06 vm158 snmpd[5675]: Connection from UDP: [redacted]:54895->[redacted]:161 Jan 13 21:42:01 vm158 CRON[8787]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) Jan 13 21:43:01 vm158 CRON[8851]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log) --- crash occured here --- Jan 13 22:38:18 vm158 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="740" x-info="http://www.rsyslog.com"] start Jan 13 22:38:18 vm158 rsyslogd: cannot create '/var/spool/postfix/dev/log': No such file or directory Jan 13 22:38:18 vm158 rsyslogd: rsyslogd's groupid changed to 104 Jan 13 22:38:18 vm158 rsyslogd: rsyslogd's userid changed to 101
So does the kern.log:
Jan 13 21:30:16 vm158 kernel: [265208.860042] Peer redacted:49503/80 unexpectedly shrunk window 3219335626:3219373738 (repaired) Jan 13 21:30:16 vm158 kernel: [265209.060112] Peer redacted:49491/80 unexpectedly shrunk window 4084028138:4084040474 (repaired) Jan 13 21:30:17 vm158 kernel: [265209.372073] Peer redacted:49506/80 unexpectedly shrunk window 2495513151:2495537808 (repaired) Jan 13 21:32:32 vm158 kernel: [265344.867580] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=50 ID=42483 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0 Jan 13 21:32:35 vm158 kernel: [265347.866534] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=50 ID=42484 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0 Jan 13 21:32:41 vm158 kernel: [265353.866495] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=50 ID=42485 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0 Jan 13 21:33:15 vm158 kernel: [265387.820707] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=58 ID=8368 DF PROTO=TCP SPT=39069 DPT=10000 WINDOW=5840 RES=0x00 SYN URGP=0 Jan 13 21:34:35 vm158 kernel: [265468.320399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=48 TOS=0x00 PREC=0x00 TTL=121 ID=8188 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0 Jan 13 21:34:38 vm158 kernel: [265471.328399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=48 TOS=0x00 PREC=0x00 TTL=121 ID=8195 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0 Jan 13 21:34:44 vm158 kernel: [265477.328704] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=48 TOS=0x00 PREC=0x00 TTL=121 ID=8227 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0 --- crash occured here --- Jan 13 22:38:18 vm158 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Similar for auth.log:
Jan 13 21:38:01 vm158 CRON[8364]: pam_unix(cron:session): session opened for user root by (uid=0) Jan 13 21:38:01 vm158 CRON[8364]: pam_unix(cron:session): session closed for user root Jan 13 21:39:01 vm158 CRON[8424]: pam_unix(cron:session): session opened for user root by (uid=0) Jan 13 21:39:01 vm158 CRON[8425]: pam_unix(cron:session): session opened for user root by (uid=0) Jan 13 21:39:02 vm158 CRON[8425]: pam_unix(cron:session): session closed for user root Jan 13 21:39:02 vm158 CRON[8424]: pam_unix(cron:session): session closed for user root Jan 13 21:40:01 vm158 CRON[8560]: pam_unix(cron:session): session opened for user root by (uid=0) Jan 13 21:40:01 vm158 CRON[8560]: pam_unix(cron:session): session closed for user root Jan 13 21:41:01 vm158 CRON[8672]: pam_unix(cron:session): session opened for user root by (uid=0) Jan 13 21:41:01 vm158 CRON[8672]: pam_unix(cron:session): session closed for user root Jan 13 21:42:01 vm158 CRON[8786]: pam_unix(cron:session): session opened for user root by (uid=0) Jan 13 21:42:01 vm158 CRON[8786]: pam_unix(cron:session): session closed for user root Jan 13 21:43:01 vm158 CRON[8850]: pam_unix(cron:session): session opened for user root by (uid=0) Jan 13 21:43:02 vm158 CRON[8850]: pam_unix(cron:session): session closed for user root --- crash occured here --- Jan 13 22:38:18 vm158 systemd-logind[1014]: New seat seat0. Jan 13 22:38:18 vm158 systemd-logind[1014]: Watching system buttons on /dev/input/event0 (Power Button)
dmesg.log doesn't show anything suspicious either:
[ 136.672661] ip_tables: (C) 2000-2006 Netfilter Core Team [ 152.682193] Netfilter messages via NETLINK v0.30. [ 152.767856] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 153.415271] xt_time: kernel timezone is +0100 [ 153.477146] ctnetlink v0.93: registering with nfnetlink. [ 153.589937] ip_set: protocol 6 [ 153.623677] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully [ 153.956382] ipt_ULOG: ULOG is deprecated and it will be removed soon, use NFLOG instead [ 155.378975] u32 classifier [ 155.378980] Actions configured
Updated by Christian Lohmaier about 9 years ago
Silverstripe's log shows:
[13-Jan-2015 20:35:14] Warning at framework/model/MySQLDatabase.php line 68: mysqli::mysqli(): (HY000/2013): Lost connection to MySQL server at 'reading authorization packet', system error: 104 (http://de.libreoffice.org/download/libreoffice-fresh/) [13-Jan-2015 20:35:15] Error at framework/model/MySQLDatabase.php line 598: Couldn't connect to MySQL database | Lost connection to MySQL server at 'reading authorization packet', system error: 104 (http://de.libreoffice.org/download/libreoffice-fresh/)
c
Strange thing is that it is only printed once - usually when mysql dies there are tons of those.
Updated by Alexander Werner about 9 years ago
Monitoring of vm158 was expanded, for current state see:
https://monitoring.documentfoundation.org/tdf/thruk/#cgi-bin/status.cgi?host=vm158.documentfoundation.org
For historical data and graphs have a look at (choose the services in the list on the right):
https://monitoring.documentfoundation.org/tdf/pnp4nagios/index.php/graph?host=vm158.documentfoundation.org
Updated by Florian Effenberger about 9 years ago
- Status changed from New to In Progress
Extensive monitoring didn't yield any issues either, but it might be related to one of these:
- https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1333553
- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1307473
Options:
- migrate to Debian 7 => had problems with PHP and nginx
- go with kernel PPA and Ubuntu 14.04
- downgrade to Ubuntu 12.04 with trusty LTS enablement stack
Variant #3 is combination that is working and should have most impact, so going with this first, to see if it helps. Later on, option #2 can be tried, as we would likely go with 14.04 in the end.
Updated by Christian Lohmaier about 9 years ago
vm157 is setup and runs test version of silverstripe on 12.04
what I noticed when trying to get it to run after installing the necessary packages:
silverstripe's php failed to launch, it requested tons of memory (gave up after rising the limit to 512MB and still not being enough) - turns out that trying to read some files in webroot/silverstripe-cache triggers a bug/leak in the version of php that causes it to fail.
There also were old files in the cache directory in the current live instance, so I deleted the directory there as well.
(default when it doesn't exist in the webroot is to create it in /tmp, so that's where the cache files are now)
Might be completely unrelated to the problem of the server just not responding anymore, but a pitfall nevertheless.
Updated by Christian Lohmaier about 9 years ago
the proxying to the 12.x based install showed that that one suffers from the segfaulting php processes just like the debian based VM did. So not really a solution either.
Database on the 14.04 host got damaged, resulting in mysql crashing frequently when trying to update the download lists - did uninstall and reinstall mysql and reimported fresh dumps... will undo the proxying in ~1h or so.
Updated by Florian Effenberger about 9 years ago
Regular (daily) reboots of the VM seem to help indeed, which make it look like some buffer is running full, and thus it's VM guest problem, not a VM host one
Alex proposed to install a newer kernel from the backports repository
Updated by Florian Effenberger about 9 years ago
- Status changed from In Progress to Rejected
Rejecting due to change of platform