Project

General

Profile

Bug #988

vm158 (libreoffice.org) crashes irregularly

Added by Florian Effenberger almost 7 years ago. Updated over 6 years ago.

Status:
Rejected
Priority:
Immediate
Category:
-
Target version:
-
Start date:
Due date:
% Done:

0%

Tags:
URL:

Description

vm158 (libreoffice.org) crashes irregularly

#1

Updated by Florian Effenberger almost 7 years ago

Some ideas to debug:

  • investigate top.log after crash to see if there are performance issues -> update: done
  • do a daily reboot via cron to see if it's a memory leak
  • have monitoring in place to at least notify, ideally act when something happens -> update: done, nothing suspicious it seems
#2

Updated by Florian Effenberger almost 7 years ago

Some more ideas from the admin meeting:

  • apply the setterm script to ensure screen gets not blanked -> update: done
  • set the kernel panic reboot timer via sysctl -> update: done
#3

Updated by Florian Effenberger almost 7 years ago

top.log shortly after crash has been saved to ~floeff/top.log and ~floeff/top.log2, here are the extracts shortly before the crash:

top - 21:43:02 up 3 days,  1:52,  0 users,  load average: 6.75, 7.79, 11.17
Tasks: 264 total,   8 running, 256 sleeping,   0 stopped,   0 zombie
%Cpu(s): 33.9 us,  3.6 sy,  0.0 ni, 61.1 id,  1.1 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem:   4046684 total,  3122056 used,   924628 free,    39120 buffers
KiB Swap:  4190204 total,   234668 used,  3955536 free.  1341388 cached Mem
top - 18:52:01 up 6 days, 19:26,  0 users,  load average: 9.04, 9.19, 9.31
Tasks: 244 total,   9 running, 235 sleeping,   0 stopped,   0 zombie
%Cpu(s): 39.2 us,  3.2 sy,  0.0 ni, 55.2 id,  2.2 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem:   4046684 total,  3603176 used,   443508 free,    67532 buffers
KiB Swap:  4190204 total,   238584 used,  3951620 free.  1656852 cached Mem

Maybe a memory leak? However, 20 minutes before the crash it looks similar:

top - 18:27:01 up 6 days, 19:01,  0 users,  load average: 7.93, 9.09, 9.98
Tasks: 245 total,  14 running, 231 sleeping,   0 stopped,   0 zombie
%Cpu(s): 39.1 us,  3.2 sy,  0.0 ni, 55.2 id,  2.2 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem:   4046684 total,  3563504 used,   483180 free,    65836 buffers
KiB Swap:  4190204 total,   238644 used,  3951560 free.  1625924 cached Mem
#4

Updated by Florian Effenberger almost 7 years ago

The syslog also looks rather unsusipcious:

Jan 13 21:23:01 vm158 CRON[6764]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:21:06 vm158 snmpd[5675]: message repeated 14 times: [ Connection from UDP: [redacted]:49320->[redacted]:161]
Jan 13 21:23:42 vm158 snmpd[5675]: Connection from UDP: [redacted]:54393->[redacted]:161
Jan 13 21:25:22 vm158 snmpd[5675]: Connection from UDP: [redacted]:35929->[redacted]:161
Jan 13 21:25:36 vm158 kernel: [264928.814514] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=40 TOS=0x00 PREC=0x00 TTL=240 ID=54321 PROTO=TCP SPT=53624 DPT=49154 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 13 21:26:03 vm158 CRON[7079]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:26:57 vm158 kernel: [265009.811573] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=52 TOS=0x00 PREC=0x00 TTL=50 ID=61676 DF PROTO=TCP SPT=61808 DPT=25 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 13 21:27:00 vm158 kernel: [265012.778292] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=52 TOS=0x00 PREC=0x00 TTL=50 ID=61956 DF PROTO=TCP SPT=61808 DPT=25 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 13 21:27:01 vm158 CRON[7164]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:27:06 vm158 kernel: [265018.806769] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=52 TOS=0x00 PREC=0x00 TTL=50 ID=62510 DF PROTO=TCP SPT=61808 DPT=25 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 13 21:28:01 vm158 CRON[7327]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:30:01 vm158 CRON[7560]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:30:16 vm158 kernel: [265208.860042] Peer redacted:49503/80 unexpectedly shrunk window 3219335626:3219373738 (repaired)
Jan 13 21:30:16 vm158 kernel: [265209.060112] Peer redacted:49491/80 unexpectedly shrunk window 4084028138:4084040474 (repaired)
Jan 13 21:30:17 vm158 kernel: [265209.372073] Peer redacted:49506/80 unexpectedly shrunk window 2495513151:2495537808 (repaired)
Jan 13 21:31:01 vm158 CRON[7642]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:25:22 vm158 snmpd[5675]: message repeated 24 times: [ Connection from UDP: [redacted]:35929->[redacted]:161]
Jan 13 21:31:06 vm158 snmpd[5675]: Connection from UDP: [redacted]:43382->[redacted]:161
Jan 13 21:32:01 vm158 CRON[7696]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:32:32 vm158 kernel: [265344.867580] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=42483 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:32:35 vm158 kernel: [265347.866534] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=42484 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:32:41 vm158 kernel: [265353.866495] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=42485 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:33:01 vm158 CRON[7855]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:33:15 vm158 kernel: [265387.820707] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=58 ID=8368 DF PROTO=TCP SPT=39069 DPT=10000 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 21:31:06 vm158 snmpd[5675]: message repeated 14 times: [ Connection from UDP: [redacted]:43382->[redacted]:161]
Jan 13 21:33:42 vm158 snmpd[5675]: Connection from UDP: [redacted]:48723->[redacted]:161
Jan 13 21:34:35 vm158 kernel: [265468.320399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=8188 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:34:38 vm158 kernel: [265471.328399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=8195 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:34:44 vm158 kernel: [265477.328704] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=8227 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:35:01 vm158 CRON[8078]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:35:22 vm158 snmpd[5675]: Connection from UDP: [redacted]:34822->[redacted]:161
Jan 13 21:36:01 vm158 CRON[8167]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:37:01 vm158 CRON[8286]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:38:01 vm158 CRON[8365]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:39:01 vm158 CRON[8426]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Jan 13 21:39:01 vm158 CRON[8428]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:40:01 vm158 CRON[8561]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:41:01 vm158 CRON[8673]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:35:22 vm158 snmpd[5675]: message repeated 24 times: [ Connection from UDP: [redacted]:34822->[redacted]:161]
Jan 13 21:41:06 vm158 snmpd[5675]: Connection from UDP: [redacted]:54895->[redacted]:161
Jan 13 21:42:01 vm158 CRON[8787]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:43:01 vm158 CRON[8851]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)

--- crash occured here ---

Jan 13 22:38:18 vm158 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="740" x-info="http://www.rsyslog.com"] start
Jan 13 22:38:18 vm158 rsyslogd: cannot create '/var/spool/postfix/dev/log': No such file or directory
Jan 13 22:38:18 vm158 rsyslogd: rsyslogd's groupid changed to 104
Jan 13 22:38:18 vm158 rsyslogd: rsyslogd's userid changed to 101

So does the kern.log:

Jan 13 21:30:16 vm158 kernel: [265208.860042] Peer redacted:49503/80 unexpectedly shrunk window 3219335626:3219373738 (repaired)
Jan 13 21:30:16 vm158 kernel: [265209.060112] Peer redacted:49491/80 unexpectedly shrunk window 4084028138:4084040474 (repaired)
Jan 13 21:30:17 vm158 kernel: [265209.372073] Peer redacted:49506/80 unexpectedly shrunk window 2495513151:2495537808 (repaired)
Jan 13 21:32:32 vm158 kernel: [265344.867580] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=50 ID=42483 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:32:35 vm158 kernel: [265347.866534] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=50 ID=42484 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:32:41 vm158 kernel: [265353.866495] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=50 ID=42485 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:33:15 vm158 kernel: [265387.820707] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=58 ID=8368 DF PROTO=TCP SPT=39069 DPT=10000 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 21:34:35 vm158 kernel: [265468.320399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=48 TOS=0x00 PREC=0x00 TTL=121 ID=8188 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:34:38 vm158 kernel: [265471.328399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=48 TOS=0x00 PREC=0x00 TTL=121 ID=8195 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:34:44 vm158 kernel: [265477.328704] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=48 TOS=0x00 PREC=0x00 TTL=121 ID=8227 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0

--- crash occured here ---

Jan 13 22:38:18 vm158 kernel: [    0.000000] Initializing cgroup subsys cpuset

Similar for auth.log:

Jan 13 21:38:01 vm158 CRON[8364]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:38:01 vm158 CRON[8364]: pam_unix(cron:session): session closed for user root
Jan 13 21:39:01 vm158 CRON[8424]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:39:01 vm158 CRON[8425]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:39:02 vm158 CRON[8425]: pam_unix(cron:session): session closed for user root
Jan 13 21:39:02 vm158 CRON[8424]: pam_unix(cron:session): session closed for user root
Jan 13 21:40:01 vm158 CRON[8560]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:40:01 vm158 CRON[8560]: pam_unix(cron:session): session closed for user root
Jan 13 21:41:01 vm158 CRON[8672]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:41:01 vm158 CRON[8672]: pam_unix(cron:session): session closed for user root
Jan 13 21:42:01 vm158 CRON[8786]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:42:01 vm158 CRON[8786]: pam_unix(cron:session): session closed for user root
Jan 13 21:43:01 vm158 CRON[8850]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:43:02 vm158 CRON[8850]: pam_unix(cron:session): session closed for user root

--- crash occured here ---

Jan 13 22:38:18 vm158 systemd-logind[1014]: New seat seat0.
Jan 13 22:38:18 vm158 systemd-logind[1014]: Watching system buttons on /dev/input/event0 (Power Button)

dmesg.log doesn't show anything suspicious either:

[  136.672661] ip_tables: (C) 2000-2006 Netfilter Core Team
[  152.682193] Netfilter messages via NETLINK v0.30.
[  152.767856] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[  153.415271] xt_time: kernel timezone is +0100
[  153.477146] ctnetlink v0.93: registering with nfnetlink.
[  153.589937] ip_set: protocol 6
[  153.623677] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully
[  153.956382] ipt_ULOG: ULOG is deprecated and it will be removed soon, use NFLOG instead
[  155.378975] u32 classifier
[  155.378980]     Actions configured
#5

Updated by Christian Lohmaier almost 7 years ago

Silverstripe's log shows:

[13-Jan-2015 20:35:14] Warning at framework/model/MySQLDatabase.php line 68: mysqli::mysqli(): (HY000/2013): Lost connection to MySQL server at 'reading authorization packet', system error: 104 (http://de.libreoffice.org/download/libreoffice-fresh/)
[13-Jan-2015 20:35:15] Error at framework/model/MySQLDatabase.php line 598: Couldn't connect to MySQL database | Lost connection to MySQL server at 'reading authorization packet', system error: 104 (http://de.libreoffice.org/download/libreoffice-fresh/)

c
Strange thing is that it is only printed once - usually when mysql dies there are tons of those.

#6

Updated by Alexander Werner almost 7 years ago

Monitoring of vm158 was expanded, for current state see:
https://monitoring.documentfoundation.org/tdf/thruk/#cgi-bin/status.cgi?host=vm158.documentfoundation.org
For historical data and graphs have a look at (choose the services in the list on the right):
https://monitoring.documentfoundation.org/tdf/pnp4nagios/index.php/graph?host=vm158.documentfoundation.org

#7

Updated by Florian Effenberger almost 7 years ago

  • Status changed from New to In Progress

Extensive monitoring didn't yield any issues either, but it might be related to one of these:

Options:

  1. migrate to Debian 7 => had problems with PHP and nginx
  2. go with kernel PPA and Ubuntu 14.04
  3. downgrade to Ubuntu 12.04 with trusty LTS enablement stack

Variant #3 is combination that is working and should have most impact, so going with this first, to see if it helps. Later on, option #2 can be tried, as we would likely go with 14.04 in the end.

#8

Updated by Christian Lohmaier over 6 years ago

vm157 is setup and runs test version of silverstripe on 12.04

what I noticed when trying to get it to run after installing the necessary packages:
silverstripe's php failed to launch, it requested tons of memory (gave up after rising the limit to 512MB and still not being enough) - turns out that trying to read some files in webroot/silverstripe-cache triggers a bug/leak in the version of php that causes it to fail.

There also were old files in the cache directory in the current live instance, so I deleted the directory there as well.

(default when it doesn't exist in the webroot is to create it in /tmp, so that's where the cache files are now)

Might be completely unrelated to the problem of the server just not responding anymore, but a pitfall nevertheless.

#9

Updated by Christian Lohmaier over 6 years ago

the proxying to the 12.x based install showed that that one suffers from the segfaulting php processes just like the debian based VM did. So not really a solution either.

Database on the 14.04 host got damaged, resulting in mysql crashing frequently when trying to update the download lists - did uninstall and reinstall mysql and reimported fresh dumps... will undo the proxying in ~1h or so.

#10

Updated by Florian Effenberger over 6 years ago

Regular (daily) reboots of the VM seem to help indeed, which make it look like some buffer is running full, and thus it's VM guest problem, not a VM host one
Alex proposed to install a newer kernel from the backports repository

#11

Updated by Florian Effenberger over 6 years ago

  • Status changed from In Progress to Rejected

Rejecting due to change of platform

Also available in: Atom PDF