Project

General

Profile

Actions

Bug #988

closed

vm158 (libreoffice.org) crashes irregularly

Added by Florian Effenberger about 10 years ago. Updated about 10 years ago.

Status:
Rejected
Priority:
Immediate
Category:
-
Target version:
-
Start date:
Due date:
% Done:

0%

Tags:

Description

vm158 (libreoffice.org) crashes irregularly

Actions #1

Updated by Florian Effenberger about 10 years ago

Some ideas to debug:

  • investigate top.log after crash to see if there are performance issues -> update: done
  • do a daily reboot via cron to see if it's a memory leak
  • have monitoring in place to at least notify, ideally act when something happens -> update: done, nothing suspicious it seems
Actions #2

Updated by Florian Effenberger about 10 years ago

Some more ideas from the admin meeting:

  • apply the setterm script to ensure screen gets not blanked -> update: done
  • set the kernel panic reboot timer via sysctl -> update: done
Actions #3

Updated by Florian Effenberger about 10 years ago

top.log shortly after crash has been saved to ~floeff/top.log and ~floeff/top.log2, here are the extracts shortly before the crash:

top - 21:43:02 up 3 days,  1:52,  0 users,  load average: 6.75, 7.79, 11.17
Tasks: 264 total,   8 running, 256 sleeping,   0 stopped,   0 zombie
%Cpu(s): 33.9 us,  3.6 sy,  0.0 ni, 61.1 id,  1.1 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem:   4046684 total,  3122056 used,   924628 free,    39120 buffers
KiB Swap:  4190204 total,   234668 used,  3955536 free.  1341388 cached Mem
top - 18:52:01 up 6 days, 19:26,  0 users,  load average: 9.04, 9.19, 9.31
Tasks: 244 total,   9 running, 235 sleeping,   0 stopped,   0 zombie
%Cpu(s): 39.2 us,  3.2 sy,  0.0 ni, 55.2 id,  2.2 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem:   4046684 total,  3603176 used,   443508 free,    67532 buffers
KiB Swap:  4190204 total,   238584 used,  3951620 free.  1656852 cached Mem

Maybe a memory leak? However, 20 minutes before the crash it looks similar:

top - 18:27:01 up 6 days, 19:01,  0 users,  load average: 7.93, 9.09, 9.98
Tasks: 245 total,  14 running, 231 sleeping,   0 stopped,   0 zombie
%Cpu(s): 39.1 us,  3.2 sy,  0.0 ni, 55.2 id,  2.2 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem:   4046684 total,  3563504 used,   483180 free,    65836 buffers
KiB Swap:  4190204 total,   238644 used,  3951560 free.  1625924 cached Mem
Actions #4

Updated by Florian Effenberger about 10 years ago

The syslog also looks rather unsusipcious:

Jan 13 21:23:01 vm158 CRON[6764]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:21:06 vm158 snmpd[5675]: message repeated 14 times: [ Connection from UDP: [redacted]:49320->[redacted]:161]
Jan 13 21:23:42 vm158 snmpd[5675]: Connection from UDP: [redacted]:54393->[redacted]:161
Jan 13 21:25:22 vm158 snmpd[5675]: Connection from UDP: [redacted]:35929->[redacted]:161
Jan 13 21:25:36 vm158 kernel: [264928.814514] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=40 TOS=0x00 PREC=0x00 TTL=240 ID=54321 PROTO=TCP SPT=53624 DPT=49154 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 13 21:26:03 vm158 CRON[7079]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:26:57 vm158 kernel: [265009.811573] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=52 TOS=0x00 PREC=0x00 TTL=50 ID=61676 DF PROTO=TCP SPT=61808 DPT=25 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 13 21:27:00 vm158 kernel: [265012.778292] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=52 TOS=0x00 PREC=0x00 TTL=50 ID=61956 DF PROTO=TCP SPT=61808 DPT=25 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 13 21:27:01 vm158 CRON[7164]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:27:06 vm158 kernel: [265018.806769] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=52 TOS=0x00 PREC=0x00 TTL=50 ID=62510 DF PROTO=TCP SPT=61808 DPT=25 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 13 21:28:01 vm158 CRON[7327]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:30:01 vm158 CRON[7560]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:30:16 vm158 kernel: [265208.860042] Peer redacted:49503/80 unexpectedly shrunk window 3219335626:3219373738 (repaired)
Jan 13 21:30:16 vm158 kernel: [265209.060112] Peer redacted:49491/80 unexpectedly shrunk window 4084028138:4084040474 (repaired)
Jan 13 21:30:17 vm158 kernel: [265209.372073] Peer redacted:49506/80 unexpectedly shrunk window 2495513151:2495537808 (repaired)
Jan 13 21:31:01 vm158 CRON[7642]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:25:22 vm158 snmpd[5675]: message repeated 24 times: [ Connection from UDP: [redacted]:35929->[redacted]:161]
Jan 13 21:31:06 vm158 snmpd[5675]: Connection from UDP: [redacted]:43382->[redacted]:161
Jan 13 21:32:01 vm158 CRON[7696]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:32:32 vm158 kernel: [265344.867580] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=42483 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:32:35 vm158 kernel: [265347.866534] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=42484 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:32:41 vm158 kernel: [265353.866495] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=42485 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:33:01 vm158 CRON[7855]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:33:15 vm158 kernel: [265387.820707] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=60 TOS=0x00 PREC=0x00 TTL=58 ID=8368 DF PROTO=TCP SPT=39069 DPT=10000 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 21:31:06 vm158 snmpd[5675]: message repeated 14 times: [ Connection from UDP: [redacted]:43382->[redacted]:161]
Jan 13 21:33:42 vm158 snmpd[5675]: Connection from UDP: [redacted]:48723->[redacted]:161
Jan 13 21:34:35 vm158 kernel: [265468.320399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=8188 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:34:38 vm158 kernel: [265471.328399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=8195 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:34:44 vm158 kernel: [265477.328704] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=8227 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:35:01 vm158 CRON[8078]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:35:22 vm158 snmpd[5675]: Connection from UDP: [redacted]:34822->[redacted]:161
Jan 13 21:36:01 vm158 CRON[8167]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:37:01 vm158 CRON[8286]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:38:01 vm158 CRON[8365]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:39:01 vm158 CRON[8426]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Jan 13 21:39:01 vm158 CRON[8428]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:40:01 vm158 CRON[8561]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:41:01 vm158 CRON[8673]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:35:22 vm158 snmpd[5675]: message repeated 24 times: [ Connection from UDP: [redacted]:34822->[redacted]:161]
Jan 13 21:41:06 vm158 snmpd[5675]: Connection from UDP: [redacted]:54895->[redacted]:161
Jan 13 21:42:01 vm158 CRON[8787]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)
Jan 13 21:43:01 vm158 CRON[8851]: (root) CMD (( date; COLUMNS=156 top -b -n 1 2>&1 ) >> /var/log/top.log)

--- crash occured here ---

Jan 13 22:38:18 vm158 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="740" x-info="http://www.rsyslog.com"] start
Jan 13 22:38:18 vm158 rsyslogd: cannot create '/var/spool/postfix/dev/log': No such file or directory
Jan 13 22:38:18 vm158 rsyslogd: rsyslogd's groupid changed to 104
Jan 13 22:38:18 vm158 rsyslogd: rsyslogd's userid changed to 101

So does the kern.log:

Jan 13 21:30:16 vm158 kernel: [265208.860042] Peer redacted:49503/80 unexpectedly shrunk window 3219335626:3219373738 (repaired)
Jan 13 21:30:16 vm158 kernel: [265209.060112] Peer redacted:49491/80 unexpectedly shrunk window 4084028138:4084040474 (repaired)
Jan 13 21:30:17 vm158 kernel: [265209.372073] Peer redacted:49506/80 unexpectedly shrunk window 2495513151:2495537808 (repaired)
Jan 13 21:32:32 vm158 kernel: [265344.867580] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=50 ID=42483 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:32:35 vm158 kernel: [265347.866534] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=50 ID=42484 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:32:41 vm158 kernel: [265353.866495] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=50 ID=42485 DF PROTO=TCP SPT=35533 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0
Jan 13 21:33:15 vm158 kernel: [265387.820707] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=60 TOS=0x00 PREC=0x00 TTL=58 ID=8368 DF PROTO=TCP SPT=39069 DPT=10000 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 21:34:35 vm158 kernel: [265468.320399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=48 TOS=0x00 PREC=0x00 TTL=121 ID=8188 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:34:38 vm158 kernel: [265471.328399] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=48 TOS=0x00 PREC=0x00 TTL=121 ID=8195 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0
Jan 13 21:34:44 vm158 kernel: [265477.328704] Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=redacted SRC=redacted DST=redacted=48 TOS=0x00 PREC=0x00 TTL=121 ID=8227 DF PROTO=TCP SPT=58320 DPT=25 WINDOW=8192 RES=0x00 SYN URGP=0

--- crash occured here ---

Jan 13 22:38:18 vm158 kernel: [    0.000000] Initializing cgroup subsys cpuset

Similar for auth.log:

Jan 13 21:38:01 vm158 CRON[8364]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:38:01 vm158 CRON[8364]: pam_unix(cron:session): session closed for user root
Jan 13 21:39:01 vm158 CRON[8424]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:39:01 vm158 CRON[8425]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:39:02 vm158 CRON[8425]: pam_unix(cron:session): session closed for user root
Jan 13 21:39:02 vm158 CRON[8424]: pam_unix(cron:session): session closed for user root
Jan 13 21:40:01 vm158 CRON[8560]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:40:01 vm158 CRON[8560]: pam_unix(cron:session): session closed for user root
Jan 13 21:41:01 vm158 CRON[8672]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:41:01 vm158 CRON[8672]: pam_unix(cron:session): session closed for user root
Jan 13 21:42:01 vm158 CRON[8786]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:42:01 vm158 CRON[8786]: pam_unix(cron:session): session closed for user root
Jan 13 21:43:01 vm158 CRON[8850]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 13 21:43:02 vm158 CRON[8850]: pam_unix(cron:session): session closed for user root

--- crash occured here ---

Jan 13 22:38:18 vm158 systemd-logind[1014]: New seat seat0.
Jan 13 22:38:18 vm158 systemd-logind[1014]: Watching system buttons on /dev/input/event0 (Power Button)

dmesg.log doesn't show anything suspicious either:

[  136.672661] ip_tables: (C) 2000-2006 Netfilter Core Team
[  152.682193] Netfilter messages via NETLINK v0.30.
[  152.767856] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[  153.415271] xt_time: kernel timezone is +0100
[  153.477146] ctnetlink v0.93: registering with nfnetlink.
[  153.589937] ip_set: protocol 6
[  153.623677] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully
[  153.956382] ipt_ULOG: ULOG is deprecated and it will be removed soon, use NFLOG instead
[  155.378975] u32 classifier
[  155.378980]     Actions configured
Actions #5

Updated by Christian Lohmaier about 10 years ago

Silverstripe's log shows:

[13-Jan-2015 20:35:14] Warning at framework/model/MySQLDatabase.php line 68: mysqli::mysqli(): (HY000/2013): Lost connection to MySQL server at 'reading authorization packet', system error: 104 (http://de.libreoffice.org/download/libreoffice-fresh/)
[13-Jan-2015 20:35:15] Error at framework/model/MySQLDatabase.php line 598: Couldn't connect to MySQL database | Lost connection to MySQL server at 'reading authorization packet', system error: 104 (http://de.libreoffice.org/download/libreoffice-fresh/)

c
Strange thing is that it is only printed once - usually when mysql dies there are tons of those.

Actions #6

Updated by Alexander Werner about 10 years ago

Monitoring of vm158 was expanded, for current state see:
https://monitoring.documentfoundation.org/tdf/thruk/#cgi-bin/status.cgi?host=vm158.documentfoundation.org
For historical data and graphs have a look at (choose the services in the list on the right):
https://monitoring.documentfoundation.org/tdf/pnp4nagios/index.php/graph?host=vm158.documentfoundation.org

Actions #7

Updated by Florian Effenberger about 10 years ago

  • Status changed from New to In Progress

Extensive monitoring didn't yield any issues either, but it might be related to one of these:

Options:

  1. migrate to Debian 7 => had problems with PHP and nginx
  2. go with kernel PPA and Ubuntu 14.04
  3. downgrade to Ubuntu 12.04 with trusty LTS enablement stack

Variant #3 is combination that is working and should have most impact, so going with this first, to see if it helps. Later on, option #2 can be tried, as we would likely go with 14.04 in the end.

Actions #8

Updated by Christian Lohmaier about 10 years ago

vm157 is setup and runs test version of silverstripe on 12.04

what I noticed when trying to get it to run after installing the necessary packages:
silverstripe's php failed to launch, it requested tons of memory (gave up after rising the limit to 512MB and still not being enough) - turns out that trying to read some files in webroot/silverstripe-cache triggers a bug/leak in the version of php that causes it to fail.

There also were old files in the cache directory in the current live instance, so I deleted the directory there as well.

(default when it doesn't exist in the webroot is to create it in /tmp, so that's where the cache files are now)

Might be completely unrelated to the problem of the server just not responding anymore, but a pitfall nevertheless.

Actions #9

Updated by Christian Lohmaier about 10 years ago

the proxying to the 12.x based install showed that that one suffers from the segfaulting php processes just like the debian based VM did. So not really a solution either.

Database on the 14.04 host got damaged, resulting in mysql crashing frequently when trying to update the download lists - did uninstall and reinstall mysql and reimported fresh dumps... will undo the proxying in ~1h or so.

Actions #10

Updated by Florian Effenberger about 10 years ago

Regular (daily) reboots of the VM seem to help indeed, which make it look like some buffer is running full, and thus it's VM guest problem, not a VM host one
Alex proposed to install a newer kernel from the backports repository

Actions #11

Updated by Florian Effenberger about 10 years ago

  • Status changed from In Progress to Rejected

Rejecting due to change of platform

Actions

Also available in: Atom PDF