Task #1969

Drives dropping out of the RAID on berta and antares with errors, despite no problem with the hard disks themselves

Added by Christian Lohmaier over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Low
Category: Backups
Target version: -
Start date:
Due date:
% Done: 70%
Estimated time:
Tags: Salt
URL:

Description

When the machines are under load, drives keep dropping out of the RAID. This may be a controller bug, or something that can be fixed with a more relaxed timeout somewhere...

History

#2 Updated by Christian Lohmaier over 1 year ago

Additionally:
  • possible conflicts in the BIOS settings regarding software/hardware RAID
  • controller firmware bug?

#3 Updated by Christian Lohmaier over 1 year ago

Changing the timeout did not fix the problem, and neither did setting the queue depth to 31.

What does seem to help is disabling the drive queue, i.e. setting the queue depth to 1.
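
For reference, a minimal sketch of how the queue depth could be read and changed at runtime through sysfs. The helper names (`queue_depth_path`, `get_queue_depth`, `set_queue_depth`) and the `sysfs_root` parameter are hypothetical and only here for illustration. It assumes the drives expose `device/queue_depth` under `/sys/block/<name>/`, as in the configuration lines in this ticket. Writing the value needs root, and a value set this way does not survive a reboot.

```python
from pathlib import Path


def queue_depth_path(device, sysfs_root="/sys"):
    """Return the sysfs file holding the queue depth of a block device."""
    return Path(sysfs_root) / "block" / device / "device" / "queue_depth"


def get_queue_depth(device, sysfs_root="/sys"):
    """Read the current queue depth of the given device as an integer."""
    return int(queue_depth_path(device, sysfs_root).read_text().strip())


def set_queue_depth(device, depth, sysfs_root="/sys"):
    """Write a new queue depth; on a real system this requires root."""
    queue_depth_path(device, sysfs_root).write_text(f"{depth}\n")


if __name__ == "__main__":
    # Only report the current value here; changing it is left to the admin.
    print(get_queue_depth("sda"))
```

Calling `set_queue_depth("sda", 1)` as root would correspond to the workaround described above for a single drive.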

#4 Updated by Christian Lohmaier over 1 year ago

  • Category set to Backups
  • Priority changed from Immediate to Low
  • % Done changed from 0 to 70
  • Tags Salt added

Setting priority to low, as the queue-depth workaround has been put in place and seems to work around the problem successfully.
What remains is either to salt that configuration change or to document the requirement as manual instructions.

(Feel free to close if that manual setup step is enough)

  • install sysfsutils
  • add entries for all block devices to /etc/sysfs.conf
    block/sda/device/queue_depth = 1
    block/sdb/device/queue_depth = 1
    block/sdc/device/queue_depth = 1
    […]
    

(did this manually on both berta and antares)
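
As a possible aid for the manual step, here is a small sketch that generates the /etc/sysfs.conf entries rather than typing them by hand. The function names (`list_disks`, `sysfs_conf_lines`) are hypothetical. It assumes that every device which should get an entry exposes `device/queue_depth` under `/sys/block/`. The output is only printed, so it can be reviewed before being added to the file.

```python
from pathlib import Path


def list_disks(sysfs_root="/sys"):
    """Return the names of block devices that expose device/queue_depth."""
    block = Path(sysfs_root) / "block"
    if not block.is_dir():
        return []
    return sorted(
        entry.name
        for entry in block.iterdir()
        if (entry / "device" / "queue_depth").exists()
    )


def sysfs_conf_lines(devices, depth=1):
    """Build one sysfs.conf line per device, in the format used above."""
    return [f"block/{name}/device/queue_depth = {depth}" for name in devices]


if __name__ == "__main__":
    # Print the entries for review; nothing is written to /etc/sysfs.conf.
    for line in sysfs_conf_lines(list_disks()):
        print(line)
```

The same list of lines could later be fed into a Salt state if the change is to be salted instead of documented.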
