Drives dropping out of the RAID on berta and antares with errors, despite no problem with the HDDs themselves
When the machines are under load, drives keep dropping out of the RAID. Possibly a controller bug, or something that can be fixed with a more relaxed timeout somewhere...
#1 Updated by Alexander Werner about 2 years ago
#2 Updated by Christian Lohmaier about 2 years ago
- possible conflicts in BIOS settings regarding software vs. hardware RAID
- controller firmware bug?
#3 Updated by Christian Lohmaier about 2 years ago
Changing the timeout did not fix it, and neither did setting the queue depth to 31.
What does seem to help is disabling the drive queue, i.e. setting the queue depth to 1.
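A minimal sketch of applying the queue-depth workaround at runtime, assuming the drives are SCSI/SATA disks exposing the standard Linux sysfs attribute `/sys/block/<dev>/device/queue_depth`. The function name `set_queue_depth` and the device list are hypothetical; the ticket does not say how the value was set interactively.

```python
from pathlib import Path


def set_queue_depth(devices, depth=1, sysfs_root="/sys"):
    """Write `depth` to <sysfs_root>/block/<dev>/device/queue_depth for each device.

    Hypothetical helper. Returns a dict mapping each device name to the value
    read back after the write. Writing to the real /sys requires root, and the
    change does not survive a reboot.
    """
    results = {}
    for dev in devices:
        path = Path(sysfs_root) / "block" / dev / "device" / "queue_depth"
        path.write_text(f"{depth}\n")
        # Read back to confirm the kernel accepted the value.
        results[dev] = int(path.read_text().strip())
    return results
```

For example, `set_queue_depth(["sda", "sdb", "sdc"])` run as root applies the setting immediately. Because sysfs writes are lost on reboot, persistent entries like the ones in comment #4 are still needed.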
#4 Updated by Christian Lohmaier about 2 years ago
- Category set to Backups
- Priority changed from Immediate to Low
- % Done changed from 0 to 70
- Tags Salt added
Setting priority to low, as the queue-depth workaround has been put in place and seems to work around the problem successfully.
What remains is to either manage that configuration change with Salt or document the requirement as a manual setup step.
(Feel free to close if documenting the manual setup step is enough.)
- add entries for all block devices to
block/sda/device/queue_depth = 1
block/sdb/device/queue_depth = 1
block/sdc/device/queue_depth = 1
[…]
(did this manually on both berta and antares)
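A minimal sketch for generating those entries for every disk instead of writing them by hand, which could also serve as a starting point for the Salt state. It assumes the entries are meant for sysfsutils' configuration (e.g. /etc/sysfs.conf on Debian), which uses this `path = value` syntax relative to /sys; the ticket itself does not name the target file. Both function names are hypothetical.

```python
from pathlib import Path


def list_scsi_disks(sysfs_root="/sys"):
    """Return the names of sd* block devices that expose device/queue_depth.

    Names are sorted as plain strings, so e.g. "sdaa" sorts before "sdb".
    """
    block = Path(sysfs_root) / "block"
    return sorted(
        p.name
        for p in block.glob("sd*")
        if (p / "device" / "queue_depth").exists()
    )


def sysfs_conf_lines(devices, depth=1):
    """Build sysfs.conf-style entries (paths relative to /sys) setting queue_depth."""
    return [f"block/{dev}/device/queue_depth = {depth}" for dev in devices]
```

On a host, `"\n".join(sysfs_conf_lines(list_scsi_disks()))` would produce one line per detected disk in the format shown above.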
#5 Updated by Alexander Werner about 2 years ago
- Status changed from New to Closed