Bug #519

closed

rework monitoring

Added by Alexander Werner almost 10 years ago. Updated almost 8 years ago.

Status: Closed
Priority: High
Assignee:
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Tags:

Description

Most important:
  • integrate SMS gateway for important notices, making it work with e.g. Alarmbox
  • evaluate www.groupalarm.de as possible gateway if our existing one doesn't fit
  • change thresholds to lower e-mail noise
  • add falco as (new) host
  • find volunteers to be included in the notifications
Other tasks:
  • Check not only whether the port is open, but also whether the content served looks correct
  • Check which services and servers need to be added to the monitoring
  • monitor ownCloud public share
  • Etherpad status (see also #114)
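
The content check from the task list above (port open and plausible content) could be sketched as follows. This is only a minimal sketch under stated assumptions, not the setup that was deployed: the function names, the expected-text marker and the command-line invocation are hypothetical, only the Python standard library is used, and the exit codes follow the usual Nagios/Icinga plugin convention (0 = OK, 2 = CRITICAL).

```python
import sys
import urllib.error
import urllib.request

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios/Icinga plugin exit codes


def evaluate(status, body, expected):
    """Map an HTTP status and response body to an (exit_code, message) pair."""
    if status != 200:
        return CRITICAL, f"CRITICAL - HTTP status {status}"
    if expected not in body:
        return CRITICAL, f"CRITICAL - expected text {expected!r} not found"
    return OK, "OK - page served with expected content"


def check_url(url, expected, timeout=10):
    """Fetch url and evaluate the response; connection errors count as CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate(resp.status, body, expected)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return evaluate(exc.code, "", expected)
    except (urllib.error.URLError, OSError) as exc:
        # Port closed, DNS failure, timeout and similar.
        return CRITICAL, f"CRITICAL - request failed: {exc}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: check_content.py <url> <expected-text>
    code, message = check_url(sys.argv[1], sys.argv[2])
    print(message)
    sys.exit(code)
```

A check like this reports a service as broken when the port answers but serves an error page or a web server default page, which a plain port check would miss.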

Related issues

Related to Infrastructure - Feature #90: setup SMS monitoring system (Rejected, Alexander Werner)
Related to Infrastructure - Bug #697: Etherpad is down (Closed)
Blocked by Infrastructure - Support #517: Evaluate possible Cloud solutions (Closed, Alexander Werner)
Follows Infrastructure - Task #997: send follow-up e-mail after admin meeting (Rejected, Alexander Werner, 2015-03-08)

Actions #1

Updated by Alexander Werner almost 10 years ago

  • Related to Feature #90: setup SMS monitoring system added
Actions #2

Updated by Alexander Werner over 9 years ago

  • Target version set to 1
Actions #3

Updated by Alexander Werner over 9 years ago

  • Blocked by Support #517: Evaluate possible Cloud solutions added
Actions #4

Updated by Alexander Werner over 9 years ago

  • Status changed from New to Feedback

Monitoring VM ordered at OVH.
Waiting for it to arrive.

Actions #5

Updated by Norbert Thiebaud over 9 years ago

For Gerrit:
Monitor the Gerrit log size. The log rotates daily, but when the current log (/opt/gerrit/gerrit_site/logs/error_log) grows above 2M, it is usually a sign of trouble.

Also, processes spawned by git-daemon seem to get stuck sometimes, blocking resources. Counting how many of them are running could be a useful indicator that some clean-up is needed.
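
The two Gerrit indicators described above could be checked roughly like this. It is a sketch under assumptions rather than a finished plugin: "2M" is read as 2 MiB, the match string for the stuck processes is a hypothetical parameter (the comment does not say what they look like in the process list), and the helper names are made up for illustration. Process command lines are taken from `ps -eo args=`.

```python
import os
import subprocess

LOG_PATH = "/opt/gerrit/gerrit_site/logs/error_log"  # path quoted in the comment
LOG_LIMIT_BYTES = 2 * 1024 * 1024  # "2M", read here as 2 MiB (assumption)


def log_too_large(path=LOG_PATH, limit=LOG_LIMIT_BYTES):
    """Return True when the current log file is larger than the limit."""
    return os.path.getsize(path) > limit


def count_matching(command_lines, needle):
    """Count command lines that contain the given substring."""
    return sum(1 for line in command_lines if needle in line)


def running_command_lines():
    """Return the command lines of all running processes, as reported by ps."""
    result = subprocess.run(
        ["ps", "-eo", "args="], capture_output=True, text=True, check=True
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

For example, `count_matching(running_command_lines(), "git-daemon")` gives a number that can be compared against a threshold chosen from experience; the thread itself does not name one.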

Actions #6

Updated by Florian Effenberger over 9 years ago

  • Due date set to 2014-12-19
  • Status changed from Feedback to In Progress

Hardware available

Actions #7

Updated by Dennis Roczek over 9 years ago

  • Related to Bug #697: Etherpad is down added
Actions #8

Updated by Florian Effenberger over 9 years ago

IMHO we made quite some good progress, that is, Alex and Robert did, I was just lurking ;-)
What still needs doing is setting up the SMS gateway (besides Alex, Robert and me, nobody volunteered to be in the monitoring, sadly) and the e-mail notification.
Alex, can you reach out to all tdf-admin subscribers (sudo list_members tdf-admin on pumbaa) again to ask whether they want to be included in the SMS monitoring?
For e-mail, I'd simply add them without asking, as they are on the admin list anyway.

Actions #9

Updated by Florian Effenberger over 9 years ago

  • Description updated (diff)
Actions #10

Updated by Florian Effenberger over 9 years ago

We should put the monitoring notifications live soon.
For that to happen:

Alex, can you reach out to all tdf-admin subscribers (sudo list_members tdf-admin on pumbaa) again to ask whether they want to be included in the SMS monitoring?

Actions #11

Updated by Florian Effenberger over 9 years ago

  • Description updated (diff)
Actions #12

Updated by Alexander Werner over 9 years ago

  • Follows Task #997: send follow-up e-mail after admin meeting added
Actions #13

Updated by Florian Effenberger over 9 years ago

  • Due date changed from 2015-01-31 to 2015-02-28
  • Start date deleted (2015-01-31)

Work in progress; trying to cooperate with Robert and Alin on that. The major missing part is SMS notification, which will be worked on gradually over time.

Actions #14

Updated by Florian Effenberger about 9 years ago

  • Subject changed from Rework Monitoring to Rework monitoring
  • Due date changed from 2015-03-09 to 2015-03-13
  • Priority changed from Normal to High
  • Start date deleted (2015-03-09)

This item has become more urgent after the recent infra story, so re-prioritizing and setting the due date accordingly.

Actions #15

Updated by Florian Effenberger about 9 years ago

  • Description updated (diff)
Actions #16

Updated by Florian Effenberger about 9 years ago

  • Subject changed from Rework monitoring to rework monitoring
  • Description updated (diff)
Actions #17

Updated by Florian Effenberger about 9 years ago

  • Description updated (diff)
Actions #18

Updated by Florian Effenberger about 9 years ago

It seems our monitoring vServer at a third-party hoster is causing the false positives; Robert is now testing on his own machine.

Actions #19

Updated by Florian Effenberger almost 9 years ago

  • Due date deleted (2015-03-13)
Actions #20

Updated by Florian Effenberger almost 9 years ago

Seems to be a problem with the vHost at OVH, as Robert cannot confirm the issue; let's shift the monitoring to the upcoming backup machine (#1224).

Actions #21

Updated by Florian Effenberger almost 9 years ago

Actions #22

Updated by Alexander Werner almost 9 years ago

Ordered a new monitoring VM at fillo; the next step is to move the monitoring setup there.

Actions #23

Updated by Alexander Werner almost 9 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from Alexander Werner to Robert Einsle

Robert will set up icinga2 and come back as soon as it is working.

Actions #24

Updated by Florian Effenberger over 8 years ago

  • Target version deleted (1)
Actions #25

Updated by Alexander Werner almost 8 years ago

  • Status changed from Feedback to Closed
Actions #26

Updated by Florian Effenberger almost 6 years ago

  • Related to deleted (Bug #1079: Status page)