Bug #1072

ovirt ha-agent fails on excelsior

Added by Alexander Werner almost 4 years ago. Updated over 3 years ago.

Status: Rejected
Priority: Urgent
Category: Virtualization
Target version: -
% Done: 0%

Description

After upgrading to oVirt 3.5.1, the ovirt-hosted-engine-ha agent times out while acquiring the storage domain and shuts down after repeated restart attempts:

MainThread::INFO::2015-02-03 11:28:00,542::hosted_engine::662::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::ERROR::2015-02-03 11:28:00,543::hosted_engine::632::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=cd471cfc-078e-4e59-9ede-4c162c5a0c6c, host_id=1): timeout during domain acquisition
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 159, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested)\
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 302, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 633, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=cd471cfc-078e-4e59-9ede-4c162c5a0c6c, host_id=1): timeout during domain acquisition
MainThread::ERROR::2015-02-03 11:28:00,552::agent::172::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitoring domain (sd_uuid=cd471cfc-078e-4e59-9ede-4c162c5a0c6c, host_id=1): timeout during domain acquisition' - trying to restart agent
MainThread::WARNING::2015-02-03 11:28:05,557::agent::175::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '9'
MainThread::ERROR::2015-02-03 11:28:05,557::agent::177::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Too many errors occurred, giving up. Please review the log and consider filing a bug.
MainThread::INFO::2015-02-03 11:28:05,558::agent::118::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
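To triage a longer agent log, the entries can be split into fields and counted by level. The following is a minimal sketch, assuming every entry follows the `thread::LEVEL::timestamp::module::line::logger::(function) message` layout seen in the excerpt above. The pattern is inferred from these lines only, not taken from the agent's logging configuration, and the names `LOG_LINE`, `parse_agent_log` and `count_levels` are hypothetical.

```python
import re
from collections import Counter

# Layout inferred from the excerpt above (an assumption, not an official format):
# thread::LEVEL::timestamp::module::line::logger::(function) message
LOG_LINE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::"
    r"(?P<logger>[^:]+)::\((?P<func>[^)]+)\) (?P<message>.*)$"
)


def parse_agent_log(lines):
    """Yield one dict per matching log entry; traceback lines are skipped."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match:
            yield match.groupdict()


def count_levels(lines):
    """Count entries per log level (INFO, WARNING, ERROR, ...)."""
    return Counter(entry["level"] for entry in parse_agent_log(lines))


# Sample lines copied from the log excerpt in this report.
sample = [
    "MainThread::INFO::2015-02-03 11:28:00,542::hosted_engine::662::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(_get_domain_monitor_status) VDSM domain monitor status: PENDING",
    "Traceback (most recent call last):",
    "MainThread::WARNING::2015-02-03 11:28:05,557::agent::175::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) "
    "Restarting agent, attempt '9'",
    "MainThread::INFO::2015-02-03 11:28:05,558::agent::118::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down",
]

if __name__ == "__main__":
    for level, count in sorted(count_levels(sample).items()):
        print(level, count)
```

Lines that do not match the pattern, such as the traceback, are ignored rather than reported as errors, so the counts cover log entries only.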

The installed packages are:

# rpm -qa |egrep "(ovirt|vdsm)"|sort
ovirt-engine-sdk-python-3.5.1.0-1.el6.noarch
ovirt-host-deploy-1.3.1-1.el6.noarch
ovirt-hosted-engine-ha-1.2.5-1.el6.noarch
ovirt-hosted-engine-setup-1.2.2-1.el6.noarch
ovirt-release34-1.0.3-1.noarch
ovirt-release35-002-1.noarch
vdsm-4.16.10-8.gitc937927.el6.x86_64
vdsm-cli-4.16.10-8.gitc937927.el6.noarch
vdsm-gluster-4.16.10-8.gitc937927.el6.noarch
vdsm-jsonrpc-4.16.10-8.gitc937927.el6.noarch
vdsm-python-4.16.10-8.gitc937927.el6.noarch
vdsm-python-zombiereaper-4.16.10-8.gitc937927.el6.noarch
vdsm-xmlrpc-4.16.10-8.gitc937927.el6.noarch
vdsm-yajsonrpc-4.16.10-8.gitc937927.el6.noarch

On 89.238.68.130 (excelsior), with the other host in local maintenance:

[root@excelsior ~]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : 89.238.68.130
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 0
Local maintenance                  : True
Host timestamp                     : 1422885241
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1422885241 (Mon Feb  2 14:54:01 2015)
    host-id=1
    score=0
    maintenance=True
    state=LocalMaintenance

--== Host 2 status ==--

Status up-to-date                  : False
Hostname                           : 89.238.67.210
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
Local maintenance                  : True
Host timestamp                     : 1422963839
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1422963839 (Tue Feb  3 12:43:59 2015)
    host-id=2
    score=0
    maintenance=True
    state=LocalMaintenance
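Both hosts report stale data, and the two host timestamps differ considerably. The gap can be worked out from the status output with a short script. This is a sketch that only assumes the `Host timestamp : <epoch seconds>` line format shown above; the helper names `host_timestamps` and `format_age` are hypothetical.

```python
import re

# Matches the "Host timestamp : <epoch seconds>" lines in the output above.
HOST_TS = re.compile(r"^Host timestamp\s*:\s*(\d+)$")


def host_timestamps(status_text):
    """Return the host timestamps in the order they appear."""
    stamps = []
    for line in status_text.splitlines():
        match = HOST_TS.match(line.strip())
        if match:
            stamps.append(int(match.group(1)))
    return stamps


def format_age(seconds):
    """Render a duration in seconds as hours, minutes and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return "%dh %02dm %02ds" % (hours, minutes, secs)


# Timestamp lines copied from the status output in this report.
status_text = """\
Host ID                            : 1
Host timestamp                     : 1422885241
Host ID                            : 2
Host timestamp                     : 1422963839
"""

if __name__ == "__main__":
    host1, host2 = host_timestamps(status_text)
    # Host 1's metadata is older than Host 2's by this much.
    print(format_age(host2 - host1))  # prints "21h 49m 58s"
```

The metadata for Host 1 (excelsior) was last written almost 22 hours before that of Host 2.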

History

#1 Updated by Alexander Werner almost 4 years ago

  • Assignee set to Alexander Werner

#3 Updated by Florian Effenberger over 3 years ago

  • Status changed from New to Rejected

Rejecting due to change of platform.