Task #3437
closed
dashboard.documentfoundation.org fails with 500 errors
Description
Any /elasticsearch hits result in a 500. I took a quick look through the logs in the VM and it seems possibly related to some changes in Dec 2020.
The front-end is also throwing the following:
Error: Internal Server Error
ErrorAbstract@https://dashboard.documentfoundation.org/bundles/kibana.bundle.js?v=16380:61:151667
StatusCodeError@https://dashboard.documentfoundation.org/bundles/kibana.bundle.js?v=16380:61:155126
respond@https://dashboard.documentfoundation.org/bundles/kibana.bundle.js?v=16380:61:161556
checkRespForFailure@https://dashboard.documentfoundation.org/bundles/kibana.bundle.js?v=16380:61:160796
AngularConnector.prototype.request/<@https://dashboard.documentfoundation.org/bundles/kibana.bundle.js?v=16380:55:33588
processQueue@https://dashboard.documentfoundation.org/bundles/commons.bundle.js?v=16380:35:132456
scheduleProcessQueue/<@https://dashboard.documentfoundation.org/bundles/commons.bundle.js?v=16380:35:133361
$digest@https://dashboard.documentfoundation.org/bundles/commons.bundle.js?v=16380:35:144239
$apply@https://dashboard.documentfoundation.org/bundles/commons.bundle.js?v=16380:35:147018
done@https://dashboard.documentfoundation.org/bundles/commons.bundle.js?v=16380:35:100026
completeRequest@https://dashboard.documentfoundation.org/bundles/commons.bundle.js?v=16380:35:104697
createHttpBackend/</xhr.onload@https://dashboard.documentfoundation.org/bundles/commons.bundle.js?v=16380:35:105435
Updated by Guilhem Moulin about 4 years ago
- Status changed from New to Closed
- Assignee set to Guilhem Moulin
This happens from time to time (also before Dec 2020) and unfortunately isn't caught by the monitoring system because AFAIK in order to detect this one needs to send a POST request to the backend (for instance an empty POST request to https://dashboard.documentfoundation.org/elasticsearch/.kibana/_search with HTTP headers Content-Type: application/json and kbn-xsrf: reporting). IIRC GET requests to /api/status don't report the error, but I might misremember so I just updated the blackbox exporter to point to that URL instead. If it doesn't help, what would you say is the best way to make the blackbox exporter do that health status check? An HTTP method conversion on the nginx side?
Anyway, restarting kibana fixed this. (Sometimes we've had to restart elastic and searchguard also.)
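For reference, the POST check described above could be sketched roughly as follows in Python. The URL and the two headers are the ones quoted in this comment; the function names and the timeout are made up for illustration, and this is not what the monitoring system actually runs.

```python
import urllib.error
import urllib.request

# URL and header values as quoted in the comment above.
SEARCH_URL = "https://dashboard.documentfoundation.org/elasticsearch/.kibana/_search"


def build_probe_request(url: str = SEARCH_URL) -> urllib.request.Request:
    """Build the empty POST request with the two headers mentioned above."""
    return urllib.request.Request(
        url,
        data=b"",  # empty request body
        method="POST",
        headers={
            "Content-Type": "application/json",
            "kbn-xsrf": "reporting",
        },
    )


def probe(url: str = SEARCH_URL, timeout: float = 10.0) -> int:
    """Send the probe and return the HTTP status code (hypothetical helper)."""
    try:
        with urllib.request.urlopen(build_probe_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we are after.
        return err.code
```

Calling probe() returns the status code, so a 500 from the backend comes back as the integer 500 instead of an exception.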
Updated by Guilhem Moulin about 4 years ago
Guilhem Moulin wrote:
IIRC GET requests to /api/status don't report the error, but I might misremember so I just updated the blackbox exporter to point to that URL instead.
I confirm GET /api/status isn't suitable for monitoring: kibana had trouble again even though the reply was 200 and the JSON document had .status.overall.state = green.
If it doesn't help, what would you say is the best way to make the blackbox exporter do that health status check? An HTTP method conversion on the nginx side?
Updated by Guilhem Moulin about 4 years ago
Guilhem Moulin wrote:
If it doesn't help, what would you say is the best way to make the blackbox exporter do that health status check? An HTTP method conversion on the nginx side?
For the record it failed again so I tried that approach (proxy_method POST with the relevant proxy_set_header directives) and this time we get an HTTP 500. So we can monitor that service finally :-)
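A rough sketch of what such an nginx-side method conversion might look like, for anyone hitting the same problem. Only proxy_method POST and the use of proxy_set_header come from the comment above; the location path and the upstream address are assumptions (5601 is Kibana's default port), and the header values are the ones from the earlier POST example. The actual configuration on the server is not shown in this ticket.

```nginx
# Hypothetical monitoring endpoint: the blackbox exporter sends a plain GET here.
location = /monitoring/kibana-search {
    # Convert the incoming GET into a POST towards the backend.
    proxy_method POST;
    proxy_set_header Content-Type application/json;
    proxy_set_header kbn-xsrf reporting;
    # Assumed upstream address; adjust to wherever Kibana listens.
    proxy_pass http://127.0.0.1:5601/elasticsearch/.kibana/_search;
}
```

With something along these lines the exporter only needs a GET probe against the monitoring location, and a failing backend shows up as an HTTP 500.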