Task #2145
Frequent 504 Gateway timeouts in Bugzilla upon bug change (Closed)
Description
When adding a comment/changing a bug we often hit the following error: "504 Gateway Time-out" / "nginx/1.2.1"
The comment/change goes through, but the user could incorrectly think it didn't, goes back and comments again.
This is somewhat of a recurring issue: sometimes I don't see it for weeks, other times I hit it multiple times.
Updated by Aron Budea almost 8 years ago
It's also coupled with extreme Bugzilla slowness :/.
Updated by Florian Effenberger almost 8 years ago
- Assignee set to Guilhem Moulin
- Target version set to Pool
Guilhem, can you have a look?
Updated by Guilhem Moulin almost 8 years ago
- Assignee deleted (Guilhem Moulin)
- Target version deleted (Pool)
Yes I'm not it. I restarted PostgreSQL with some tweaks; it's usually near-instant, but now it seems insanely slow to start up… bugs.tdf has been down for 20 minutes already :-( :-(
Updated by Guilhem Moulin almost 8 years ago
- Assignee set to Guilhem Moulin
Sorry, I didn't notice I had removed myself as Assignee; I guess it's because I had the page loaded before I got your message.
And of course I meant "I'm on it", sorry for the confusion.
Updated by Guilhem Moulin almost 8 years ago
Restarting PostgreSQL and forcing a VACUUM seem to have significantly improved auto-completion. I also tweaked the config (which was the reason for the restart in the first place), which should improve write queries.
I'm leaving the priority at High in the meantime, though, as it's not a proper fix; I'll keep investigating.
Updated by Florian Effenberger over 7 years ago
- Target version set to Pool
Any update? Have the problems been solved?
Updated by Beluga Beluga over 7 years ago
Florian Effenberger wrote:
Any update? Have the problems been solved?
More investigation is needed, as the problem has reappeared several times since this was filed. It should be noted that these have always appeared with our self-hosted BZ. Not sure if the cause has always been the same.
Updated by Florian Effenberger over 7 years ago
Beluga Beluga wrote:
More investigation is needed, as the problem has reappeared several times since this was filed. It should be noted that these have always appeared with our self-hosted BZ. Not sure if the cause has always been the same.
Do you have any timestamps, so we could look into the logs?
Updated by Florian Effenberger over 7 years ago
Any updates, or can we close this ticket?
Updated by Florian Effenberger over 7 years ago
Florian Effenberger wrote:
Any updates, or can we close this ticket?
Ping?
Updated by Guilhem Moulin over 7 years ago
I'm still doing regular manual vacuums for now. I think it's best to keep the ticket open until we find a decent autovacuum configuration.
Updated by Florian Effenberger over 7 years ago
Guilhem Moulin wrote:
I'm still doing regular manual vacuums for now. I think it's best to keep the ticket open until we find a decent autovacuum configuration.
Any updates on the situation?
Updated by Guilhem Moulin about 7 years ago
During the past 52 days we've had "only" 40 of these Gateway Time-outs, out of a total of just under 1.4M requests to the FastCGI server (incl. 64k requests to the REST API). So while we could probably tune PostgreSQL better, I'm now tempted to close this, or at least downgrade the severity.
Moreover, 12 of these 40 failed requests came from our own infra (the wiki querying the REST API). Looking at the timestamps, the failures mostly come in batches, and I could correlate 2 batches with the following Gluster heals (dates are UTC):
- 6x on 2017-07-14 from 10:10 to 10:15 [freeze+reboot of charly]
- 9x on 2017-08-03 from 15:30 to 16:00 [corruption of delta volume]
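To put the figures above in perspective, here is a minimal Python sketch of the two checks described: the share of requests that timed out, and grouping 504 timestamps into batches so they can be lined up against events such as Gluster heals. The helper names, the 15-minute gap threshold and the example timestamps are all hypothetical assumptions for illustration; they are not taken from our actual tooling or nginx logs, and "just under 1.4M" is rounded to 1,400,000.

```python
from datetime import datetime, timedelta


def failure_rate_percent(failures: int, total: int) -> float:
    """Share of requests that failed, as a percentage."""
    return 100.0 * failures / total


def group_into_batches(timestamps, max_gap=timedelta(minutes=15)):
    """Group timestamps into batches: a new batch starts whenever the gap
    to the previous (sorted) timestamp exceeds max_gap.

    max_gap is an assumed threshold, not a value from the ticket.
    """
    batches = []
    for ts in sorted(timestamps):
        if batches and ts - batches[-1][-1] <= max_gap:
            batches[-1].append(ts)
        else:
            batches.append([ts])
    return batches


# 40 timeouts out of roughly 1.4M FastCGI requests (rounded total).
print(f"{failure_rate_percent(40, 1_400_000):.4f}%")  # 0.0029%

# Hypothetical, made-up timestamps (not real log entries).
example = [
    datetime(2000, 1, 1, 10, 10),
    datetime(2000, 1, 1, 10, 12),
    datetime(2000, 1, 1, 10, 15),
    datetime(2000, 1, 2, 15, 30),
    datetime(2000, 1, 2, 15, 45),
]
batches = group_into_batches(example)
print([len(b) for b in batches])  # [3, 2]
```

A gap-based grouping keeps the logic independent of any fixed time window, so a batch that straddles an hour boundary is still counted as one incident.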
Updated by Florian Effenberger about 7 years ago
Guilhem Moulin wrote:
During the past 52 days we've had "only" 40 of these Gateway Time-outs, out of a total of just under 1.4M requests to the FastCGI server (incl. 64k requests to the REST API). So while we could probably tune PostgreSQL better, I'm now tempted to close this, or at least downgrade the severity.
I haven't heard any complaints either, so how about setting the priority to Normal and the target version to Qlater, so we can revisit it later this year?
Updated by Guilhem Moulin about 7 years ago
- Priority changed from High to Normal
- Target version changed from Pool to Qlater
Florian Effenberger wrote:
I haven't heard any complaints either, so how about setting the priority to Normal and the target version to Qlater, so we can revisit it later this year?
Sure, done.
Updated by Guilhem Moulin about 7 years ago
- Status changed from New to In Progress
Updated by Florian Effenberger over 6 years ago
Any update? Can this be closed?
Updated by Guilhem Moulin over 6 years ago
- Status changed from In Progress to Closed
Closing indeed. I still see a handful of timeouts in the logs, but they account for about 0.0005% of all CGI/REST requests issued during the past 2 months. And we haven't heard any further complaints.