Task #2145: Frequent 504 Gateway timeouts in Bugzilla upon bug change - Infrastructure - The Document Foundation Redmine


Task #2145

closed

Frequent 504 Gateway timeouts in Bugzilla upon bug change

Added by Aron Budea about 9 years ago. Updated about 8 years ago.

Status: Closed
Priority: Normal
Assignee: Guilhem Moulin
Target version: Team - Qlater

Description

When adding a comment to or changing a bug, we often hit the following error: "504 Gateway Time-out" / "nginx/1.2.1"
The comment/change goes through, but the user could incorrectly think it didn't, go back, and comment again.

This is somewhat of a recurring issue: sometimes I don't see it for weeks, other times I hit it multiple times.


Updated by Aron Budea about 9 years ago

It's also coupled with extreme Bugzilla slowness :/.


Updated by Florian Effenberger about 9 years ago

Assignee set to Guilhem Moulin
Target version set to Pool

Guilhem, can you have a look?


Updated by Guilhem Moulin about 9 years ago

Assignee deleted (~~Guilhem Moulin~~)
Target version deleted (~~Pool~~)

Yes I'm not it. I restarted PostgreSQL with some tweaks; it's usually ~instant, but now it seems insanely slow at starting up… bugs.tdf has been down for 20 minutes already :-( :-(


Updated by Guilhem Moulin about 9 years ago

Assignee set to Guilhem Moulin

Sorry, I didn't notice I had removed myself as Assignee; I guess it's because I had the page loaded before I got your message.

And of course I meant "I'm on it", sorry for the confusion.


Updated by Guilhem Moulin about 9 years ago

Restarting PostgreSQL and forcing a VACUUM seem to have significantly improved auto-completion. I also tweaked the config (which was the reason for the restart in the first place), which should improve write queries.

I'm leaving the priority on High in the meantime, though, as it's not a proper fix; I'll keep investigating.


Updated by Florian Effenberger about 9 years ago

Target version set to Pool

Any update? Have the problems been solved?


Updated by Beluga Beluga about 9 years ago

Florian Effenberger wrote:

> Any update? Have the problems been solved?

More investigation is needed as the problem has reappeared several times after this was filed. It should be noted that these have always appeared with our self-hosted BZ. Not sure if the cause has always been the same.


Updated by Florian Effenberger about 9 years ago

> More investigation is needed as the problem has reappeared several times
> after this was filed. It should be noted that these have always appeared
> with our self-hosted BZ. Not sure if the cause has always been the same.

Do you have any timestamps, so we could look into the logs?


Updated by Florian Effenberger almost 9 years ago

Any updates, or can we close this ticket?


#10

Updated by Florian Effenberger almost 9 years ago

Florian Effenberger wrote:

> Any updates, or can we close this ticket?

Ping?


#11

Updated by Guilhem Moulin almost 9 years ago

I'm still doing regular manual vacuums for now. I think it's best to keep the ticket open until we find a decent autovacuum configuration.
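As a rough illustration of what an autovacuum configuration has to balance: PostgreSQL's autovacuum vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. The sketch below only evaluates that formula in Python, using PostgreSQL's documented defaults (50 and 0.2). The `TableStats` type, the table name and the row counts are made up for illustration, and nothing here reflects the settings actually used on bugs.tdf.

```python
from dataclasses import dataclass

# PostgreSQL's documented defaults. The values used on the real Bugzilla
# database are not given in this ticket.
DEFAULT_VACUUM_THRESHOLD = 50
DEFAULT_VACUUM_SCALE_FACTOR = 0.2


@dataclass
class TableStats:
    """Hypothetical container for per-table statistics."""
    name: str
    live_tuples: int  # estimated row count (pg_class.reltuples)
    dead_tuples: int  # dead rows (pg_stat_user_tables.n_dead_tup)


def vacuum_trigger_point(live_tuples,
                         threshold=DEFAULT_VACUUM_THRESHOLD,
                         scale_factor=DEFAULT_VACUUM_SCALE_FACTOR):
    """Number of dead tuples above which autovacuum would vacuum the table."""
    return threshold + scale_factor * live_tuples


def needs_vacuum(stats,
                 threshold=DEFAULT_VACUUM_THRESHOLD,
                 scale_factor=DEFAULT_VACUUM_SCALE_FACTOR):
    """True if the table has accumulated more dead tuples than the trigger point."""
    trigger = vacuum_trigger_point(stats.live_tuples, threshold, scale_factor)
    return stats.dead_tuples > trigger


if __name__ == "__main__":
    # Made-up numbers: a large table with 100k dead rows.
    table = TableStats("example_big_table", live_tuples=1_000_000, dead_tuples=100_000)
    print("default settings:", needs_vacuum(table))
    print("scale factor 0.01:", needs_vacuum(table, scale_factor=0.01))
```

With the default scale factor, a table of about a million rows is only vacuumed after roughly 200,000 dead tuples have piled up, which is one reason large, frequently updated tables are often given a smaller scale factor. Whether that applies here is an assumption, not something the ticket confirms.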


#12

Updated by Florian Effenberger over 8 years ago

Guilhem Moulin wrote:

> I'm still doing regular manual vacuums for now. I think it's best to keep the ticket open until we find a decent autovacuum configuration.

Any updates on the situation?


#13

Updated by Guilhem Moulin over 8 years ago

During the past 52 days we've had "only" 40 of these Gateway Time-outs, out of a total of just under 1.4M requests to the FastCGI server (incl. 64k requests to the REST API). So while we could probably tune PostgreSQL better, I'm now tempted to close this, or at least downgrade the severity.

Moreover, 12 of these 40 failed requests came from our own infra (the wiki querying the REST API). Looking at the timestamps, they mostly come in batches, and I could correlate 2 batches with the following Gluster heals (dates are UTC):

- 6x on 2017-07-14 from 10:10 to 10:15 [freeze+reboot of charly]
- 9x on 2017-08-03 from 15:30 to 16:00 [corruption of delta volume]
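For anyone wanting to repeat this kind of analysis, here is a minimal Python sketch of the two steps described above: computing the timeout rate and grouping timeout timestamps into batches. It assumes the 504 timestamps have already been extracted from the logs. The function names, the 30-minute gap and the sample timestamps are all hypothetical (the samples merely fall inside the two windows listed above), and 1.4M is used as a round stand-in for "just under 1.4M".

```python
from datetime import datetime, timedelta


def failure_rate_percent(failures, total):
    """Share of failed requests, as a percentage."""
    return 100.0 * failures / total


def group_into_batches(timestamps, max_gap=timedelta(minutes=30)):
    """Group timestamps so that consecutive events at most max_gap apart share a batch."""
    batches = []
    for ts in sorted(timestamps):
        if batches and ts - batches[-1][-1] <= max_gap:
            batches[-1].append(ts)  # close enough to the previous event
        else:
            batches.append([ts])    # start a new batch
    return batches


# Hypothetical timestamps (UTC), NOT taken from the real logs.
SAMPLE_TIMEOUTS = [
    datetime(2017, 7, 14, 10, 10),
    datetime(2017, 7, 14, 10, 12),
    datetime(2017, 7, 14, 10, 15),
    datetime(2017, 8, 3, 15, 30),
    datetime(2017, 8, 3, 15, 58),
]

if __name__ == "__main__":
    print(f"{failure_rate_percent(40, 1_400_000):.4f}% of requests timed out")
    print(f"{failure_rate_percent(12, 40):.0f}% of the timeouts came from our own infra")
    for batch in group_into_batches(SAMPLE_TIMEOUTS):
        print(len(batch), "timeouts from", batch[0], "to", batch[-1])
```

Each resulting batch can then be compared by hand against known infrastructure events, as was done for the two heals above.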


#14

Updated by Florian Effenberger over 8 years ago

> During the past 52 days we've had "only" 40 of these Gateway Time-outs,
> for a total of just under 1.4M requests to the fastcgi server (incl. 64k
> requests to the REST API). So while we could probably tune PostgreSQL
> better, I'm now tempted to close this, or at least downgrade the severity.

I've heard no complaints either, so how about setting it to Normal priority and
Qlater, so we can revisit it later in the year?


#15

Updated by Guilhem Moulin over 8 years ago

Priority changed from High to Normal
Target version changed from Pool to Qlater

Florian Effenberger wrote:

> I've heard no complaints either, so how about setting it to Normal priority and
> Qlater, so we can revisit it later in the year?

Sure, done.


#16

Updated by Guilhem Moulin over 8 years ago

Status changed from New to In Progress


#17

Updated by Florian Effenberger about 8 years ago

Any update? Can this be closed?


#18

Updated by Guilhem Moulin about 8 years ago

Status changed from In Progress to Closed

Closing indeed. I still see a handful of timeouts in the logs, but they amounted to about 0.0005% of all CGI/REST requests issued during the past 2 months. And we haven't heard any further complaints.
