[clue] postgres questions?
Quentin Hartman
qhartman at gmail.com
Wed Dec 7 08:37:24 MST 2016
Sorry to hear that. Such is the life of us Internet janitors, always
cleaning up someone else's spilled lunch...
On Tue, Dec 6, 2016 at 2:13 PM, Mike Bean <beandaemon at gmail.com> wrote:
> Yeah, tell me about it. The problem is that, just to make things super
> interesting, way too much of the network has fish-hooks into this thing;
> it's not easy to patch. Apparently the term "single point of failure"
> wasn't thoroughly considered when designing/implementing this
> environment. I just recently met a DBA at my company and asked their
> opinion. After answering some questions we arrived at basically the same
> conclusion: if you can't get into the code and look at the actual
> queries, your only real practical option is to patch it, which I
> definitely cannot do easily. (sigh)
>
>
> On Mon, Dec 5, 2016 at 1:50 PM, Quentin Hartman <qhartman at gmail.com>
> wrote:
>
>> I should add that there have been some pretty massive improvements to
>> postgres since 8.4, and it wouldn't be at all surprising if your problem
>> just disappeared after an upgrade.
>>
>> QH
>>
>> On Mon, Dec 5, 2016 at 1:48 PM, Quentin Hartman <qhartman at gmail.com>
>> wrote:
>>
>>> Oof, 8.4, that's tough. It's been long enough since I touched an 8.x
>>> box that I'm not sure off hand. It should be in the main conf, though.
>>> Here's what should be the relevant section in the official docs:
>>> https://www.postgresql.org/docs/8.4/static/runtime-config-logging.html
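>>> (A hedged sketch of what you might add there - these parameter names
>>> are in the 8.4 docs, but the values are just illustrative starting
>>> points, not a tuned config:)

```
# postgresql.conf - illustrative 8.4 logging settings, not a tuned config
log_min_duration_statement = 1000   # log any statement running >= 1 second
log_checkpoints = on                # log each checkpoint and its cause
log_lock_waits = on                 # log waits longer than deadlock_timeout
log_autovacuum_min_duration = 0     # log every autovacuum action
log_line_prefix = '%t [%p] '        # timestamp and PID on each log line
```

>>> These are all reloadable, so a `pg_ctl reload` should pick them up
>>> without a restart.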
>>>
>>> As an aside, that's not a supported version anymore. Unless you have a
>>> compelling reason not to, upgrading to 9.2+ (9.6 is current) should be
>>> at or near the top of your to-do list...
>>>
>>> QH
>>>
>>> On Mon, Dec 5, 2016 at 1:41 PM, Mike Bean <beandaemon at gmail.com> wrote:
>>>
>>>> It's 8.4 - unfortunately we're having problems with autovacuum: it
>>>> appears to be configured correctly, but it isn't actually running
>>>> when it should:
>>>>
>>>> POSTGRES_LAST_AUTOVACUUM CRITICAL: DB "xe" (host:localhost) (port=1935)
>>>> DB: xe TABLE: pg_catalog.pg_class: 17:11 November 18, 2016 (16 days 20
>>>> hours 23 minutes)
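>>>> (For anyone following along, a quick way to see what autovacuum has
>>>> actually touched - a sketch query against the standard stats views,
>>>> which do exist in 8.4:)

```sql
-- Show when each table was last vacuumed, manually or by autovacuum.
-- pg_stat_all_tables includes system catalogs like pg_class.
SELECT schemaname, relname, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
ORDER BY last_autovacuum DESC NULLS LAST;
```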
>>>>
>>>> The Thursday log is largely full of checkpoint_segments spam, and ONE
>>>> deadlock condition:
>>>>
>>>> DETAIL: Process 9758 waits for ShareLock on transaction 40741339;
>>>> blocked by process 9794.
>>>> Process 9794 waits for ShareLock on transaction 40740891;
>>>> blocked by process 9758.
>>>> Process 9758: select * from rhn_channel.update_needed_cache($1)
>>>> as result
>>>> Process 9794:
>>>>
>>>> There hasn't been much in the way of logging since the deadlock, just
>>>> more checkpoint_segments spam:
>>>> HINT: Consider increasing the configuration parameter
>>>> "checkpoint_segments".
>>>> LOG: checkpoints are occurring too frequently (16 seconds apart)
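>>>> (If it helps: acting on that hint usually means something like the
>>>> following in postgresql.conf - sketch values only, sized for a
>>>> write-heavy box; each segment is 16MB of WAL:)

```
# postgresql.conf - illustrative checkpoint tuning for a busy 8.4 server
checkpoint_segments = 16            # default is 3; 16 allows ~256MB of WAL
checkpoint_completion_target = 0.7  # spread checkpoint I/O over the interval
checkpoint_warning = 30s            # still warn if checkpoints come faster
```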
>>>>
>>>> Since, as near as I can tell, the only meaningful logging over the
>>>> last 4 days is the deadlock, I'm inclined to think the deadlock has
>>>> some relationship with the DB growing 40GB over the weekend.
>>>> Quentin's probably right, I probably need to turn up the logging
>>>> levels. Do we know where those are? In the .conf file?
>>>>
>>>> On Mon, Dec 5, 2016 at 11:06 AM, Grant Johnson <grant at amadensor.com>
>>>> wrote:
>>>>
>>>>> There is a good chance that the Thursday log holds some clues about
>>>>> what the activity was. It is likely repeated often and will be easy
>>>>> to find, since it was happening a lot. If the volume was very high,
>>>>> you would also see messages about checkpoints, because changes were
>>>>> outpacing the normal background DB writes. If these were happening
>>>>> only during this anomalous time, I would not worry about it, but if
>>>>> it happens during normal operations, some tuning changes may be
>>>>> needed.
>>>>>
>>>>> If you are having updates that exceed your free space map, the normal
>>>>> vacuum will not release the space, and instead you will need to run a
>>>>> vacuum full, which has the unfortunate side effect of locking tables
>>>>> while it runs, so it impedes write operations.
>>>>>
>>>>> So, I would look at the log, see if anything starts making sense,
>>>>> and then clean up with a full vacuum during a low-activity time, and
>>>>> perhaps a reindex of everything. I have found that doing a full
>>>>> cleanup every year or so helps, but generally, leaving the free
>>>>> space in the files with normal vacuum/auto-vacuum helps performance.
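>>>>> (Concretely, that low-activity cleanup pass might look like the
>>>>> sketch below; note that on 8.4 a VACUUM FULL tends to bloat indexes,
>>>>> which is why the reindex afterwards matters:)

```sql
-- Run during a maintenance window: both commands take heavy locks.
-- Connect to the "xe" database first (e.g. psql -U postgres xe).
VACUUM FULL VERBOSE;   -- compact every table, returning space to the OS
REINDEX DATABASE xe;   -- rebuild the indexes bloated by the full vacuum
```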
>>>>>
>>>>>
>>>>> On Mon, 2016-12-05 at 07:47 -0700, Mike Bean wrote:
>>>>> > Of particular interest is
>>>>> >
>>>>> >
>>>>> > -rw------- 1 postgres postgres  94479 Dec  2 21:29 postgresql-Fri.log
>>>>> > -rw------- 1 postgres postgres      0 Dec  5 00:00 postgresql-Mon.log
>>>>> > -rw------- 1 postgres postgres      0 Dec  3 00:00 postgresql-Sat.log
>>>>> > -rw------- 1 postgres postgres      0 Dec  4 00:00 postgresql-Sun.log
>>>>> > -rw------- 1 postgres postgres 208184 Dec  1 23:59 postgresql-Thu.log
>>>>> > -rw------- 1 postgres postgres  86154 Nov 29 21:57 postgresql-Tue.log
>>>>> > -rw------- 1 postgres postgres  89399 Nov 30 21:48 postgresql-Wed.log
>>>>> >
>>>>> >
>>>>> > Huh - want to bet the writes were actually on Thursday? The plot
>>>>> > thickens!
>>>>> >
>>>>> > On Mon, Dec 5, 2016 at 7:43 AM, Mike Bean <beandaemon at gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > Do we have members who are comfortable/skilled with postgres?
>>>>> > Long story short, we have a Spacewalk server running
>>>>> > postgres and the DB is showing really erratic behavior. We
>>>>> > used to do weekly vacuums to keep it from running out of
>>>>> > space, and I figured out that doing a reindex bumped the
>>>>> > free space from 20 GB to 80 GB. Over the weekend it dropped
>>>>> > to roughly 60; we went one week without doing a
>>>>> > vacuum/reindex, and it's back down to 20.
>>>>> >
>>>>> >
>>>>> > I know for a fact there's no one else in the organization
>>>>> > making major changes on this DB, and certainly nothing that
>>>>> > could justify 40GB of writes in 3 days. Not really being a
>>>>> > DBA, I'm running low on my bag of tricks, so I thought I'd
>>>>> > see if CLUE had any advice. Running vacuums once a week to
>>>>> > keep it from filling the disk feels like taking aspirin to
>>>>> > get over a gunshot wound. Sure, it'll keep the server
>>>>> > running, but ultimately what I really want to know is why a
>>>>> > largely static Spacewalk DB is writing 60GB to the disk in
>>>>> > a 2-week interval.
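>>>>> > (One way to see where those gigabytes are going - a sketch
>>>>> > size query using functions that are present in 8.4; adjust
>>>>> > the LIMIT to taste:)

```sql
-- Ten largest tables, counting their indexes and TOAST data too
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```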
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > _______________________________________________
>>>>> > clue mailing list: clue at cluedenver.org
>>>>> > For information, account preferences, or to unsubscribe see:
>>>>> > http://cluedenver.org/mailman/listinfo/clue
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>