[clue] postgres questions?

Quentin Hartman qhartman at gmail.com
Wed Dec 7 08:37:24 MST 2016


Sorry to hear that. Such is the life of us Internet janitors, always
cleaning up someone else's spilled lunch...

On Tue, Dec 6, 2016 at 2:13 PM, Mike Bean <beandaemon at gmail.com> wrote:

> Yeah, tell me about it.  The problem, just to make things super
> interesting, is that way too much of the network has fish-hooks into
> this thing, so it's not easy to patch.  Apparently the term "single
> point of failure" wasn't thoroughly considered when this environment
> was designed and implemented.  I recently met a DBA at my company and
> asked their opinion.  After answering some questions, we arrived at
> basically the same conclusion: if you can't get into the code and look
> at the actual queries, your only real practical option is to patch it,
> which I definitely cannot do easily.  (sigh)
>
>
> On Mon, Dec 5, 2016 at 1:50 PM, Quentin Hartman <qhartman at gmail.com>
> wrote:
>
>> I should add that there have been some pretty massive improvements to
>> postgres since 8.4, and it wouldn't be at all surprising if your problem
>> just disappeared after an upgrade.
>>
>> QH
>>
>> On Mon, Dec 5, 2016 at 1:48 PM, Quentin Hartman <qhartman at gmail.com>
>> wrote:
>>
>>> Oof, 8.4, that's tough. It's been long enough since I touched an 8.x
>>> box that I'm not sure offhand. It should be in the main conf, though.
>>> Here's the relevant section in the official docs:
>>> https://www.postgresql.org/docs/8.4/static/runtime-config-logging.html
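>>>
>>> If memory serves, these are roughly the knobs you'd want in
>>> postgresql.conf (a sketch; the values are guesses to tune, but the
>>> parameter names are all on that 8.4 doc page):
>>>
>>>     # postgresql.conf -- reload with "pg_ctl reload" or a SIGHUP
>>>     log_min_duration_statement = 250   # log statements slower than 250 ms
>>>     log_lock_waits = on                # log waits longer than deadlock_timeout
>>>     log_autovacuum_min_duration = 0    # log every autovacuum run
>>>     log_checkpoints = on               # log checkpoint timings
>>>     log_line_prefix = '%t [%p] '       # timestamp and PID on each line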
>>>
>>> As an aside, that's not a supported version anymore. Unless you have a
>>> compelling reason not to, upgrading to 9.2 or later (9.6 is current)
>>> should be at or near the top of your to-do list...
>>>
>>> QH
>>>
>>> On Mon, Dec 5, 2016 at 1:41 PM, Mike Bean <beandaemon at gmail.com> wrote:
>>>
>>>> It's 8.4 - unfortunately we're having some problems with autovacuum:
>>>> it appears to be configured correctly, but it isn't actually running
>>>> when it should:
>>>>
>>>> POSTGRES_LAST_AUTOVACUUM CRITICAL: DB "xe" (host:localhost) (port=1935)
>>>> DB: xe TABLE: pg_catalog.pg_class: 17:11 November 18, 2016 (16 days 20
>>>> hours 23 minutes)
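>>>>
>>>> (To cross-check that from inside the database, the stats collector
>>>> tracks vacuum times per table; a sketch against the 8.4 catalog
>>>> views:
>>>>
>>>>     SELECT schemaname, relname, last_vacuum, last_autovacuum
>>>>     FROM pg_stat_all_tables
>>>>     ORDER BY last_autovacuum DESC NULLS LAST
>>>>     LIMIT 20;
>>>>
>>>> A NULL means the table hasn't been (auto)vacuumed since the stats
>>>> were last reset.)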
>>>>
>>>> The Thursday log is largely full of checkpoint_segments spam, plus
>>>> ONE deadlock condition:
>>>>
>>>> DETAIL:  Process 9758 waits for ShareLock on transaction 40741339;
>>>> blocked by process 9794.
>>>>         Process 9794 waits for ShareLock on transaction 40740891;
>>>> blocked by process 9758.
>>>>         Process 9758: select * from rhn_channel.update_needed_cache($1)
>>>> as result
>>>>         Process 9794:
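>>>>
>>>> (If it recurs while it's live, something like this against pg_locks
>>>> should show who is blocking whom; a sketch using the 8.4 column
>>>> names, where pg_stat_activity still has procpid/current_query:
>>>>
>>>>     SELECT bl.pid AS blocked_pid,
>>>>            wa.current_query AS blocked_query,
>>>>            kl.pid AS blocking_pid,
>>>>            ka.current_query AS blocking_query
>>>>     FROM pg_locks bl
>>>>     JOIN pg_stat_activity wa ON wa.procpid = bl.pid
>>>>     JOIN pg_locks kl ON kl.transactionid = bl.transactionid
>>>>                     AND kl.pid <> bl.pid AND kl.granted
>>>>     JOIN pg_stat_activity ka ON ka.procpid = kl.pid
>>>>     WHERE NOT bl.granted;
>>>>
>>>> A true deadlock resolves itself when the server aborts one side, so
>>>> this is mostly useful for ordinary lock pile-ups.)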
>>>>
>>>> There hasn't been much in the way of logging since the deadlock, just
>>>> more checkpoint_segments spam:
>>>> HINT:  Consider increasing the configuration parameter
>>>> "checkpoint_segments".
>>>> LOG:  checkpoints are occurring too frequently (16 seconds apart)
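>>>>
>>>> (If that shows up under normal load and not just during the anomaly,
>>>> the usual 8.4-era fix is raising checkpoint_segments; a sketch, with
>>>> numbers that are only a starting guess:
>>>>
>>>>     # postgresql.conf
>>>>     checkpoint_segments = 16            # default is 3; more WAL between checkpoints
>>>>     checkpoint_completion_target = 0.7  # spread checkpoint I/O over more time
>>>>
>>>> Each WAL segment is 16MB, so budget disk space accordingly.)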
>>>>
>>>> Since, as near as I can tell, the only meaningful logging over the
>>>> last 4 days is the deadlock, I'm inclined to think the deadlock has
>>>> some relationship with the DB growing 40GB over the weekend.
>>>> Quentin's probably right: I need to turn up the logging levels. Do we
>>>> know where those are set? In the .conf file?
>>>>
>>>> On Mon, Dec 5, 2016 at 11:06 AM, Grant Johnson <grant at amadensor.com>
>>>> wrote:
>>>>
>>>>> There is a good chance that the Thursday log holds some clues about
>>>>> what the activity was.  It is likely repeated often and will be easy
>>>>> to find, since it was happening a lot.  If the volume was very high,
>>>>> you will also see messages about checkpoints, because changes were
>>>>> outpacing the normal background DB writes.  If these happened only
>>>>> during this anomalous time, I would not worry about it, but if it
>>>>> happens during normal operations, some tuning changes may be needed.
>>>>>
>>>>> If you have updates that exceed your free space map, a normal vacuum
>>>>> will not release the space; instead you will need to run a VACUUM
>>>>> FULL, which has the unfortunate side effect of locking tables while
>>>>> it runs, so it impedes write operations.
>>>>>
>>>>> So, I would look at the log and see if anything starts making sense,
>>>>> then clean up with a VACUUM FULL during a low-activity time, and
>>>>> perhaps a REINDEX of everything.  I have found that doing a full
>>>>> cleanup every year or so helps, but generally, leaving the free
>>>>> space in the files with normal vacuum/autovacuum helps performance.
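>>>>>
>>>>> Concretely, something like this in a maintenance window (a sketch;
>>>>> both commands take heavy locks, so don't run it mid-day):
>>>>>
>>>>>     -- run while connected to the affected database
>>>>>     VACUUM FULL VERBOSE;  -- compacts tables in place; locks each table
>>>>>     REINDEX DATABASE xe;  -- rebuild indexes, which 8.4's VACUUM FULL tends to bloat
>>>>>
>>>>> ("xe" here is just the database name from the Nagios check above.)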
>>>>>
>>>>>
>>>>> On Mon, 2016-12-05 at 07:47 -0700, Mike Bean wrote:
>>>>> > Of particular interest is:
>>>>> >
>>>>> > -rw-------  1 postgres postgres  94479 Dec  2 21:29 postgresql-Fri.log
>>>>> > -rw-------  1 postgres postgres      0 Dec  5 00:00 postgresql-Mon.log
>>>>> > -rw-------  1 postgres postgres      0 Dec  3 00:00 postgresql-Sat.log
>>>>> > -rw-------  1 postgres postgres      0 Dec  4 00:00 postgresql-Sun.log
>>>>> > -rw-------  1 postgres postgres 208184 Dec  1 23:59 postgresql-Thu.log
>>>>> > -rw-------  1 postgres postgres  86154 Nov 29 21:57 postgresql-Tue.log
>>>>> > -rw-------  1 postgres postgres  89399 Nov 30 21:48 postgresql-Wed.log
>>>>> >
>>>>> > Huh. Want to bet the writes were actually on Thursday? The plot
>>>>> > thickens!
>>>>> >
>>>>> > On Mon, Dec 5, 2016 at 7:43 AM, Mike Bean <beandaemon at gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> >         Do we have members who are comfortable/skilled with
>>>>> >         postgres?  Long story short: we have a spacewalk server
>>>>> >         running postgres, and the DB is showing really erratic
>>>>> >         behavior.  We used to do weekly vacuums to keep it from
>>>>> >         running out of space, and I figured out that doing a
>>>>> >         reindex bumped the free space from 20 to 80 GB.  Over the
>>>>> >         weekend it dropped to roughly 60; then we went one week
>>>>> >         without doing a vacuum/reindex, and it's back down to 20.
>>>>> >
>>>>> >
>>>>> >         I know for a fact there's no one else in the organization
>>>>> >         making major changes on this DB, and certainly nothing
>>>>> >         that could justify 40GB of writes in 3 days.  Not really
>>>>> >         being a DBA, I'm running low on tricks, so I thought I'd
>>>>> >         see if CLUE had any advice.  Running vacuums once a week
>>>>> >         to keep it from filling the disk feels like taking aspirin
>>>>> >         to get over a gunshot wound: sure, it'll keep the server
>>>>> >         running, but ultimately what I really want to know is why
>>>>> >         a largely static spacewalk DB is writing 60GB to the disk
>>>>> >         in a 2-week interval.
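>>>>> >
>>>>> >         A quick way to see which tables the space is going to
>>>>> >         (a sketch; these size functions exist in 8.4):
>>>>> >
>>>>> >             SELECT n.nspname, c.relname,
>>>>> >                    pg_size_pretty(pg_total_relation_size(c.oid)) AS total
>>>>> >             FROM pg_class c
>>>>> >             JOIN pg_namespace n ON n.oid = c.relnamespace
>>>>> >             WHERE c.relkind = 'r'
>>>>> >             ORDER BY pg_total_relation_size(c.oid) DESC
>>>>> >             LIMIT 20;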