[clue] postgres questions?

Mon Dec 19 08:35:45 MST 2016

No worries,  yeah, I've pretty much resigned myself to doing a
vacuum/reindex 2 or 3 times a month.   I'm fairly convinced that the
deadlocks I've found are happening around RHN errata, and there's only one
repo that's importing errata, but I haven't figured out how to disable that
function.  I've gone as far as chatting with some of the DBA's who work
here in other functions.  They're not as afraid of deadlocks as I am, but
they also have access to the queries/code, and since it's basically an
open-source implementation of satellite and I'm not a DBA, that's not going
to happen.  So really my only choice is to patch it, but it (the system),
is a relatively delicate single point of failure in this environment, so I
really don't want to stick my hand in that bee-hive if I can help it.
It'd be easier to try and get more storage for the NFS share it runs on,
but of course, the problem with that logic, is I have no idea if there's a
cap on how much space it will eat.   For all I know it will keep growing
ad-infinitum.

On Mon, Dec 12, 2016 at 11:20 AM, Chris Fedde <chris at fedde.us> wrote:

> I know that this one is getting kind of old but I thought I'd chime in
> anyway.  You can ask postgresql to log the queries it is running.
> It might be worth reviewing https://www.postgresql.org/docs/8.4/
> static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT as a step
> along the way.
> Of interest might be the 'log_statement'
>
> Also the freenode#postgresql irc channel has been very helpful to me in
> the past.
>
> Good luck!
> chris
>
> On Wed, Dec 7, 2016 at 8:37 AM, Quentin Hartman <qhartman at gmail.com>
> wrote:
>
>> Sorry to hear that. Such is the life of us Internet janitors, always
>> cleaning up someone else's spilled lunch...
>>
>> On Tue, Dec 6, 2016 at 2:13 PM, Mike Bean <beandaemon at gmail.com> wrote:
>>
>>> Yeah, tell me about it.   Problem is just to make things super
>>> interesting there is way too much of the network with fish-hooks into
>>> thing, it's not easy to patch.  Apparently the term "single point of
>>> failure" - wasn't thoroughly considered when designing/implementing this
>>> environment.   I just recently met a DBA at my company, asked their
>>> opinion.   After answering some questions we arrived at basically the same
>>> conclusion.  If you can't get into the code and look at the actual queries,
>>> your only real practical option is to patch it; which I definitely cannot
>>> do easily.  (sigh)
>>>
>>>
>>> On Mon, Dec 5, 2016 at 1:50 PM, Quentin Hartman <qhartman at gmail.com>
>>> wrote:
>>>
>>>> I should add that there have been some pretty massive improvements to
>>>> postgres since 8.4, and it wouldn't be at all surprising if your problem
>>>> just disappeared after an upgrade.
>>>>
>>>> QH
>>>>
>>>> On Mon, Dec 5, 2016 at 1:48 PM, Quentin Hartman <qhartman at gmail.com>
>>>> wrote:
>>>>
>>>>> Oof, 8.4, that's tough. It's been long enough since I touched an 8.x
>>>>> box, I'm not sure off hand. It should be in the main conf though. Here's
>>>>> what should be the relevant section i nthe official docs:
>>>>> https://www.postgresql.org/docs/8.4/static/runtime-config-logging.html
>>>>>
>>>>> As an aside, that's not a supported version anymore. Unless you have a
>>>>> compelling reason not to, upgrading to a 9.2+ (9.6 is current) should be at
>>>>> or near the top of your to-do list...
>>>>>
>>>>> QH
>>>>>
>>>>> On Mon, Dec 5, 2016 at 1:41 PM, Mike Bean <beandaemon at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> It's 8.4  - unfortunately we're having some problems with
>>>>>> autovacum appearing to be configured correctly, not realistically running
>>>>>> when it should
>>>>>>
>>>>>> POSTGRES_LAST_AUTOVACUUM CRITICAL: DB "xe" (host:localhost)
>>>>>> (port=1935) DB: xe TABLE: pg_catalog.pg_class: 17:11 November 18, 2016 (16
>>>>>> days 20 hours 23 minutes)
>>>>>>
>>>>>> the thursday log is largely full of checkpoint_segments spam, and ONE
>>>>>> deadlock condition:
>>>>>>
>>>>>> DETAIL:  Process 9758 waits for ShareLock on transaction 40741339;
>>>>>> blocked by process 9794.
>>>>>>         Process 9794 waits for ShareLock on transaction 40740891;
>>>>>> blocked by process 9758.
>>>>>>         Process 9758: select * from rhn_channel.update_needed_cache($1)
>>>>>> as result
>>>>>>         Process 9794:
>>>>>>
>>>>>> there hasn't been much in the way of logging since the deadlock, just
>>>>>> more checkpoint_segment spam
>>>>>> HINT:  Consider increasing the configuration parameter
>>>>>> "checkpoint_segments".
>>>>>> LOG:  checkpoints are occurring too frequently (16 seconds apart)
>>>>>>
>>>>>> Since as near as I can tell the only meaningful logging over the last
>>>>>> 4 days is the deadlock, I'm inclined to think the deadlock has some
>>>>>> relationship with the DB growing 40GB over the weekend.    Quentin's
>>>>>> probably right, I probably need to turn up the logging levels.  Do we know
>>>>>> where those are?  in the .conf file?
>>>>>>
>>>>>> On Mon, Dec 5, 2016 at 11:06 AM, Grant Johnson <grant at amadensor.com>
>>>>>> wrote:
>>>>>>
>>>>>>> There is a good chance that the Thursday log holds some clues about
>>>>>>> what
>>>>>>> the activity was.    It is likely repeated often, and will be easy to
>>>>>>> find, as it was happening a lot.   If the volume was very high, you
>>>>>>> also
>>>>>>> see messages about checkpoints, because changes were out pacing the
>>>>>>> normal background DB writes.  If these were happening only during
>>>>>>> this
>>>>>>> anomalous time, I would not worry about it, but if it happens during
>>>>>>> normal operations, some tuning changes may be needed.
>>>>>>>
>>>>>>> If you are having updates that exceed your free space map, the normal
>>>>>>> vacuum will not release the space, and instead you will need to run a
>>>>>>> vacuum full, which has the unfortunate side effect of locking tables
>>>>>>> while it runs, so it impedes write operations.
>>>>>>>
>>>>>>> So, I would look at the log, see if anything starts making sense, and
>>>>>>> then clean up with a full vacuum during a low activity time, and
>>>>>>> perhaps
>>>>>>> a reindex all.   I have found that doing full clean up every year or
>>>>>>> so
>>>>>>> helps, but generally, leaving the free space in the files with normal
>>>>>>> vacuum/auto-vacuum helps performance.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 2016-12-05 at 07:47 -0700, Mike Bean wrote:
>>>>>>> > Of particular interest is
>>>>>>> >
>>>>>>> >
>>>>>>> > -rw-------  1 postgres postgres  94479 Dec  2 21:29
>>>>>>> postgresql-Fri.log
>>>>>>> > -rw-------  1 postgres postgres      0 Dec  5 00:00
>>>>>>> postgresql-Mon.log
>>>>>>> > -rw-------  1 postgres postgres      0 Dec  3 00:00
>>>>>>> postgresql-Sat.log
>>>>>>> > -rw-------  1 postgres postgres      0 Dec  4 00:00
>>>>>>> postgresql-Sun.log
>>>>>>> > -rw-------  1 postgres postgres 208184 Dec  1 23:59
>>>>>>> postgresql-Thu.log
>>>>>>> > -rw-------  1 postgres postgres  86154 Nov 29 21:57
>>>>>>> postgresql-Tue.log
>>>>>>> > -rw-------  1 postgres postgres  89399 Nov 30 21:48
>>>>>>> postgresql-Wed.log
>>>>>>> >
>>>>>>> >
>>>>>>> > Huh - want to bet the writes were actually on thursday?    The plot
>>>>>>> > thickens!
>>>>>>> >
>>>>>>> > On Mon, Dec 5, 2016 at 7:43 AM, Mike Bean <beandaemon at gmail.com>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> >         Do we have members who are comfortable/skilled with
>>>>>>> postgres?
>>>>>>> >         Long story short we have a spacewalk server running
>>>>>>> >         postgres and the DB is showing really erratic behavior.  We
>>>>>>> >         used to do weekly vacuums to keep it from running out of
>>>>>>> >         space, I figured out that doing a reindex bumped the free
>>>>>>> >         space from 20-80 GB.  Over the weekend, it dropped to
>>>>>>> roughly
>>>>>>> >         60, we went one week without doing a vacuum reindex, and
>>>>>>> it's
>>>>>>> >         back down to 20.
>>>>>>> >
>>>>>>> >
>>>>>>> >         I know for a fact there's no one else in the organization
>>>>>>> >         making major changes on this DB, and certainly nothing that
>>>>>>> >         could justify 40GB of writes in 3 days.  Not really being a
>>>>>>> >         DBA, I'm running low on my bag of tricks, thought I'd see
>>>>>>> if
>>>>>>> >         CLUE had any advice.   Running vacuums once a week to keep
>>>>>>> it
>>>>>>> >         from filling the disk feels like taking aspirin to get over
>>>>>>> >         your gunshot wound.  Sure, it'll keep the server running,
>>>>>>> but
>>>>>>> >         ultimately what I really want to know is why a largely
>>>>>>> static
>>>>>>> >         spacewalk DB is writing 60GB to the disk in a 2 week
>>>>>>> interval.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > _______________________________________________
>>>>>>> > clue mailing list: clue at cluedenver.org
>>>>>>> > For information, account preferences, or to unsubscribe see:
>>>>>>> > http://cluedenver.org/mailman/listinfo/clue
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> clue mailing list: clue at cluedenver.org
>>>>>>> For information, account preferences, or to unsubscribe see:
>>>>>>> http://cluedenver.org/mailman/listinfo/clue
>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> clue mailing list: clue at cluedenver.org
>>>>>> For information, account preferences, or to unsubscribe see:
>>>>>> http://cluedenver.org/mailman/listinfo/clue
>>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> clue mailing list: clue at cluedenver.org
>>>> For information, account preferences, or to unsubscribe see:
>>>> http://cluedenver.org/mailman/listinfo/clue
>>>>
>>>
>>>
>>> _______________________________________________
>>> clue mailing list: clue at cluedenver.org
>>> For information, account preferences, or to unsubscribe see:
>>> http://cluedenver.org/mailman/listinfo/clue
>>>
>>
>>
>> _______________________________________________
>> clue mailing list: clue at cluedenver.org
>> For information, account preferences, or to unsubscribe see:
>> http://cluedenver.org/mailman/listinfo/clue
>>
>
>
> _______________________________________________
> clue mailing list: clue at cluedenver.org
> For information, account preferences, or to unsubscribe see:
> http://cluedenver.org/mailman/listinfo/clue
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://cluedenver.org/pipermail/clue/attachments/20161219/5de687e2/attachment-0001.html