[clue] 12216-RNA-A00

Mike mikedawg at gmail.com
Wed Apr 15 12:07:05 MDT 2015


> Date: Wed, 15 Apr 2015 17:36:55 +0000 (UTC)
> From: foo7775 at comcast.net
> Subject: [clue] Performing RCAs
> To: "list, CLUE" <clue at cluedenver.org>
>
> Hi all,
>
> I'm hoping to get some good suggestions on how I might be able to improve
> my ability to perform root cause analysis when problems occur. At the
> moment, my primary method is to go through logs (/var/log/messages, etc.)
> in the hope that something might be logged that will let me say "OK, _this_
> is what caused the service to stop/the problem to occur/etc." - but as many
> of you know, all too often, there simply isn't anything logged. I am aware
> of the historical data provided by the 'sar' utility, & that's definitely
> helpful up to a point, and I've tried to start an effort to ensure that
> 'sysstat' & 'collectl' are installed on all of our production servers, but
> I'm fairly sure that many of you know a number of other things that would
> be helpful to me.
>
> One thing that's really frustrating to me is that the management team will
> often insist upon knowing the cause for an event, when (from everything I
> can tell) there's simply *nothing* there to say why it occurred. I'm hoping
> that a number of you might be able to help me drastically reduce the number
> of times I have to say "I don't know why <foo> occurred."
>
> Thanks all,
>
> T.


Hi T.

I'd be more than happy to walk you through some sample events, specifically
real-life incidents that have happened to me or to the companies I've
worked for.

There are a bunch of things that need to line up, and that's why an
Incident Response Plan (IRP) needs to be built.

It needs to cover everything from points of contact, to logging, to
incident analysis, and it needs to be fully documented and fully
implemented.
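As a small, hypothetical example of the log-analysis piece: when nothing obvious stands out, it often helps to pull every syslog entry from a window around the time the problem was noticed and read them together, rather than searching for a single smoking-gun message. The sketch below assumes classic syslog-format lines (e.g. "Apr 15 11:01:30 host kernel: ...") as found in /var/log/messages, a year supplied by the caller (the classic format omits it), and a default 10-minute window. The function names and sample lines are made up for illustration.

```python
from datetime import datetime, timedelta

# Classic syslog timestamps look like "Apr 15 11:02:13" or "Apr  5 11:02:13"
# (single-digit days are space-padded); both forms are 15 characters long.
SYSLOG_TS_LEN = 15


def parse_syslog_time(line, year):
    """Return the datetime of a classic syslog line, or None if unparseable.

    Assumption: the year is not in the line, so the caller supplies it.
    Lines near a New Year boundary may therefore be misdated.
    """
    try:
        return datetime.strptime(f"{year} {line[:SYSLOG_TS_LEN]}",
                                 "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_near(lines, incident, window=timedelta(minutes=10)):
    """Yield lines logged within +/- window of the incident time."""
    start, end = incident - window, incident + window
    for line in lines:
        ts = parse_syslog_time(line, incident.year)
        if ts is not None and start <= ts <= end:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical sample lines standing in for /var/log/messages.
    sample = [
        "Apr 15 10:40:00 web1 CROND[811]: (root) CMD (run-parts /etc/cron.hourly)",
        "Apr 15 11:01:30 web1 kernel: Out of memory: Kill process 4242 (java)",
        "Apr 15 11:30:00 web1 sshd[903]: Accepted publickey for admin",
    ]
    # Only the 11:01:30 line falls inside 10:55-11:15.
    for line in lines_near(sample, datetime(2015, 4, 15, 11, 5)):
        print(line)
```

Against a real host you would pass an open file handle (e.g. from open("/var/log/messages")) instead of the sample list; rotated or compressed logs have to be opened separately. On systemd machines, journalctl --since/--until does the same kind of windowing natively.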

That's the short version, but I'd be more than willing to go into more
detail with you.

Thanks

Mike

-- 
Mike

