[clue] 12216-RNA-A00

foo7775 at comcast.net foo7775 at comcast.net
Wed Apr 15 12:11:39 MDT 2015


Sounds great, thank you, Mike. I'll contact you off-list to see if we can arrange a workable time.

I'd also welcome any info that anyone else might have. 

T. 

----- Original Message -----

From: "Mike" <mikedawg at gmail.com> 
To: "CLUE's mailing list" <clue at cluedenver.org> 
Sent: Wednesday, April 15, 2015 12:07:05 PM 
Subject: Re: [clue] 12216-RNA-A00 









Message: 2 
Date: Wed, 15 Apr 2015 17:36:55 +0000 (UTC) 
From: foo7775 at comcast.net 
Subject: [clue] Performing RCAs 
To: "list, CLUE" < clue at cluedenver.org > 
Message-ID: 
< 1006413404.4981382.1429119415076.JavaMail.zimbra at comcast.net > 
Content-Type: text/plain; charset="utf-8" 

Hi all, 

I'm hoping to get some good suggestions on how I can improve my ability to perform root cause analysis when problems occur. At the moment, my primary method is to go through logs (/var/log/messages, etc.) in the hope that something was logged that will let me say "OK, _this_ is what caused the service to stop/the problem to occur/etc." As many of you know, though, all too often there simply isn't anything logged. I am aware of the historical data provided by the 'sar' utility, and that's definitely helpful up to a point. I've also tried to start an effort to ensure that 'sysstat' and 'collectl' are installed on all of our production servers, but I'm fairly sure that many of you know a number of other things that would be helpful to me.

One thing that's really frustrating to me is that the management team will often insist upon knowing the cause for an event, when (from everything I can tell) there's simply *nothing* there to say why it occurred. I'm hoping that a number of you might be able to help me drastically reduce the number of times I have to say "I don't know why <foo> occurred." 

Thanks all, 

T. 

------------------------------ 

_______________________________________________ 
clue mailing list 
clue at cluedenver.org 
http://cluedenver.org/mailman/listinfo/clue 

End of clue Digest, Vol 51, Issue 11 
************************************ 





Hi T. 

I'd be more than happy to walk you through some sample events, specifically real-life incidents that have happened to me or to the companies I've worked for.

There are a bunch of things that need to line up, and that's why an Incident Response Plan (IRP) needs to be built.

Everything from points of contact, to logging, to incident analysis needs to be fully documented and fully implemented.

That is the short story of it all, but I'd be more than willing to go into more detail with you.

Thanks 

Mike 

-- 
Mike 

_______________________________________________ 
clue mailing list: clue at cluedenver.org 
For information, account preferences, or to unsubscribe see: 
http://cluedenver.org/mailman/listinfo/clue 
