[clue-tech] Multiple system backups

William wlist-clue at kimballstuff.com
Tue Feb 14 12:03:17 MST 2006


David,

I respect your opinions and experience and I value your input very 
highly as someone with a lot more Linux experience than me.  However, 
please be careful to separate emotional or personal reactions from this 
transaction (e.g.:  "3 dozen lines of cruft" or "I don't value those 
kinds of comments").  I am in no way attempting to insult you and I hope 
to not incite insult from you.  I'm honestly very interested in 
improvement.  Some of your comments here hit me as biased toward only 
"your way of doing things" instead of "industry standards, which are 
documented [here], dictate...".  I'm not looking for philosophy, but for 
effectiveness and standards compliance.  :)

David L. Anselmi wrote:

> William wrote:
>
>> I feel like you missed my "Power User" note.  :)   I specifically did 
>> not set a $PATH to provide a maximal user-control capability.  
>> Imagine this script running chrooted, or a particular user (say, 
>> using Debian rather than Red Hat) prefers alternative commands to the 
>> ones I chose to use. Or, they prefer to supply particular other 
>> options to the commands over and above my choices.  I decided that 
>> this was a good way to provide a very highly granular system of 
>> control to the (power) user.
>
>
> I did miss the power user note.  But it doesn't matter, find & replace 
> accomplishes the same as well.  If someone is going to change one of 
> these he has to look up all the places they are used to evaluate the 
> impact of the change.  So I think this is 3 dozen lines of cruft.


This seems to be a philosophical difference of opinion, so I'll leave 
the code as-is here.  It is self-evident that programs which offer all 
of their options up-front are far more easily managed by the general 
population than those which require individuals to dig into the code, or 
perform mass-code substitutions.  I cannot count the number of other 
Perl or shell scripts I have encountered that do precisely what I have 
mimicked here -- a trait I admire.  Because this point 
seems to have devolved into a debate of ideals, please let it stand.
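To make the pattern concrete, here is a minimal sketch of the command-variable style I'm describing (the variable names and default paths are illustrative, not lifted verbatim from the script):

```shell
# Each external command is exposed as a variable with a default, so a
# power user can override it from the environment (or a sourced config
# file) without editing the script body.  Names/defaults are hypothetical.
TAR_PATH="${TAR_PATH:-/bin/tar}"
SERVICE_PATH="${SERVICE_PATH:-/sbin/service}"

echo "tar: $TAR_PATH"
echo "service: $SERVICE_PATH"
```

A Debian user could then run, say, `SERVICE_PATH=/usr/sbin/invoke-rc.d ./backup.sh` without touching the code.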

>>> You frequently say "delimited list".  Since you mean "whitespace 
>>> delimited list" I would just say "list".
>>
>>
>> I worked in the technical support department of a software company 
>> for three years, and today I'm a software architect responsible for 
>> extremely detailed specifications and painstakingly accurate 
>> documentation.  I don't short-hand anything when the meaning can be 
>> expressed more precisely.  This is to avoid as many forms of 
>> confusion as I can before confusion leads to problems.  As I'm sure 
>> you know, "whitespace" can include far more than just "space".  I 
>> specifically targeted only one type of whitespace character.
>
>
> Now that's very interesting.  Perhaps a lesson in how hard detailed 
> specs are to get right.  You only said "space delimited" once but if 
> you look at how that list is used you meant whitespace, as whitespace 
> is defined by the shell.  Sure, space works but it isn't required.  
> The other places that just say "delimited" are ambiguous.  The reader 
> immediately asks "delimited by what"?


As I learned, it is customary practice in both professional 
specifications and legal documentation to expressly define a term's 
deliberate usage the first time it is encountered and infer that meaning 
thereafter, unless otherwise specifically defined.  If you view the code 
top-down, it is easy to see that I have done exactly that.  I will 
accept this point from you as an opportunity to further reinforce the 
meaning of "single-space delimited list".  While it is true that the 
shell permits other whitespace characters, it is my intent to utilize 
just one.
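As an aside, the practical appeal of a single-space delimited list in shell is that, with the default IFS, unquoted expansion splits on it directly.  A tiny illustration (values hypothetical):

```shell
# A "single-space delimited list" as defined above; word splitting on
# the default IFS turns it straight into loop items.
SERVICE_LIST="mysqld httpd sshd"

for svc in $SERVICE_LIST; do    # deliberately unquoted: split on spaces
    echo "service: $svc"
done
```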

>
>>> Debian doesn't have a service command for starting/stopping 
>>> services, at least not on a typical system.
>>
>>
>> I wasn't aware of this.  My only Linux experience is with Red Hat and 
>> derivative products.  That presents an interesting, though workable, 
>> problem.  Debian users could set the SERVICE_PATH variable to 
>> whatever the equivalent is, if there is one (I hope there is, 
>> otherwise I don't know how Debian users would handle services 
>> centrally).
>
>
> There's invoke-rc.d.  I don't know how similar it is--it's designed to 
> use in Debian packages and goes through the rc.d policy layer.  But 
> using the init.d scripts directly would be easy enough.


I'm happy to research this further, though it will be a challenge as I 
do not have access to another machine upon which I could install 
Debian.  If anyone reading this post is already familiar with the usage 
of this command, please let me know how I can incorporate it.
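In the meantime, here is a hedged sketch of how the script might try each mechanism in turn.  The helper name and candidate list are my own invention, and the invoke-rc.d usage should be verified on a real Debian box:

```shell
# Try each known service-control command in order; fall back to calling
# the init.d script directly.  Extra candidates may be passed in first.
run_service() {
    svc="$1"; action="$2"; shift 2
    for cmd in "$@" /sbin/service /usr/sbin/invoke-rc.d; do
        if [ -x "$cmd" ]; then
            "$cmd" "$svc" "$action"
            return $?
        fi
    done
    "/etc/init.d/$svc" "$action"
}
```

Typical usage would be `run_service mysqld stop`.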

>>> Keeping a version and history for each function seems excessive. 
>>> Comments are nice but that stuff belongs in CVS or your changelog.
>>
>>
>> Most of these functions are portable to other, unrelated code 
>> projects.  I did that on purpose; I generally write code that can be 
>> very widely reused (dubbed "generally useful").
>
>
> So do you have a way to maintain these generally useful functions in 
> one place and include them where needed?  You're talking about library 
> routines but I wouldn't say that your approach is very portable or 
> maintainable.  Your functions depend on commands defined in global 
> variables and if you find a bug in one you have to edit it everywhere 
> you used it.  If you have some build system that puts this script 
> together out of canonical function definitions then that's pretty cool.


Over the years, I have observed that programmers simply copy-and-paste 
generally useful functions right out of other people's code and into 
their own, regardless of whether the original was in a library or right in 
the main code body.  This happens in pretty much every language.  While 
centralizing storage of my functions into a library makes economic sense 
for a large code base for the reasons you cited, I don't feel it is 
practical here, particularly because my code is published.  As the code 
already requires two distinct files, I just didn't want to add 
unnecessary complexity to it.  Also, I haven't learned to include files 
in shell script.  :)
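For the record, shell "includes" do exist: the `.` (dot) builtin reads and executes another file in the current shell.  A minimal sketch (the library path is hypothetical):

```shell
# Create a throwaway "library" file, then include it with the dot
# builtin.  Any functions or variables it defines become available in
# the current shell.
cat > /tmp/myfuncs.sh <<'EOF'
greet() { echo "hello from the library"; }
EOF

. /tmp/myfuncs.sh    # bash also accepts: source /tmp/myfuncs.sh
greet
```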

It is my hope that people who find my functions useful will copy them 
as-is with the comment block intact.  Since I encourage people to copy 
only what they find useful, I felt it appropriate to state the version 
and modification history with each segment.  Thus, should I -- or other 
authors -- improve on these code blocks, the changes can be shared back 
to the source and elsewhere, in the spirit of Free Open Source.

> I'm doing something like this, not for general functions like yours 
> but for system specific functions that are used in various scripts.  
> The previous approach was a combination of cut and paste and including 
> (sourcing) functions from a "library script".  Libraries are 
> problematic in shell so we're currently writing separate scripts for 
> each function that can be run as commands from our scripts.  That 
> gives us the flexibility of writing each command in the most 
> appropriate language.
>
> Unfortunately our version control and build systems are cumbersome and 
> unreliable.  So I don't think the "build from canonical source" trick 
> will work for us (but I'll throw it out for consideration tomorrow).


Purely for the exposure, I am interested in how you are handling this, 
even though it isn't quite working (my Linux experience is strictly 
limited to server administration; I develop and "live" on Windows 
machines).  I wrote this particular script as a one-shot experience 
without any intention of maintaining versioning or, for that matter, 
perfection.  Consequently, I didn't even remotely consider the 
techniques you prescribe.  I religiously utilize code versioning and 
partitioning for major projects, which this is not intended to be.

I was somewhat tickled that the result of this exercise seemed generally 
useful because of my approach:  using standards-compliant shell script 
techniques (per the book mentioned on my web site, less globalization) 
and an external list of backup files and directories.  I remembered the 
old thread here on CLUE-Tech where someone asked for my previous (and 
far less portable) shell backup script.  The rest is very recent 
history; I'm here to learn and to share.

>> On a personal note, I often wish other developers would document
>> their code as I have here.  When I have to bug-fix someone else's code
>> on-site, I loathe digging through "disconnected" documentation like
>> change-files or CVS comments.  This is a personal preference.
>
>
> The receives, returns, and example docs are nice (until you find out 
> they're wrong--not you but I've seen it happen).  If you're looking 
> for version and history for maintenance, I'd think you'd want to see 
> the old code too (why else do you care?) and that's in CVS.


This is somewhat OT, but I don't care to see the old code.  Good 
documentation says it all to me.  It would seem a waste of time to dig 
through old versions of the code, granting that the new version works 
and the old version did not.  In the reverse case, the question is 
philosophical.  I believe the script is too small and its function too 
narrow to warrant such code versioning.  We may again be at a point of 
conflicting ideals rather than standards, so please let this stand.

>>> There's a lot of string manipulation going on.  You should see 
>>> whether perl would be a better choice.
>>
>>
>> I specifically selected shell script in order to learn shell 
>> scripting.  This entire project is an exercise for me and I already 
>> have a major Perl project.  :)
>
>
> Fair enough.  I wrote a backup script (2 in fact) the first time I had 
> a system to back up.  I never will again because I have enough 
> experience to find a real backup system to use.  But at the time I 
> wanted to back up to CD and the systems I could find wouldn't do 
> that.  I learned some things doing that but I didn't go through the 
> effort to make it portable or maintainable.  It had one update when I 
> changed the backup media from CD to disk.  I should have installed 
> bacula instead.


This is where your Linux experience exceeds my own on a critical point.  
I am not at all familiar with other backup solutions for RHEL/CentOS4 
Linux servers except by passively seeing backup software names float 
around this list.  On one side, I have neither CD/DVD burner drives nor 
tape drives -- I back up (realizing that it is a single point of 
failure) to a central file server.  On the other, my backup needs are 
extremely specific -- I have no intention of backing up entire drives; 
only very specific files and a limited few directories.

My need for a backup solution may be atypical, which drove this whole 
project.  All of my Linux machines are virtually headless servers 
(omitting a big KVM switch that is rarely used).  I am the only "real 
user" on these boxes; everything else is virtual.  Consequently, I know 
exactly what I need to back up.  I specifically accept the duty of fully 
rebuilding any machine that goes down unrecoverably and restoring only 
the few configuration files that I save with my backup routine.

>>> Use install -d rather than writing makedirs().  Probably mkdir -p 
>>> would work too.
>>
>>
>> My version also applies the chmod, which the others do not (as far as 
>> I can tell).  Additionally, the way I handle the component path 
>> elements automatically cleans up otherwise unpredictable paths.  For 
>> example, if you pass "//some/dir////broken" to my function, it is 
>> automatically cleaned up as "/some/dir/broken/".
>
>
> The install command will set user, group, and permissions, and do some 
> other things.  I think it handles unpredictable paths at least as well 
> as yours (//some/dir////broken isn't actually unpredictable).  What 
> does yours do with "/some/dir/../other/dir"?


The man page didn't clearly reveal this information and I saw the task 
as something very easy to code myself, so I did.  The behavior of 
"/some/dir/../other/dir" is exactly what you'd expect it to be: the end 
result goes to "/some/other/dir/".
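For comparison, here is what the stock tools do with the same inputs (throwaway paths; I'm assuming GNU coreutils behavior here):

```shell
# install -d creates the whole chain and applies the mode to the named
# directory; redundant slashes are harmless.
install -d -m 700 "/tmp/mdtest//some/dir////broken"

# ".." components are resolved against the real tree rather than being
# cleaned up textually, so this also creates /tmp/mdtest/some/other/dir.
mkdir -p "/tmp/mdtest/some/dir/../other/dir"
```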

>> Actually, there is quite a lot of error checking in main(), though of 
>> a different style than you seem to be looking for.  I'm measuring 
>> output rather than exit state in main(), although as you probably 
>> noted, I do error-check the command exit states in my other functions 
>> in a style you're probably looking for.
>
>
> I don't see that.  You don't check exit status and you don't do 
> anything with command output other than print it or redirect it (in 
> main(), I'm talking about).  How is that error checking?


Either the file or directory I'm looking for exists, or it doesn't.  If 
it does, that's all I want -- I'll take anything I can get, even an 
incomplete file.  If it doesn't, then the task failed.  This is 
output-based error checking rather than 
function-based and it is appropriate here given the philosophy of what I 
deem critical-vs-non-critical failure in this "take down services and 
create file system access points, create a single file, copy it 
elsewhere, restore the system" context.  Please grant me that I'm 
only doing this type of error-checking in this single shell script 
because of inconsistent behaviors that I encountered from tar and 
smbumount while developing the script.
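Spelled out, the output-based check amounts to testing for the artifact instead of the producer's exit status (file names here are hypothetical):

```shell
# Success criterion: the backup artifact exists, even if the producing
# command grumbled on the way.  Its exit status is deliberately ignored.
workspace_tar_fqn="/tmp/obtest/backup.tar"
mkdir -p /tmp/obtest
: > "$workspace_tar_fqn"    # stand-in for the tar run

if [ -e "$workspace_tar_fqn" ]; then
    echo "artifact present; continuing"
else
    echo "artifact missing; task failed" >&2
fi
```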

>> This is deliberate, mainly because I want to reverse the system-level
>> changes I've caused as soon as possible, regardless of error.  If I
>> open the samba connections, I want them closed right away.
>
>
> Huh?  If the smbmount fails you'll print an error and return 1.  But 
> you don't clean up the mount point you created.  And in main() you 
> continue on.  Writing the backup to a mount point probably isn't what 
> you want, though assuming there's space it may be ok.  The rmdir in 
> close_smb_share will fail (probably a good thing).
>
> I'm not saying you should exit if the smbmount fails.  But you 
> probably shouldn't write the backup there.  And you probably shouldn't 
> exit 0.


If smbmount fails, there is no mount point to clean up.  As for the 
rest, you found a bug in my code.  You're absolutely right.  I forgot to 
check that open_smb_share() succeeds before proceeding in main().  In 
fact, I also forgot to provide all the arguments to it (even though they 
are picked up by default values -- I prefer to be absolutely, 
painstakingly clear in code).

On the contrary, based on what I wrote previously, writing to a mount 
point is indeed exactly what I want.  The rmdir is also what I want: it 
works just fine so long as the mount succeeds in the first place and I 
don't copy any files into an unmounted path (or the mount fails and I 
remove the directory immediately thereafter) -- which will be the case 
as soon as I get home and fix the bug you found.  If I create the mount 
point directory, I destroy that directory.  If someone or something else 
created the path, then I likely can't destroy it, and it fails exactly 
as gracefully (quietly or with a deliberate message) as I want it to.

Thank you for pointing that out!
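The ownership rule I'm describing -- destroy the mount point only if the script created it -- can be sketched like this (names hypothetical):

```shell
# Track whether this script created the mount-point directory, and only
# remove it on the way out if so.
mount_point="/tmp/mnttest/share"
created_mount_point=0

if [ ! -d "$mount_point" ]; then
    mkdir -p "$mount_point" && created_mount_point=1
fi

# ... smbmount, backup copy, and smbumount would happen here ...

if [ "$created_mount_point" -eq 1 ]; then
    rmdir "$mount_point"    # fails harmlessly (and loudly) if non-empty
fi
```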

>> If I disable services, I want them right back up ASAP.  I do not test 
>> whether services fail to stop because the list of services can be 
>> quite long and I won't abort the whole operation for a single failure 
>> (not to mention, users may put "service" names in the list that are 
>> actually not services -- because I can't tell at run-time whether the 
>> failure is user-driven or a true system failure, I choose to ignore 
>> the failure altogether).
>
>
> This isn't a question of aborting because a service failed to stop, 
> but of giving the user a useful indication that it did.  You print a 
> message (whatever service writes to stderr) and exit 0.  The user has 
> to look at the script output to figure that out.
>
> Is it intended that the only output will be on errors?  That would be 
> a useful comment (see below).  If not, then the user gets a mail from 
> cron every day and has to read it to see what happened.


Per my reading, this is standards-compliant behavior, so I wasn't 
inclined to document it.  Yes, being a cron shell script, it outputs 
only on error.  If you still feel it necessary, I'll add documentation 
to that effect.

I'm changing my mind on silently ignoring service start/stop failure.  
Over the last week, my database server failed one night to restart MySQL 
at the end of the backup.  I have no idea why and my network was 
basically down for a whole day as a result (because I wasn't around to 
check on it -- I didn't even check any of my e-mail).  The service 
restarted just fine the next day at the end of the following backup, 
restoring my database server to its expected state.

This is definitely not what I want to happen regularly, much less to 
anyone else.
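A hedged sketch of what reporting the failure might look like instead (the service names and the start_service stand-in are hypothetical):

```shell
# Collect restart failures instead of swallowing them, then surface the
# list (and a non-zero status) at the end of the run.
start_service() {
    # stand-in for: "$SERVICE_PATH" "$1" start
    [ "$1" != "brokensvc" ]
}

failed=""
for svc in mysqld httpd brokensvc; do
    start_service "$svc" || failed="$failed $svc"
done

if [ -n "$failed" ]; then
    echo "WARNING: failed to restart:$failed" >&2
    status=1
else
    status=0
fi
```

Cron would then mail the warning, and the final status could drive something smarter than "read my email in the morning", as David suggests.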

>
> [...]
>
>> A note on the samba failure question:  I am testing the smbmount 
>> command for failure, which -- as you can see from the way I abort 
>> with a user message -- is a critical failure.
>
>
> Where do you abort?  You test smbmount, and call showerror and return 
> 1 if it fails.  main() doesn't check the return or output of 
> open_smb_share() and continues on.  What am I missing?


You're right; I don't.  See above -- this is a bug in the code.  I truly 
appreciate you finding it!

>> You can see where I redirect output vs. where I do not.  If something 
>> really does fail -- that is critical to the success of the backup 
>> operation -- then the user will get a message from cron that night.  
>> If the failure can be muted because the net result is not a critical 
>> failure, then I mute it.  In other cases, I mute out of necessity.  
>> For example:
>>
>> $TAR_PATH -cf "$workspace_tar_fqn" -T "$BACKUP_LIST_FILE" 2>/dev/null
>
>
> Well, I don't know.  Suppose that tar fails because it runs out of 
> disk space--a tar file is created but it's incomplete.  You seem to be 
> counting on the fact that tar's error message gets to the user to 
> alert him that something went wrong.  It would be better to record the 
> failure, continue (or not as appropriate), and return an error status 
> at the end.  Then the user could script something more intelligent 
> than "read my email in the morning".  Like page me.  Or send an snmp 
> trap to my monitoring system.
>
> I understand your need to filter out useless tar output.  It would be 
> better to use grep -v for that rather than /dev/null.


That's a good idea.  I hadn't considered it.  :)   In any event, and for 
reasons posted in my previous reply, if tar produces ANY file, I want it 
off the system as-is and right away to the off-host backup store.  I 
back up very specific files.  If I can get any of them, I want them.

My literature doesn't clearly specify the behavior of exit status 
results in the presence of output piping (which is partly what led to 
the style of error-handling you see only in main()).  Consequently, I'm 
still not comfortable wrapping tar in an if test for this and the other 
reasons in this reply.
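For what it's worth, my understanding (worth verifying against the standard) is that a plain POSIX pipeline reports only the last command's exit status -- which is exactly why wrapping `tar ... | grep -v` in an `if` would test grep, not tar.  bash's PIPESTATUS array is the usual escape hatch:

```shell
# In POSIX sh, $? after a pipeline is the status of the LAST command.
false | true
echo "pipeline status: $?"    # prints 0 -- true's status, not false's

# bash (not plain sh) additionally records every stage:
#   false | true
#   echo "${PIPESTATUS[0]} ${PIPESTATUS[1]}"    # 1 0
```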

>>> Your comments in main() could use improving.  "Perform the backup" 
>>> isn't nearly as helpful as stating what to do if there are 
>>> problems.  In the places you've decided to continue, why is that the 
>>> right thing to do?
>>
>>
>> Answered above.  As for the comments, I don't understand the 
>> complaint.   I'm documenting almost at the per-line level.  "Perform 
>> the backup" immediately precedes the tar command (making it an 
>> obvious remark), which is followed by (after the services are 
>> restored) the error-checking code for that tar operation -- in the 
>> else condition, you see the comment, "The backup tarball failed."  
>> There is no more information that I can express to a maintenance 
>> programmer without being overly redundant.  :)
>
>
> This is exactly what I mean.  "Perform the backup" is redundant.  It 
> says the same thing as "$TAR_PATH -cf".  Ditto for the "Restart 
> services" comment.
>
> "The backup tarball failed" is worse because it's misleading.  Besides 
> the fact that it's a ways down in the code, it means "this is what 
> happens if tar fails".  But it's only what happens if tar doesn't 
> create a tar file.  If tar fails but still creates a file the else 
> isn't run.
>
> So I don't value those kinds of comments.  Comments like "make the 
> backup; as long as something is produced continue on because that's 
> the best we can do" are better, I think.


I feel that the debate over non-executable comments is purely 
philosophical.  I will take your feedback in consideration and I will 
probably reinforce the distinction between "tar failing to execute" and 
"tar failing to produce an output file".

>
> Oh yeah.  That particular if should be rewritten.  I would do this:
>
> if [ ! -e "$workspace_tar_fqn" ]; then
>     # The backup tarball failed.
>     showerror "The backup catalog, $workspace_tar_fqn, failed..."
>     exit 1
> fi
>
> # Attempt to compress the backup archive per user preferences.
> case $compress_type in
> ...
>
> By reversing the sense of your test you get to put the error action 
> close to the check.  You don't have to search for it after a bunch of 
> normal actions (and their error checking).  You did it right with the
>
> if [ ! -e "$workspace_zip_fqn" ]; then
>
> case.


If I exit at that point in code, then I'm going to fail to restore the 
system to its previous state.  That is absolutely not what I want to 
happen, which is why you find no exit statement after "makedirs 
$WORK_PATH 700".  Your code is typical and correctly formed except that 
it does not meet the need here.  I hope you can see all of what I'm 
trying to do and the reasons why I'm deliberately using an alternate 
form of error-checking in main().  It is imperative that, when the 
script finishes, the system be exactly as it was before it ran.  I'm 
doing this 
with an economy of keywords.  For example, if I wrap tar, then I'll have 
to put more lines of code into the if/else than is necessary compared 
with the way I'm handling it now.  :)
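One alternative worth mentioning -- a sketch under my own assumptions, not a claim about how the script works today: register the restore steps in an EXIT trap, so an early `exit 1` still puts the system back before the shell terminates:

```shell
# The subshell simulates a script run: the EXIT trap fires even when
# the script bails out early, so restoration is guaranteed either way.
run_demo() (
    restore_system() { echo "restore: services back up, share unmounted"; }
    trap restore_system EXIT

    echo "backup step failed"
    exit 1    # early exit -- the trap still runs restore_system
)

out=$(run_demo) || :    # capture output; ignore the demo's exit status
echo "$out"
```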

> Thanks, this has been an interesting exercise.
>
> Dave 


Thank you for all your feedback!  Some is enlightening, some is gruffly 
delivered, all is good.  :)
_______________________________________________
CLUE-tech mailing list
CLUE-tech at cluedenver.org
http://cluedenver.org/mailman/listinfo/clue-tech


