[clue-tech] Multiple system backups

Sun Feb 12 20:49:22 MST 2006

William wrote:
> David L. Anselmi wrote:
[...]
>> Rather than define a variable to keep the path for every command you 
>> call, I would set the $PATH at the beginning and just call the 
>> commands.  If you insist on a variable for every command, don't call 
>> it FOO_PATH, just call it FOO.  It's a command, not a path.
> 
> I feel like you missed my "Power User" note.  :)  I specifically did not 
> set a $PATH to provide a maximal user-control capability.  Imagine this 
> script running chrooted, or a particular user (say, using Debian rather 
> than Red Hat) prefers alternative commands to the ones I chose to use. 
> Or, they prefer to supply particular other options to the commands over 
> and above my choices.  I decided that this was a good way to provide a 
> very highly granular system of control to the (power) user.

I did miss the power user note.  But it doesn't matter; find & replace 
accomplishes the same thing.  If someone is going to change one of 
these he has to look up every place it is used to evaluate the 
impact of the change.  So I think this is 3 dozen lines of cruft.
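
Roughly what I have in mind, as a sketch only.  The directory list and the 
names TAR, GZIP_CMD, and which_cmd are made up for illustration; they are 
not from your script.

```shell
#!/bin/sh
# Set the search path once. A power user edits this one line to pick
# different binaries (or to suit a chroot).
PATH=/usr/local/bin:/usr/bin:/bin
export PATH

# If per-command overrides are still wanted, name the variable after the
# command and let the environment supply a replacement.
# TAR and GZIP_CMD are hypothetical names.
TAR=${TAR:-tar}
GZIP_CMD=${GZIP_CMD:-gzip}

# Resolve a command name through $PATH and print where it was found.
which_cmd() {
    command -v "$1"
}
```

With this, running "TAR=/opt/bin/star ./backup.sh" (a hypothetical 
invocation) swaps one command without touching the script.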

>> You frequently say "delimited list".  Since you mean "whitespace 
>> delimited list" I would just say "list".
> 
> I worked in the technical support department of a software company for 
> three years, and today I'm a software architect responsible for 
> extremely detailed specifications and painstakingly accurate 
> documentation.  I don't short-hand anything when the meaning can be 
> expressed more precisely.  This is to avoid as many forms of confusion 
> as I can before confusion leads to problems.  As I'm sure you know, 
> "whitespace" can include far more than just "space".  I specifically 
> targeted only one type of whitespace character.

Now that's very interesting.  Perhaps a lesson in how hard detailed 
specs are to get right.  You only said "space delimited" once but if you 
look at how that list is used you meant whitespace, as whitespace is 
defined by the shell.  Sure, space works but it isn't required.  The 
other places that just say "delimited" are ambiguous.  The reader 
immediately asks "delimited by what"?
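
A small demonstration of the splitting I mean, assuming a POSIX shell with 
the default IFS.  SERVICES and count_items are names I made up for this 
example.

```shell
#!/bin/sh
# Build a list separated by a tab, a newline, and two spaces.
# SERVICES is a hypothetical variable name.
SERVICES=$(printf 'httpd\tsshd\ncrond  smb')

# Unquoted expansion is split on any run of characters in $IFS, which
# defaults to space, tab, and newline.
count_items() {
    # $1 is deliberately unquoted below so the shell splits it.
    set -- $1
    echo "$#"
}

count_items "$SERVICES"    # prints 4
```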

>> Debian doesn't have a service command for starting/stopping services, 
>> at least not on a typical system.
> 
> I wasn't aware of this.  My only Linux experience is with Red Hat and 
> derivative products.  That presents an interesting, though workable, 
> problem.  Debian users could set the SERVICE_PATH variable to whatever 
> the equivalent is, if there is one (I hope there is, otherwise I don't 
> know how Debian users would handle services centrally).

There's invoke-rc.d.  I don't know how similar it is--it's designed for 
use in Debian packages and goes through the rc.d policy layer.  But 
using the init.d scripts directly would be easy enough.
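
A sketch of calling the init scripts directly.  INIT_DIR and service_ctl 
are hypothetical names, and I'm assuming the usual /etc/init.d layout 
where each script takes start or stop as its argument.

```shell
#!/bin/sh
# INIT_DIR and service_ctl are hypothetical names for illustration.
INIT_DIR=${INIT_DIR:-/etc/init.d}

# Run an init script directly: service_ctl NAME ACTION
# Returns 127 if there is no executable script by that name, otherwise
# the script's own exit status.
service_ctl() {
    if [ ! -x "$INIT_DIR/$1" ]; then
        echo "service_ctl: no init script for '$1' in $INIT_DIR" >&2
        return 127
    fi
    "$INIT_DIR/$1" "$2"
}
```

The existence check also separates "no such service" from "the service 
failed to stop", at least in this sketch.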

>> Keeping a version and history for each function seems excessive. 
>> Comments are nice but that stuff belongs in CVS or your changelog.
> 
> Most of these functions are portable to other, unrelated code projects. 
>  I did that on purpose; I generally write code that can be very widely 
> reused (dubbed "generally useful").

So do you have a way to maintain these generally useful functions in one 
place and include them where needed?  You're talking about library 
routines but I wouldn't say that your approach is very portable or 
maintainable.  Your functions depend on commands defined in global 
variables and if you find a bug in one you have to edit it everywhere 
you used it.  If you have some build system that puts this script 
together out of canonical function definitions then that's pretty cool.

I'm doing something like this, not for general functions like yours but 
for system specific functions that are used in various scripts.  The 
previous approach was a combination of cut and paste and including 
(sourcing) functions from a "library script".  Libraries are problematic 
in shell so we're currently writing separate scripts for each function 
that can be run as commands from our scripts.  That gives us the 
flexibility of writing each command in the most appropriate language.

Unfortunately our version control and build systems are cumbersome and 
unreliable.  So I don't think the "build from canonical source" trick 
will work for us (but I'll throw it out for consideration tomorrow).

> On a personal note, I often wish other developers would document
> their code as I have here.  When I have to bug-fix someone else's code
> on-site, I loathe digging through "disconnected" documentation like
> change-files or CVS comments.  This is a personal preference.

The receives, returns, and example docs are nice (until you find out 
they're wrong--not you but I've seen it happen).  If you're looking for 
version and history for maintenance, I'd think you'd want to see the old 
code too (why else do you care?) and that's in CVS.

>> There's a lot of string manipulation going on.  You should see whether 
>> perl would be a better choice.
> 
> I specifically selected shell script in order to learn shell scripting. 
>  This entire project is an exercise for me and I already have a major 
> Perl project.  :)

Fair enough.  I wrote a backup script (2 in fact) the first time I had a 
system to back up.  I never will again because I have enough experience 
to find a real backup system to use.  But at the time I wanted to back 
up to CD and the systems I could find wouldn't do that.  I learned some 
things doing that but I didn't go through the effort to make it portable 
or maintainable.  It had one update when I changed the backup media from 
CD to disk.  I should have installed bacula instead.

>> Use install -d rather than writing makedirs().  Probably mkdir -p 
>> would work too.
> 
> My version also applies the chmod, which the others do not (as far as I 
> can tell).  Additionally, the way I handle the component path elements 
> automatically cleans up otherwise unpredictable paths.  For example, if 
> you pass "//some/dir////broken" to my function, it is automatically 
> cleaned up as "/some/dir/broken/".

The install command will set user, group, and permissions, and do some 
other things.  I think it handles unpredictable paths at least as well 
as yours (//some/dir////broken isn't actually unpredictable).  What does 
yours do with "/some/dir/../other/dir"?
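
For comparison, a sketch using install.  make_dir is a made-up wrapper name 
and 0750 is only an example mode; I'm assuming an install that supports -d 
and -m, as the GNU one does.

```shell
#!/bin/sh
# make_dir is a hypothetical wrapper; 0750 is an example default mode.
# install -d creates any missing parent directories and -m sets the mode
# of the final directory. -o and -g set owner and group but normally
# need root, so they are left out here.
# Usage: make_dir PATH [MODE]
make_dir() {
    install -d -m "${2:-0750}" "$1"
}
```

Repeated slashes need no cleanup because the kernel treats them as a 
single separator when it resolves the path.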

> Actually, there is quite a lot of error checking in main(), though of a 
> different style than you seem to be looking for.  I'm measuring output 
> rather than exit state in main(), although as you probably noted, I do 
> error-check the command exit states in my other functions in a style 
> you're probably looking for.

I don't see that.  You don't check exit status and you don't do anything 
with command output other than print it or redirect it (in main(), I'm 
talking about).  How is that error checking?

> This is deliberate, mainly because I want to reverse the system-level
> changes I've caused as soon as possible, regardless of error.  If I
> open the samba connections, I want them closed right away.

Huh?  If the smbmount fails you'll print an error and return 1.  But you 
don't clean up the mount point you created.  And in main() you continue 
on.  Writing the backup into an unmounted mount point (that is, onto the 
local disk) probably isn't what you want, though assuming there's space it 
may be ok.  The rmdir in close_smb_share will then fail because the 
directory isn't empty (probably a good thing).

I'm not saying you should exit if the smbmount fails.  But you probably 
shouldn't write the backup there.  And you probably shouldn't exit 0.
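
A sketch of the control flow I'm suggesting.  The open_smb_share here is 
only a stand-in so the example runs anywhere; MOUNT_OK, run_backup, and 
the messages are made up.

```shell
#!/bin/sh
# Stand-in for the real open_smb_share: it "fails" unless MOUNT_OK=1.
# MOUNT_OK is a hypothetical switch used only by this sketch.
open_smb_share() {
    [ "${MOUNT_OK:-0}" = 1 ]
}

run_backup() {
    status=0
    if open_smb_share; then
        echo "writing backup to the share"
    else
        # Don't write into the bare mount point; note the failure.
        echo "share not mounted, skipping copy" >&2
        status=1
    fi
    # Cleanup that must always happen would go here, before reporting.
    return "$status"
}
```

The script keeps going after the failure, but the caller still sees a 
nonzero status.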

> If I disable services, I want them right back up 
> ASAP.  I do not test whether services fail to stop because the list of 
> services can be quite long and I won't abort the whole operation for a 
> single failure (not to mention, users may put "service" names in the 
> list that are actually not services -- because I can't tell at run-time 
> whether the failure is user-driven or a true system failure, I choose to 
> ignore the failure altogether).

This isn't a question of aborting because a service failed to stop, but 
of giving the user a useful indication that it did.  You print a message 
(whatever service writes to stderr) and exit 0.  The user has to look at 
the script output to figure that out.

Is it intended that the only output will be on errors?  That would be a 
useful comment (see below).  If not, then the user gets a mail from cron 
every day and has to read it to see what happened.

[...]
> A note on the samba failure question:  I am testing the smbmount command 
> for failure, which -- as you can see from the way I abort with a user 
> message -- is a critical failure.

Where do you abort?  You test smbmount, and call showerror and return 1 
if it fails.  main() doesn't check the return or output of 
open_smb_share() and continues on.  What am I missing?

> You can see where I redirect output vs. where I do not.  If something 
> really does fail -- that is critical to the success of the backup 
> operation -- then the user will get a message from cron that night.  If 
> the failure can be muted because the net result is not a critical 
> failure, then I mute it.  In other cases, I mute out of necessity.  For 
> example:
> 
> $TAR_PATH -cf "$workspace_tar_fqn" -T "$BACKUP_LIST_FILE" 2>/dev/null

Well, I don't know.  Suppose that tar fails because it runs out of disk 
space--a tar file is created but it's incomplete.  You seem to be 
counting on the fact that tar's error message gets to the user to alert 
him that something went wrong.  It would be better to record the 
failure, continue (or not as appropriate), and return an error status at 
the end.  Then the user could script something more intelligent than 
"read my email in the morning".  Like page me.  Or send an snmp trap to 
my monitoring system.
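
Something like this is what I mean by recording the failure and carrying 
on.  It's a sketch; failures and try are names I made up, and the three 
commands are stand-ins for the real steps.

```shell
#!/bin/sh
# failures and try are hypothetical names for illustration.
failures=0

# Run a command; on a nonzero exit, log it and count it, but keep going.
try() {
    "$@" && return 0
    rc=$?
    echo "FAILED (exit $rc): $*" >&2
    failures=$((failures + 1))
    return 0
}

# Stand-ins for the real steps (stop services, tar, compress, ...).
try true
try false
try sh -c 'exit 3'

# A real script would end with: [ "$failures" -eq 0 ] || exit 1
echo "$failures step(s) failed"
```

Whatever runs the script can then act on the exit status rather than 
parse the mail.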

I understand your need to filter out useless tar output.  It would be 
better to use grep -v for that rather than /dev/null.
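
A sketch of the filtering, with assumptions: filter_tar_noise is a made-up 
name, and the pattern is the "Removing leading" notice GNU tar prints for 
absolute paths, which may not be the message you are hiding.

```shell
#!/bin/sh
# filter_tar_noise and the pattern are assumptions; substitute whatever
# message is currently being sent to /dev/null.
# Usage: filter_tar_noise ARCHIVE LISTFILE
filter_tar_noise() {
    errfile=$(mktemp) || return 1
    tar -cf "$1" -T "$2" 2>"$errfile"
    rc=$?
    # Pass everything except the known-harmless notice on to stderr.
    grep -v 'Removing leading' "$errfile" >&2
    rm -f "$errfile"
    return "$rc"
}
```

I went through a temp file rather than a pipe because a plain pipe into 
grep would return grep's status and hide tar's.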

>> Your comments in main() could use improving.  "Perform the backup" 
>> isn't nearly as helpful as stating what to do if there are problems.  
>> In the places you've decided to continue, why is that the right thing 
>> to do?
> 
> Answered above.  As for the comments, I don't understand the complaint. 
>   I'm documenting almost at the per-line level.  "Perform the backup" 
> immediately precedes the tar command (making it an obvious remark), 
> which is followed by (after the services are restored) the 
> error-checking code for that tar operation -- in the else condition, you 
> see the comment, "The backup tarball failed."  There is no more 
> information that I can express to a maintenance programmer without being 
> overly redundant.  :)

This is exactly what I mean.  "Perform the backup" is redundant.  It 
says the same thing as "$TAR_PATH -cf".  Ditto for the "Restart 
services" comment.

"The backup tarball failed" is worse because it's misleading.  Besides 
the fact that it's a ways down in the code, it means "this is what 
happens if tar fails".  But it's only what happens if tar doesn't create 
a tar file.  If tar fails but still creates a file the else isn't run.

So I don't value those kinds of comments.  Comments like "make the 
backup; as long as something is produced continue on because that's the 
best we can do" are better, I think.

Oh yeah.  That particular if should be rewritten.  I would do this:

if [ ! -e "$workspace_tar_fqn" ]; then
	# The backup tarball failed.
	showerror "The backup catalog, $workspace_tar_fqn, failed..."
	exit 1
fi

# Attempt to compress the backup archive per user preferences.
case $compress_type in
...

By reversing the sense of your test you get to put the error action 
close to the check.  You don't have to search for it after a bunch of 
normal actions (and their error checking).  You did it right with the

if [ ! -e "$workspace_zip_fqn" ]; then

case.

Thanks, this has been an interesting exercise.

Dave