[clue-tech] nfs frustrations

Nate Duehr nate at natetech.com
Wed Jun 3 16:43:18 MDT 2009



On Wed, 03 Jun 2009 17:22 -0400, "Angelo Bertolli"
<angelo.bertolli at gmail.com> wrote:
> Ok so none of the options on an NFS mount do what I want it to do.  
> Maybe automount is the only solution, but for regular nfs...

autofs will mount and unmount things as they're "used", but it adds some
wait time when you first use the remote filesystem.  I also forget how
to tell it how long to wait before unmounting... it's been years since I
had to deal with developers' machines that used it to auto-mount the
development playground server... 

But if the problem really is network connectivity going away, it won't
help... the NFS mount will still be "hung" by bad network connectivity. 
NFS is from another era where it assumes networks are perfect.  When
they're not, NFS becomes highly annoying.

> 1) There doesn't seem to be any way to get the system to automatically 
> unmount a filesystem that is not responding

Not really.  With "intr" turned on, signals can interrupt a process
that's blocked on the mount... so hung applications can be "un-hung"
(killed) with manual intervention, but it's still a pain.
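
A minimal sketch of one way to at least detect a hung mount from a
script, assuming Python 3 is available on the client.  It probes the
mount from a child process, so only the child blocks if the server has
gone away.  The function name mount_responds and the path
/mnt/playground are made up for illustration.

```python
import subprocess
import sys


def mount_responds(path, timeout_s=60.0):
    """Return True if `path` can be stat()ed within `timeout_s` seconds."""
    # Run the stat() in a child process: if the mount is hung,
    # the child blocks and this process stays usable.
    probe = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit status 0 means the stat() succeeded in time.
        return probe.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in uninterruptible I/O may not
        # die right away, so do not wait for it to exit.
        probe.kill()
        return False


if __name__ == "__main__":
    mount_point = "/mnt/playground"  # hypothetical mount point
    if not mount_responds(mount_point, timeout_s=60.0):
        print(f"{mount_point} is not responding")
```

What to do once it reports a dead mount is still up to you; a forced or
lazy unmount (umount -f / umount -l) is the usual manual step, and
neither is guaranteed to free processes that are already stuck.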

> 2) There doesn't seem to be any way to tell NFS to fail within 1 
> minute.  I know the maximum retrans timeout is supposed to be 60 
> seconds, but after tweaking it
> 
> When a mount is unavailable (I'm using ls to test) ...
>     - soft/hard doesn't seem to make any difference (I'm using ro,noexec)
>     - retrans, timeo, retry don't seem to make any difference no matter 
> what settings I use

timeo should work, but it requires that there actually be file access
going on... if the mount is "quiet", it has no idea that there's
something to "timeout", so to speak.  If you already have network issues
going on, using soft would make your life a living hell.  I highly
recommend against it, unless you enjoy I/O errors in your
application-level code.  (GRIN!)

>     - I've tried at least 10 combinations of the above, and ls returns 
> with an IO error within 3 - 5 minutes every time.

I've also farted around with it in the past.  There were a number of
implementation bugs in Linux NFS stacks over the years.  Those weren't
very helpful at the time.  Maybe they've cleaned those up.  The
strongest NFS implementation has always been the one in Solaris, but
like many things Solaris, it traded features for robustness...
and you still couldn't really do anything about "hung" NFS mounts very
well.

> Oh well.  We're using nfs3.  Should I expect different behavior from nfs4

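Doubt it.  NFSv4's headline changes were around authentication and
security (plus a stateful protocol over a single port), not how a
client behaves when the server goes away.  It's kinda a "too little,
too late" approach to fixing things with NFS.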

I think other network filesystems, even the venerable and possibly hated
CIFS ("Windows shares") handle network outages better.  But there's a
whole new world of problems there... filenames, permissions,
ownership... Samba can also drive someone mad given the wrong set of
requirements for group access or other weird requests.

I guess the only GOOD thing about NFS is that it certainly shows you if
your network or servers aren't up to snuff.  If you can fix the
root-cause connectivity problems, it's plenty fast and maps better to
unix permissions and other things "Linuxy", but eventually NFS does
drive one mad when network or server issues are happening.

I've always wanted to try out OpenAFS, but I can't think of a good need
I have for it right now... 

--
  Nate Duehr
  nate at natetech.com

