[clue-tech] Caching-only BIND problems (long winded)

Bruce Ediger bediger at stratigery.com
Wed Feb 3 17:46:39 MST 2010


I decided to set up a "caching only" name server on my local area network.
It's giving me some problems, and I'm hoping someone can advise.

I've tried to put all the relevant details below, but it seems a bit
long-winded, and at the same time, inadequate.

Static IP address, "stratigery.com", Qwest DSL.

Cisco 678 (10.0.0.1 on the inside), doing NAT, but nothing else.
         - Multiple boxes on the inside of NAT: Slackware, Arch, printer,
         WRT54GL running dd-wrt, occasionally Windows XP laptop.

         - NAT includes permanently routing incoming UDP port 53 traffic to
         10.0.0.12

         - From a "show int wan0" command:
         Downstream Data Rate:       384 Kbps
         Upstream Data Rate:         864 Kbps


Slackware 12.0 box (10.0.0.12)
         - bind-9.6.1-P1, I replaced the BIND that came with Slackware 12.0
                 - only authoritative inside the LAN
                 - started from /etc/rc.d/rc.bind:  /usr/sbin/named -4 -c /etc/named.conf
         - dhcpd, so I can assign fixed IP addresses and names to various
         ethernet addresses.  dhcpd sends 10.0.0.12 as the DNS server, 10.0.0.1
         as the default route.

I followed the "caching-only" example BIND configuration that comes with Slackware.
It doesn't seem too hard.  Googling a bit turned up web pages that advocated running
a caching-only DNS server for educational purposes, which is also why I'm doing it.

So, I set up what I believe is a "caching-only" BIND instance.

Everything worked fine after I hammered out problems with the config files.
Slackware box (10.0.0.12) could use "nameserver 127.0.0.1", and DHCP set up the
/etc/resolv.conf files correctly on other machines.

I had some early problems in that not all machines on my LAN used 10.0.0.12
as a nameserver, so the Cisco 678 NAT would lose outgoing queries from
10.0.0.12.  That's why I NAT'ed UDP port 53 permanently to 10.0.0.12 UDP port
53.

For a while, a couple of Chinese IP addresses hammered BIND, until I excluded
everybody but my LAN in /etc/named.conf:

     allow-recursion {10.0.0.0/24; 127.0.0.0/8;};
     listen-on {127.0.0.1; 10.0.0.0/24; };
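
To illustrate which clients those two networks cover, here is a minimal
sketch using Python's standard ipaddress module.  It only mimics the address
matching and is not how BIND implements it; the function name
recursion_allowed is made up for this example.

```python
import ipaddress

# Networks copied from the allow-recursion line above.
ALLOWED = [
    ipaddress.ip_network("10.0.0.0/24"),
    ipaddress.ip_network("127.0.0.0/8"),
]

def recursion_allowed(client):
    """Return True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client)
    return any(addr in net for net in ALLOWED)

# A LAN host, the loopback address, and an outside address.
for client in ["10.0.0.12", "127.0.0.1", "4.2.2.2"]:
    print(client, recursion_allowed(client))
```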

But now, my wife complains that "the internet is down again" or somesuch all the time.

Since I'm running Arch on an HP "Pavilion" with an Intel 82845G/GL
[Brookdale-G]/GE integrated graphics module, I suffer from near daily X
crashes.  After I sync-sync-sync-reboot and startx, Firefox always has multiple tabs to
re-open.  One or two out of 10 tabs will get a "Server not found" message, with
a "Try Again" button.  And they're not obscure sites:
      "Firefox can't find the server at www.nytimes.com."

I just tried it again by killing Firefox on the Arch machine, flushing bind's
cache (rndc flush on the Slackware machine), and restarting Firefox.

6 out of 50 tabs (!?!) came up with a "Server not found" message.

My control:
Kill Firefox on the Arch machine, change /etc/resolv.conf to say
"nameserver 4.2.2.2", and restart Firefox.

I only get 1 tab out of 50 failing, and it comes up with a
"The connection has timed out", not "Server not found".
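
The Firefox tab test is hard to repeat exactly, so the same comparison could
be scripted.  What follows is a minimal sketch, assuming Python 3 and only the
standard library.  It sends one plain UDP A-record query per name to each
server and reports timeouts and non-zero response codes (2 is SERVFAIL, 3 is
NXDOMAIN).  The function names and the 3-second timeout are my own choices for
illustration, and a reply with response code 0 counts as "ok" even if it
carries no addresses.

```python
import random
import socket
import struct

def build_query(name, qid):
    """Build a minimal DNS query for an A record with the RD bit set."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def parse_rcode(packet):
    """Return (query id, RCODE) taken from a DNS response header."""
    qid, flags = struct.unpack("!HH", packet[:4])
    return qid, flags & 0x000F

def lookup(name, server, timeout=3.0):
    """Send one UDP query; return 'ok', 'timeout', or 'rcode N'."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, 53))
        try:
            while True:
                data, _ = sock.recvfrom(4096)
                if len(data) < 4:
                    continue  # too short to hold a DNS header
                rid, rcode = parse_rcode(data)
                if rid == qid:
                    return "ok" if rcode == 0 else f"rcode {rcode}"
        except socket.timeout:
            return "timeout"

def compare(names, servers):
    """Run every name against every server and print the failures."""
    for server in servers:
        results = [lookup(name, server) for name in names]
        failed = [(n, r) for n, r in zip(names, results) if r != "ok"]
        print(f"{server}: {len(failed)} of {len(names)} failed {failed}")

# Example (needs network access):
# compare(["www.nytimes.com"], ["10.0.0.12", "4.2.2.2"])
```

Running it right after "rndc flush" and again a minute later would show
whether the failures only happen on a cold cache.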

Anybody got any ideas?

I've considered a bandwidth problem, since I'm on a relatively low-speed DSL line,
but DNS queries really shouldn't take that much bandwidth, should they?
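
A rough back-of-the-envelope check of the bandwidth question, as a sketch.
Only the 384 Kbps rate and the 50 tabs come from the numbers above.  The
queries-per-name and packet sizes are pessimistic guesses, not measurements:
512 bytes is the classic UDP reply limit without EDNS, and real replies may be
smaller or, with EDNS, larger.

```python
LINK_BPS = 384_000              # slower of the two rates from "show int wan0"
NAMES = 50                      # one lookup per Firefox tab
QUERIES_PER_NAME = 4            # assumption: root, TLD, authoritative, one extra
BYTES_PER_EXCHANGE = 100 + 512  # assumption: ~100-byte query plus 512-byte reply

def total_bits(names, queries_per_name, bytes_per_exchange):
    """Total bits moved if every lookup needs the assumed number of exchanges."""
    return names * queries_per_name * bytes_per_exchange * 8

def seconds_on_link(bits, link_bps):
    """Time the traffic would occupy the link if sent back to back."""
    return bits / link_bps

bits = total_bits(NAMES, QUERIES_PER_NAME, BYTES_PER_EXCHANGE)
print(f"{bits} bits, about {seconds_on_link(bits, LINK_BPS):.1f} s of link time")
```

Even with these guesses the DNS traffic for 50 cold lookups comes to a few
seconds of link time in total, which suggests raw DNS bandwidth alone is an
unlikely culprit, although the page loads competing for the same line could
still delay or drop individual UDP packets.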

