[CLUE-Tech] HP ethernet switch UDP broadcast storm

Tue May 11 17:55:23 MDT 2004

Hi Nate,

Nate Duehr wrote:
> 
> Jim Ockers wrote:
> 
> >We thought about the possibility of a loop but we decided that couldn't
> >be the problem because we have spanning tree turned off.
> >  
> Definitely sounds like a loop.  With spanning tree off, the switches 
> have no ability to fix the problem themselves.  Spanning tree KILLS 
> loops, it doesn't CREATE them.  The logic behind the decision made in 
> the above sentence is fundamentally flawed.

You're correct, I wrote that wrong.

What we decided was that spanning tree wasn't the problem, since it
was turned off.  (I think enabling spanning tree can cause some
other problems.)

> Too late now, but next time get a copy of the ARP table from the "core" 
> switch before resetting it.  I bet you'll find it had a MAC entry to 
> another switch that had an ARP entry for that address pointing right 
> back at the core. 

Unfortunately I don't think that these switches maintain an ARP
table that is associated with a port.  (And since they are layer 2 
devices, I think they shouldn't maintain an ARP table anyway,
except for their own management IP address & interface.)

The switches do maintain an "address table" like this:

  000102-674595  Mesh
  000102-67459d  Mesh
  000102-684003  Mesh
  000102-684a76  E8
  000102-6eb369  J8
  000102-6eb63b  Mesh
  000102-be209d  H2

where each MAC address is either associated with a specific local port
(E8, J8, H2) or listed as "Mesh", meaning it was learned from somewhere
else on the switch mesh.

All of the switches said that the offending MAC address was on the "Mesh"
port (in their address table), during the broadcast storm.
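
As a rough illustration of the lookup described above, here is a minimal
Python sketch that parses address-table text in the two-column format shown
and reports which switches, if any, see a MAC on a local port rather than on
"Mesh".  The switch name "sw1" and the function names are hypothetical, and
the text format is assumed to match the sample rows exactly.

```python
# Sample rows copied from the address table above.
ADDRESS_TABLE = """\
000102-674595  Mesh
000102-67459d  Mesh
000102-684003  Mesh
000102-684a76  E8
000102-6eb369  J8
000102-6eb63b  Mesh
000102-be209d  H2
"""


def parse_address_table(text):
    """Return {mac: port} from lines of the form 'MAC  PORT'."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        mac, port = fields
        table[mac.lower()] = port
    return table


def locate(mac, tables):
    """Return {switch: port} for switches that list `mac` on a local port.

    `tables` maps a (hypothetical) switch name to its parsed address table.
    Entries marked "Mesh" are skipped, since they only say the MAC is
    somewhere else on the mesh.
    """
    hits = {}
    for switch, table in tables.items():
        port = table.get(mac.lower())
        if port is not None and port != "Mesh":
            hits[switch] = port
    return hits


tables = {"sw1": parse_address_table(ADDRESS_TABLE)}
print(locate("000102-be209d", tables))  # a MAC on a local port
print(locate("000102-674595", tables))  # a MAC seen only via the mesh
```

An empty result for every switch corresponds to the situation during the
storm: no switch claimed the offending MAC on a local port.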

> The "bogus packets" the PC was sending could have had their MAC address 
> mangled to specifically cause the problem. 
> (Perhaps the PC user was doing something silly like ARP hijacking to try 
> to sniff traffic or someone else was?)  With spanning-tree turned off, 
> your network is vulnerable to all sorts of silliness, including the issue 
> you mentioned.

We think the problem was unintentional/accidental, and the offending
MAC address was the MAC address of the PC in question, which was
disconnected.  You're right that we are vulnerable to that sort of
activity, but I would have expected the switches to eventually divulge
the port that the offending traffic was coming from, instead of all
of them claiming it was on the Mesh.

We tried static ARP entries to force some test systems to associate the
offending MAC address with the offending IP address (since ARP wasn't
working normally, of course).  However, nothing worked: the offending
system just wasn't there, even though it was supposedly generating
lots of traffic.

> It's EXTREMELY rare, but you could also just be darn unlucky and 
> actually have two machines with duplicate MAC addresses. 

Agreed.  I don't think that's the case here since the MAC address
went away when we broke the Mesh loop.

> A span port (a port that is configured to see ALL traffic on the core 
> switch) and a copy of arpwatch running on a linux box on that port could 
> be useful.  (Heck, it's useful to have a span port hooked to a linux box 
> for all sorts of reasons... that and good ol' tcpdump will find enough 
> things wrong to keep a network admin busy for a month in most networks 
> -- but your hardware has to be able to keep up with it... both the 
> switch itself on the backplane and the linux box network card... a lot 
> of people use laptops for this, but some older PCMCIA cards and linux 
> drivers won't really keep up with 100Mb/s.)

Thanks for the tip - I will look into arpwatch.
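
The basic idea behind that kind of monitoring, remembering which MAC address
was last seen for each IP address and flagging any change, can be sketched
in a few lines of Python.  This is only an illustration of the idea and not
arpwatch's actual implementation; the function name, the input format (a
sequence of (ip, mac) pairs already extracted from ARP traffic), and the
sample addresses are all assumptions.

```python
def find_changed_pairings(observations):
    """Return a list of (ip, old_mac, new_mac) for every pairing change.

    `observations` is an iterable of (ip, mac) tuples in the order they
    were seen on the wire, e.g. pulled from ARP replies on a span port.
    """
    last_seen = {}
    changes = []
    for ip, mac in observations:
        mac = mac.lower()
        old = last_seen.get(ip)
        if old is not None and old != mac:
            changes.append((ip, old, mac))
        last_seen[ip] = mac
    return changes


# Hypothetical sample data using documentation-range IP addresses.
sample = [
    ("192.0.2.10", "000102-684a76"),
    ("192.0.2.11", "000102-6eb369"),
    ("192.0.2.10", "000102-684a76"),  # same pairing, no change
    ("192.0.2.10", "000102-be209d"),  # pairing changed: worth a look
]

for ip, old, new in find_changed_pairings(sample):
    print(f"{ip}: {old} -> {new}")
```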

Our network is large enough that we occasionally get "late collision"
errors from the switches.  I think that perhaps an Ethernet topology
problem (too much propagation delay, processing delay, etc.) could
cause problems like this.  What do you think?
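
For reference, a "late" collision on half-duplex Ethernet is one detected
after the first 512 bit times (64 bytes) of a frame have gone out, which is
why excessive round-trip delay is one possible cause.  The small Python
sketch below just works out that window at 10 and 100 Mb/s; the function
names are made up for illustration, and the sketch says nothing about what
actually happened on this network.

```python
SLOT_TIME_BITS = 512  # collision window for 10/100 Mb/s half-duplex Ethernet


def slot_time_us(rate_mbps):
    """Collision window in microseconds at the given bit rate (Mb/s)."""
    return SLOT_TIME_BITS / rate_mbps


def is_late_collision(bits_sent_before_collision):
    """True if the collision was detected after the collision window."""
    return bits_sent_before_collision > SLOT_TIME_BITS


for rate in (10, 100):
    print(f"{rate} Mb/s: window = {slot_time_us(rate):.2f} microseconds")
```

Any collision domain whose round-trip delay exceeds that window can produce
late collisions; a duplex mismatch on a link is another common cause.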

-- 
Jim Ockers, P.Eng. (ockers at ockers.net)
Contact info: please see http://www.ockers.net/