[CLUE-Tech] HP ethernet switch UDP broadcast storm

Jim Ockers ockers at ockers.net
Tue May 11 09:23:26 MDT 2004


Hi everyone,

We had a very unusual problem yesterday in our core network.
This isn't strictly a Linux problem but I'm not sure who else
to ask.

We have HP Procurve 4000M managed ethernet switches.  Somehow
a bogus ethernet frame (or frames) entered the network from 
a PC on our network through one of the switches.

The traffic was all UDP, sent to the broadcast IP address for
our LAN subnet.

This traffic was replicated to the other switches, but for
some reason, instead of dying out, it was replicated back to
the original switch and turned into quite the packet storm.
The storm grew until it was saturating our network and causing
some serious connectivity problems.  Ethereal showed that
some 98% of the traffic on the network was this UDP broadcast
garbage.
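
For what it's worth, the growth pattern looks like what you'd get
if a forwarding loop briefly formed across the redundant mesh
cables: each switch floods a broadcast frame out every link except
the one it arrived on, so any copy that re-enters a loop gets
multiplied on every pass.  A toy sketch of that arithmetic (my own
guess at the mechanism, not a claim about what the 4000M meshing
firmware actually did):

```python
# Toy model: two switches joined by several parallel links, each
# flooding a broadcast frame out every link except the one it came
# in on.  If nothing (the meshing protocol, spanning tree) blocks
# the redundant links, every pass multiplies the copy count.
PARALLEL_LINKS = 4  # e.g. one 4-cable "mesh" bundle

def flood(copies):
    # each arriving copy goes back out the other 3 links
    return copies * (PARALLEL_LINKS - 1)

copies, history = 1, []
for _ in range(6):
    copies = flood(copies)
    history.append(copies)
print(history)  # [3, 9, 27, 81, 243, 729]
```

Exponential growth like that would explain why the storm kept
getting worse instead of decaying after the source was unplugged.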

We disconnected the PC whose IP & MAC address were listed as
the source, according to Ethereal.  The traffic continued to
grow even after that PC was disconnected.  Ethereal still
showed that the traffic was from that IP & MAC address, but
when we did a search in the switches for that MAC address, all
of the switches said it was on a MESH port and no switch would
admit to being the source of the traffic - they all blamed the
other switches in the mesh.

We've never before had traffic from a MAC address that we
could not trace back to a port on one of our switches.  This
is clearly a switch glitch, right?
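
My guess at why the MAC became untraceable: a transparent bridge
learns each source MAC on whatever port a frame from it last
arrived on, so once looping copies were arriving via the mesh,
every switch's address table would point at a mesh port even with
the PC unplugged.  A minimal sketch of that learning rule
(hypothetical port names, not actual ProCurve output):

```python
# Transparent-bridge MAC learning: the table maps each source MAC
# to the port its traffic was last seen on ("last-seen port wins").
mac_table = {}

def learn(mac, port):
    mac_table[mac] = port

learn("00:11:22:33:44:55", "A7")      # original frame from the PC's port
learn("00:11:22:33:44:55", "MESH-1")  # looping copy arrives via the mesh
learn("00:11:22:33:44:55", "MESH-3")  # and again via another mesh link

print(mac_table["00:11:22:33:44:55"])  # MESH-3: the PC looks untraceable
```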

We resolved the issue by disconnecting the "mesh" ports from
the switch that the PC was originally connected to.  The cables
were disconnected for about 30 seconds, and Ethereal showed
that the traffic went away.  We reconnected the mesh ports and
the problem has not recurred.

We did NOT reboot any of the switches.

I did some Google searches and didn't find anything about this.
Does anyone have any ideas?  I have an Ethereal PCAP showing
the bogus traffic, and we know we weren't imagining things.

Here are some more details.

We have seven 80-port ethernet switches, all 10/100 except for one
1-port gigabit blade.  They are all connected with 4 "mesh" ports
back to "switch2" which is the concentrator.  Switch2 has 6
sets of 4 mesh ports, one set for each of the other switches
that are connected to it.  We have a hub & spoke "switch mesh"
with switch2 as the hub, that normally works quite well.
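
At the switch-to-switch level that topology is a tree (7 switches,
6 logical links), so broadcasts should cross each inter-switch
path exactly once; the only place a loop could form is among the
4 parallel cables inside a mesh bundle, which the meshing firmware
is supposed to treat as one logical link.  A quick sanity check of
the logical topology (spoke names are my own labels, only
"switch2" is real):

```python
# Hub-and-spoke mesh at the bundle level: switch2 in the middle,
# six other switches each tied to it by one 4-cable bundle.
links = {("switch2", f"spoke{i}") for i in range(1, 7)}
nodes = {sw for link in links for sw in link}

# A connected graph is loop-free iff it has exactly len(nodes)-1 edges.
assert len(links) == len(nodes) - 1
print(f"{len(nodes)} switches, {len(links)} logical links: loop-free")
```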

The switches are HP Procurve 4000M units running firmware revision
C.09.16, which I think is the most recent.

The bogus packet storm traffic itself was as follows:

UDP, source 172.16.1.108, sport 138, dest 172.16.255.255, dport 138, 243 bytes.
UDP, source 172.16.1.108, sport 1783, dest 172.16.255.255, dport 42508, 260 bytes.
	same, 132 bytes.
	same, 234 bytes.
UDP, source 172.16.1.108, sport 137, dest 172.16.255.255, dport 137, 92 bytes.

That traffic was repeated over and over again.  The dport 42508
traffic was the majority.  We know that eTrust antivirus was the
original source of those packets, but the storm continued even
after that PC was disconnected from the network and we stopped
the eTrust services on the eTrust server.
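
In case anyone wants to pull this traffic out of a capture
programmatically, here's a minimal stdlib-only classifier for the
signature above, operating on the raw IPv4 packet bytes (the
sample packet is hand-built for illustration, not taken from our
PCAP):

```python
# Match the storm signature: IPv4/UDP from 172.16.1.108 to the
# subnet broadcast address 172.16.255.255.
import socket
import struct

STORM_SRC = socket.inet_aton("172.16.1.108")
STORM_DST = socket.inet_aton("172.16.255.255")

def is_storm_packet(ip_packet: bytes) -> bool:
    if len(ip_packet) < 20:
        return False
    ver_ihl, proto = ip_packet[0], ip_packet[9]
    if ver_ihl >> 4 != 4 or proto != 17:   # IPv4 carrying UDP
        return False
    src, dst = ip_packet[12:16], ip_packet[16:20]
    return src == STORM_SRC and dst == STORM_DST

# Hand-built 20-byte IPv4 header (checksum zeroed) plus an 8-byte
# UDP header for the sport 1783 -> dport 42508 flavor of the storm.
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                  STORM_SRC, STORM_DST)
udp = struct.pack("!HHHH", 1783, 42508, 8, 0)
print(is_storm_packet(hdr + udp))  # True
```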

Every port on the switches was receiving this traffic, regardless
of subnet (we do have a couple of other subnets in use, and some
external ports).  The Linux users had to turn off nmbd because
it was using so much CPU time handling the port 137 & port
138 traffic.

We have only one VLAN and every port is in the same VLAN.  All VLAN 
ports are untagged.

Weird eh?  Anyone seen anything like this before?  What did you do
to resolve the problem?

Thanks,
Jim

-- 
Jim Ockers, P.Eng. (ockers at ockers.net)
Contact info: please see http://www.ockers.net/


