Is your network playing nice?

The other day a colleague and I spent a couple of hours trying to figure out why we could not get more than 110 to 125MB/sec of throughput from our RHEL 5.4 host to our EqualLogic iSCSI SAN volumes. It made no difference how we configured multipathing. We tried a bonded IP network with multiple virtual open-iscsi initiator ports per bond plus dm-multipath. We also tried multiple NICs on the same subnet with one open-iscsi initiator per physical NIC plus dm-multipath. We expected more than 200MB/sec. Instead, we were seeing roughly one path's worth of throughput.
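
A quick sanity check on those numbers follows. It assumes each path is a 1 Gbit/s Ethernet link, which the post doesn't state. However, 110 to 125MB/sec sits right at the ceiling of a single gigabit link. 1 Gbit/s is 125MB/sec of raw line rate before TCP/IP and iSCSI overhead, so the observed rate is about one link's worth. The function names below are illustrative only.

```python
def link_mb_per_s(gbit_per_s: float) -> float:
    """Raw line rate in MB/s (decimal megabytes), ignoring TCP/IP and iSCSI overhead."""
    return gbit_per_s * 1000 / 8


def paths_worth(observed_mb_s: float, link_gbit_s: float = 1.0) -> float:
    """How many links' worth of raw line rate an observed throughput represents."""
    return observed_mb_s / link_mb_per_s(link_gbit_s)


if __name__ == "__main__":
    # Assumption: two 1 Gbit/s paths between host and array.
    paths = 2
    ceiling = paths * link_mb_per_s(1.0)
    print(f"raw ceiling with {paths} paths: {ceiling:.0f} MB/s")
    for observed in (110, 125):
        print(f"{observed} MB/s = {paths_worth(observed):.2f} paths' worth")
```

With two gigabit paths, the raw ceiling is 250MB/sec, so "more than 200MB/sec" after overhead was a reasonable expectation.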

We tried everything. We tried different bonding modes (balance-alb, 802.3ad, balance-rr, etc.), different IO testing tools and different network configurations. We adjusted the dm-multipath rr_min_io setting and tuned the network stack itself by increasing the read/write buffer sizes. Then we got desperate and changed the IO schedulers. (For the record, I consider that “an act of desperation”, since I would never expect an IO scheduler change to double throughput.)
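
Before and after a scheduler change, you may want to confirm which scheduler a block device is actually using. The kernel exposes this in sysfs at /sys/block/<dev>/queue/scheduler, with the active scheduler shown in square brackets. On RHEL 5 the line typically looks like `noop anticipatory deadline [cfq]`. The sketch below is minimal. The device name `sdb` and the helper names are hypothetical examples.

```python
import re
from pathlib import Path


def active_scheduler(line: str) -> str:
    """Return the scheduler shown in [brackets] in a sysfs scheduler line."""
    match = re.search(r"\[([^\]]+)\]", line)
    if match is None:
        raise ValueError(f"no active scheduler marked in: {line!r}")
    return match.group(1)


def scheduler_for(device: str, sysfs_root: str = "/sys/block") -> str:
    """Read the active IO scheduler for a block device such as 'sdb'."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # 'sdb' is a hypothetical device name; use the one backing your iSCSI volume.
    # With dm-multipath, you may want to check each underlying sdX path device.
    try:
        print("sdb:", scheduler_for("sdb"))
    except (OSError, ValueError) as err:
        print("could not read scheduler:", err)
```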

We hadn't configured the physical network ourselves, and we were working with equipment in a large, well-established lab SAN. So it took me a while to consider, and then accept, that this might be a physical problem rather than a host configuration one. It was about time to walk over to the switches and check the status of the host-link ports and the controller port lights. I also needed to check any ISLs (inter-switch links) or, if present, stacking links.
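
The host side of that physical check can be partly scripted. This minimal sketch assumes a Linux kernel that reports negotiated link speed in Mb/s at /sys/class/net/<iface>/speed. Older kernels may not provide this file, and `ethtool <iface>` reports the same information. The interface names and the expected 1000 Mb/s are hypothetical. Note that this only sees the host's own ports. An ISL between two switches has to be checked on the switches themselves, which is exactly where this problem turned out to be.

```python
from pathlib import Path


def link_speed_mbps(iface: str, sysfs_root: str = "/sys/class/net"):
    """Return the negotiated speed in Mb/s, or None if it is unknown or unreadable."""
    try:
        value = int((Path(sysfs_root) / iface / "speed").read_text().strip())
    except (OSError, ValueError):
        # Missing file, link down (some kernels raise EINVAL), or unparsable value.
        return None
    # Some kernels report -1 when the speed is unknown.
    return value if value > 0 else None


def slow_links(ifaces, expected_mbps=1000, sysfs_root="/sys/class/net"):
    """Map each interface that is below the expected speed (or unknown) to its speed."""
    result = {}
    for iface in ifaces:
        speed = link_speed_mbps(iface, sysfs_root)
        if speed is None or speed < expected_mbps:
            result[iface] = speed
    return result


if __name__ == "__main__":
    # Hypothetical iSCSI-facing NICs; adjust to match your host.
    for iface, speed in slow_links(["eth2", "eth3"]).items():
        print(f"{iface}: {speed if speed is not None else 'unknown'} Mb/s (expected 1000)")
```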

Guess what… the ISL between the switch connected to the test host and the switch connected to the EqualLogic group was running at half its intended capacity.

“Mystery” solved.
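
A degraded ISL caps everything because end-to-end throughput can be no higher than the slowest hop the traffic crosses. The minimal sketch below uses hypothetical numbers, since the post doesn't describe the ISL's configuration. It assumes two 1 Gbit/s host paths, an array able to accept more than that, and an ISL intended to carry 2 Gbit/s but running at 1 Gbit/s.

```python
def bottleneck(hops_gbit):
    """Return (hop name, raw MB/s ceiling) for the slowest hop on the path."""
    name = min(hops_gbit, key=hops_gbit.get)
    return name, hops_gbit[name] * 1000 / 8


if __name__ == "__main__":
    # Hypothetical capacities in Gbit/s; none of these come from the original setup.
    intended = {"host NICs": 2.0, "ISL": 2.0, "array ports": 3.0}
    degraded = dict(intended, ISL=1.0)
    print("intended:", bottleneck(intended))
    print("degraded:", bottleneck(degraded))
```

Under these assumptions, the degraded ISL limits the whole path to about 125MB/sec of raw line rate. That would line up with the 110 to 125MB/sec observed, no matter how the host side was configured.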

You know, sometimes the blinking lights are the most important troubleshooting tool. Sometimes the color of the blinking lights is the difference between wasting three hours and not. And sometimes a person with more than 10 years of experience designing, configuring and reviewing SANs in mission-critical environments needs an embarrassing reminder. You should always verify the physical aspects of the configuration first, and simple is more often than not better than the alternative(s). That holds even if the simple approach involves a hike, a flight of stairs and a trip into the noisy data center.

Trackback URL http://linux.sjolshagen.net/2010/08/27/is-your-network-playing-nice/trackback/