Monday, September 17, 2012

Don't give up on your poor cabling jobs

One thing that I neglected to ask for during my interview at $newJob was a tour of their datacenter. I had asked a ton of questions about procedure, team structure, political divisions, etc., and everything sounded great. I figured that since they really had their ducks in a row, the datacenter must be tip-top too, so I let it slide. If I had taken a tour I probably would have declined the offer, but I'm glad that I didn't! This is a picture of the core switch (a Catalyst 6513) and the patch panels for our servers and 4th floor data drops:

That picture was taken in July. Since then, I've been part of a project that's migrated the core to a Nexus 7k and moved all of the workstation data lines to a separate stack of Catalyst 3750Gs with a Catalyst 3560 24-PoE powering access points and phones that don't have power available.

It was a very frustrating project for me to work on, because I felt like I was cleaning up a mess that I didn't make. My flex hours had turned into being up at 5am to catch the early train. I was hired to be an AD and vSphere specialist, and here I am running cable, coordinating switch cutovers and that sort of thing. It actually turned out to be one of the more rewarding things I've ever done, because of how thrilled my coworkers are with how it turned out.

Sunday, September 16, 2012

When vendors start making up terminology

I recently had to call Dell because we had one drive fail and then three others go into "predictive failure" mode during the rebuild on a PowerVault MD1000. The 14 disks were in RAID 5 with two hot spares (this predates me, so put your nooses and torches away; the box now runs RAID 6 with one hot spare).

While Dell was running their routine diagnostics before shipping four disks out to me, they informed me that my array had a "punctured stripe." Punctured stripe? What the heck is that? I asked. They proceeded to explain how RAID 5 works with striping and parity, which I already knew. Then they said that in large volumes, the stripe sometimes punctures during a rebuild because of inconsistent parity data, and the whole volume needs to be completely re-initialized and restored from backup. "Oh, so you mean it's a URE?" "A what?" "An unrecoverable read error during the rebuild." "Um, I guess?"

The Dell storage "engineer" had never heard of a URE, which is an industry-standard term! Upon further googling, it seems that Dell is really the only company that says "punctured stripe." I had to make two follow-up calls, got two different engineers, and every single one said "punctured stripe." If there is an industry-standard term for something, please don't try to coin a new phrase! It confuses your customers and makes us think that your storage engineers are clueless, because they don't know the standard term.
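
For anyone wondering why a URE during a RAID 5 rebuild is such a well-known risk, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name is made up, the rate of one URE per 10^14 bits is a commonly quoted spec-sheet figure rather than anything measured on my MD1000, the 1 TB disk size and 13 surviving disks are placeholders, and it treats every bit read as an independent trial.

```python
import math


def p_ure_during_rebuild(surviving_disks, disk_bytes, ure_per_bit=1e-14):
    """Estimate the chance of hitting at least one URE during a RAID 5 rebuild.

    A rebuild has to read every surviving disk in full, so the number of
    bits read is surviving_disks * disk_bytes * 8. All inputs here are
    hypothetical; plug in the numbers from your own drive spec sheets.
    """
    bits_read = surviving_disks * disk_bytes * 8
    # P(at least one URE) = 1 - (1 - p)^n, computed with log1p/expm1
    # so the tiny per-bit probability is not lost to rounding.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical example: 13 surviving 1 TB disks at 1 URE per 1e14 bits.
risk = p_ure_during_rebuild(surviving_disks=13, disk_bytes=1e12)
print(f"Chance of at least one URE during the rebuild: {risk:.0%}")
```

The point is only that the risk grows with the amount of data a rebuild has to read, which is why wide RAID 5 sets on big disks make people nervous and why RAID 6 is the usual answer.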