Friday, June 27, 2008

Data Center - Take 2

Shortly after my last post - actually it was the next morning in the shower, which is where I do my best thinking - I was hit by the thought, "doesn't CX4 reach 15 meters instead of 10?" So when I got to work that morning, I looked it up and, sure enough, 10GBASE-CX4 has a 15 meter (49 foot) reach. And that, my friends, makes all the difference!

A 15 meter reach makes it possible to use CX4 as the uplink between all the TOR switches in a 26 cabinet row and a distribution switch located in the middle of said row. This reduces the cost of each TOR switch 10GbE uplink by several thousand dollars.

The result is that, for a server rack with 48 GbE ports, it's substantially less expensive to deploy a TOR switch with 2 10GbE (CX4) uplinks to a distribution switch than to deploy two 24-port patch panels with preterminated Cat6e cables running back to the distribution switch. For a server rack with 24 GbE ports, it's a wash in terms of cost - the TOR switch option is a few percent more expensive. This also means that the cost of a 10G server connection is significantly lower than I originally calculated.
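To make the comparison concrete, here's a back-of-the-envelope cost model in Python. All of the prices are hypothetical placeholders (I'm not quoting real figures here), so treat this as a sketch of the comparison, not actual pricing:

```python
# Rough per-rack cost comparison: TOR switch vs. patch panels.
# Every dollar figure below is a made-up placeholder.

def tor_cost(tor_switch=4000, uplinks=2, cx4_uplink_cost=500):
    """TOR model: one switch in the rack plus CX4 uplinks to distribution."""
    return tor_switch + uplinks * cx4_uplink_cost

def panel_cost(panels=2, panel_cost_each=300, cables=48, cable_cost=40,
               dist_ports=48, dist_port_cost=60):
    """Patch-panel model: panels + preterminated runs + distribution switch ports."""
    return panels * panel_cost_each + cables * cable_cost + dist_ports * dist_port_cost

print(tor_cost())    # 5000
print(panel_cost())  # 5400
```

With these made-up numbers the TOR option comes out ahead at 48 ports; cut the cable and port counts in half for a 24-port rack and the panel side drops faster than the fixed TOR switch cost, which is the "wash" described above.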

The only remaining issue is that, in the new data center, the plan was to distribute power down the middle aisle (13 racks on each side of the aisle) and out to each rack, but to distribute the fiber from the outsides of the rows in. One thing that makes the TOR model less expensive is that you only need 1 distribution switch per 26 racks (13 + 13) whereas with the patch panel model you'd need multiple distribution switches on each side of the aisle (2 or 3 switches per 1/2 row or 13 racks). But having only 1 distribution switch per row means that there would be CX4 cables and fiber crossing over the power cables running down the middle aisle. We have 36" raised floors though, so hopefully there's plenty of vertical space for separating the power cables and network cables.

The other consideration is that it appears vendors will be converging on SFP+ as the standard 10G pluggable form factor - moving away from XENPAK, XFP and X2. If this happens, SFP+ Direct Attach will become the prevalent 10G copper technology, and that, I believe, has only a 10 meter reach. That would lead us back to placing a distribution switch on each side of the aisle (1 per 13 racks instead of 1 per 26 racks) - which will raise the overall cost slightly.

Tuesday, June 24, 2008

Networking the Data Center

As if we had nothing else to do this year, we're busy planning for our new data center in Bloomington that will come on-line in the spring of 2009. I spent the better part of the afternoon yesterday working through a rough cost analysis of the various options to distribute network connectivity into the server racks, so I thought I'd share some of that with all of you. I'll start with a little history lesson :)

The machine rooms originally had a "home-run" model of networking. All the switches were located in one area and individual ethernet cables were "home-run" from servers directly to the switches. If you're ever in the Wrubel machine room, just pick up a floor tile in the older section to see why this model doesn't scale well ;-)

When the IT building @ IUPUI was built, we moved to a "zone" model. There's a rack in each area or "zone" of the room dedicated to network equipment. From each zone rack, cables are run into each server rack with patch panels on each end. All the zone switches had GbE uplinks to a distribution switch. We originally planned for a 24-port patch panel in every other server rack - which seemed like enough way back when - but we've definitely outgrown this! So, when we started upgrading the Wrubel machine room to the zone model, we planned for 24 ports in every server rack. 24 ports of GbE is still sufficient for many racks, but the higher-density racks are starting to have 48 ports and sometimes 60 or more. This is starting to cause some issues!

But first, why so many ports per rack? Well, it's not outrageous to consider 30 1RU servers in a 44 RU rack. Most servers come with dual GbE ports built in, and admins want to use one port for their public interface and the second for their private network for backups and such. That's 60 GbE ports in a rack. Or, in a large VMware environment, each physical server may have 6 or 8 GbE NICs in it: 2 for VMkernel, 2 for console, 2 for the public network and maybe 2 more for a private network (again, backups, front-end to back-end server communications, etc.). At 8 NICs per physical server and 6 or 8 physical servers per rack, you have 48 to 64 GbE ports per rack.
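The per-rack arithmetic above is simple enough to sketch:

```python
# Per-rack GbE port counts for the two scenarios described above.

def ports_dual_nic(servers_per_rack=30, nics_per_server=2):
    """Classic 1RU servers: one public + one private GbE port each."""
    return servers_per_rack * nics_per_server

def ports_vmware(hosts_per_rack, nics_per_host=8):
    """Large VMware hosts: 2 VMkernel + 2 console + 2 public + 2 private."""
    return hosts_per_rack * nics_per_host

print(ports_dual_nic())                   # 60
print(ports_vmware(6), ports_vmware(8))   # 48 64
```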

So, why doesn't the zone model work? In a nutshell, it's cable management and too much rack space consumed by patch panels. If you figure 12 server racks per "zone" and 48 ports per rack, you end up with 576 Cat6e cables coming into the zone rack. If you use patch panels, even with 48-port 1.5RU patch panels, you consume 18 RU just with patch panels. An HP5412 switch, which is a pretty dense switch, can support 264 GbE ports in 7 RU (assuming you use 1 of the 12 slots for 10G uplinks). So you'll need 2 HP5412s (14 total RU) PLUS an HP5406 (4 RU) to support all those ports. 18 + 14 + 4 = 36 - that's a pretty full rack - and you still need space to run 576 cables between the patch panels and the switches. If you don't use patch panels, you have 576 individual cables coming into the rack to manage. Neither option is very attractive!
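The rack-unit math works out like this (all figures taken from the paragraph above, nothing new):

```python
# Zone-rack space budget for 12 server racks at 48 ports each.
import math

racks_per_zone, ports_per_rack = 12, 48
cables = racks_per_zone * ports_per_rack        # 576 Cat6e runs into the zone rack

panel_ports, panel_ru = 48, 1.5
panels = math.ceil(cables / panel_ports)        # 12 patch panels
panel_space = panels * panel_ru                 # 18 RU of panels alone

# HP5412: 264 GbE ports in 7 RU (1 of 12 slots reserved for 10G uplinks)
ru_5412, ru_5406 = 7, 4
switch_space = 2 * ru_5412 + ru_5406            # two 5412s plus a 5406

print(cables, panel_space, panel_space + switch_space)  # 576 18.0 36.0
```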

Also, if you manage a large VMware environment, with 6 or 8 ethernet connections into each physical server, 10GbE starts looking like an attractive option (at least until you get the bill ;-). Can you collapse the 8 GbE connections into 2 10GbE connections? The first thing that pops out when you look at this is that the cost to run 10GbE connections across the data center on fiber between servers and switches is simply prohibitive! 10GBASE-SR optics usually run a couple grand each (even at edu discounts), so the cost of a single 10GbE connection over multimode fiber is upwards of $4,000 *just* for the optics - not including the cost of the switch port or the NIC!

For both these reasons (high-density 1G and 10G) a top-of-rack (TOR) switch model starts looking quite attractive. The result is a 3-layer switching model with TOR switches in each rack uplinked to some number of distribution switches that are uplinked to a pair of core switches.

The first downside that pops out is that you have some amount of oversubscription on the TOR switch uplink. With a 48-port GbE switch in a rack, you may have 1 or 2 10GbE uplinks for either a 4:1 or 2:1 oversubscription rate. With a 6-port 10GbE TOR switch with 1 or 2 10GbE uplinks, you have a 6:1 or 3:1 ratio. By comparison, with a "zone" model, you have full line rate between all the connections on a single zone switch, although the oversubscription rate on the zone switch uplink is likely to be much higher (10:1 or 20:1). Also, the TOR switch uplinks are a large fraction of the cost (especially with 10G uplinks), so there's a natural tendency to want to skimp on uplink capacity. For example, you can save a LOT of money by using 4 bonded 1G uplinks (or 2 bundles of 4) instead of 1 or 2 10G uplinks.

My conclusion so far is that, if you want to connect servers at 10GbE, you *absolutely* want to go with a TOR switch model. If you need to deliver 48 or more GbE ports per rack, you probably want to go with a TOR model - even though it's a little more expensive - because it avoids a cable management nightmare. If you only need 24 ports (or fewer) per rack, the "zone" model still probably makes the most sense.

Wednesday, June 18, 2008

Distributing Control

One thing I've done quite a bit of since taking on the network architect role last summer is meet with LSPs to discuss their networking needs. Just yesterday we met with the Center for Genomics and Bioinformatics, this morning we're meeting with the Computer Science department, and Friday with University College @ IUPUI. What I've learned is that there are many excellent LSPs and that they know their local environment better than we ever will.

As the network becomes more complex with firewalls, IPSes, MPLS VPNs and such, I think we (UITS) need to find ways to provide LSPs with more direct access to effect changes to their network configurations and with direct access to information about their network. For example, if an LSP knows they need port 443 open in the firewall for their server, what benefit does it add to have them fill out a form, which opens a ticket, which is assigned to an engineer, who changes the firewall config, updates the ticket and emails the LSP to let them know it's completed?

Okay, it sounds easy enough to just give LSPs access to directly edit their firewall rules (as one example) - why not just do this?

First, you have to know which LSPs are responsible for which firewall rules. To do that you first need to know who the "official" LSPs are, but then you also need to know which IP addresses they're "officially" responsible for. It turns out this is a pretty challenging endeavor. I've been told we now have a database of "authoritative" LSPs, populated by having an official contact from the department (e.g. a dean) designate who their LSPs are. But then you need to associate LSPs with IP addresses - and doing this by subnet isn't sufficient since there can be multiple departments on a subnet. The DHCP MAC registration database has a field for LSP, but that only works for DHCP addresses and is an optional user-entered field.

Second, you have to have a UI into the firewall configuration with an authentication/authorization step that utilizes the LSP-to-IP information. None of the commercial firewall management products I've seen address this need, so it would require custom development. The firewall vendors are all addressing this with the "virtual firewall" feature. This would give each department their own "virtual firewall" which they could control. This sounds all well and good, but there are some caveats.... There are limits to the number of virtual firewalls you can create. If you have a relatively small number of large departments, this is fine, but a very large number of small departments might be an issue. Also, one advantage of a centrally managed solution is the ability to implement minimum across-the-board security standards. None of the virtual firewall solutions I've seen provide the ability for a central administrator to set base security rules that the virtual firewall admins cannot override.

Third, it is possible to screw things up and, in some rare cases, one person's screw-up could affect the entire system. High-end, ASIC-based firewalls are complex beasts and you should really know a bit about what you're doing before you go messing around with them. So would you require LSPs to go through training (internal, vendor, SANS, ?) before having access to configure their virtual firewall? Would they have to pass some kind of a test?

I don't think any of these hurdles are show-stoppers, but it will take some time to work through the issues and come up with a good solution. And this is just one example (firewalls) of many. Oh, and people have to actually buy in to the whole idea of distributing control!

Where’s Matt?

Well, the last two weeks were very hectic on a number of fronts and I didn’t get a chance to post.

The Friday before last was the networking “all-hands” meeting. This was a meeting for *everyone* in UITS that is involved in supporting networks. My back-of-the-envelope head-count was up over 70, but with vacations, 24x7 shift schedules and whatnot, we ended up with about 40. I babbled on for a solid 90 minutes about the draft 10-year network plan, the discussion and work that went into developing it, and how that will translate into changes over the next year or so. After questions, we were supposed to have some great cookies and brownies, but much to my dismay, the caterers showed up an entire hour late - after most people had left.

After the snackless break, we asked everyone who had a laptop to come back for some wireless testing. We had 3 APs set up in the auditorium, and during the presentation we had done some testing to see how well clients balanced out across the APs (not very well, as we expected) and what throughput/latency/loss looked like with certain numbers of users on an AP. The dedicated testing was all done with 1 AP for all users. Using speed tests and file downloads, we tried to take objective and subjective measurements of performance with different numbers of users associated with that AP (10, 20, & 40). The goal was to set a cap on the number of users per AP that, when reached by a production AP, would trigger a notification so we can proactively track which locations are reaching capacity.
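The cap check itself would be trivial to script. A minimal sketch, assuming a hypothetical cap of 20 users and a made-up poll result (in practice the counts would come from the wireless controller, and the cap from the test results):

```python
# Flag production APs whose associated-client count has reached a cap.
# USER_CAP and the sample data below are hypothetical.

USER_CAP = 20

def aps_over_cap(associated_counts, cap=USER_CAP):
    """Return the names of APs at or over the per-AP user cap."""
    return [ap for ap, users in associated_counts.items() if users >= cap]

# Example poll result (made up):
counts = {"ap-aud-1": 12, "ap-aud-2": 23, "ap-aud-3": 41}
print(aps_over_cap(counts))   # ['ap-aud-2', 'ap-aud-3']
```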

I spent last week out in Roseville, California meeting with HP ProCurve. I don’t know about you, but trips to the west coast just KILL me! Doesn’t matter what I do, I always wake up at 4am (7am Eastern) and, inevitably, my schedule is booked until 9-10pm Pacific. The meeting was excellent and useful - although you’re not going to get any details here because of our non-disclosure agreement.

Okay, now throw on top of this that I’ve had multiple contractors tearing up my house for the last 2 weeks, and it’s been NUTS !