Thursday, December 10, 2009

The Lab Experiment




I've mentioned our new test lab in a couple of tweets, so I thought I'd post some more information about what we're doing. The MDF in our new data center is quite spacious and well equipped. It includes 45 heavy-duty 2-post Panduit racks, overhead infrastructure for power cables, low-voltage copper cabling (ie Cat5/6) and fiber, a 36-inch raised floor and 1,800 amps of DC power. The production equipment is being built out from the front of the room toward the back, so we reserved the last couple of rows (10 racks total) for "test" equipment.

We've accumulated a fair amount of equipment that can be used for testing and we also have a lot of equipment that moves through here to be "burned-in" and configured before it's sent into the field. All this equipment needs a place to live either temporarily or permanently. We have equipment from Ciena, Juniper, Infinera, Cisco, HP and others. Up until now it's been spread across several facilities, most of which had inadequate space, power and/or cooling. So we're very excited about having a wonderful new facility !



It's been amazing how much demand there is for this kind of testing environment. Equipment has been moved in quickly and as soon as people found out it was there, they wanted to use it. It's very clear that we'll need to designate a "lab czar" to make sure we maintain some semblance of organization in the lab - and it's clear that the lab czar better not be me ! The grand vision is to have a lab environment where engineers can "check out" specific devices, automatically build cross-connects between devices to create the topology they need and have the device configs reset to default when their work is completed. We're a long way from this, but will hopefully keep moving steadily in that direction over the next 12-24 months.



Friday, November 20, 2009

Yeah, we can do that !

Do you ever read about a new technology and go, "Man, that's so cool ! We should be doing that !" - only to be disappointed once you start digging into it a bit ?

That's exactly what happened to me after I read the following whitepaper...

Connecting to the Cloud with F5 BIG-IP Solutions and VMware VMotion

Some of you may have read my post a while back about how cool Application Delivery Controllers (aka load-balancers) are. Everything I said is probably true (note to self: reread that post and edit if necessary), but man, once you start digging into what you can do with one of those things - it strikes fear into the heart of every decent network engineer !

And now it looks like these things may bring us the holy grail of virtualization - live migration across a wide-area network ! I'm onboard !!

F5 demo'd this at VMworld in late August and it's now late November. We have 4 brand new F5's that aren't in production yet, 2 in each of our data centers separated by about 60 miles. And we have plenty of VM's to throw into the mix. So I figured I'd download the configuration guide and see what it takes to set this up....oh, there's no configuration guide. Hmmm, maybe the documentation is on F5's DevCentral site.....no. Okay, well our F5 sales engineer is coming in today so I'll just ask him....well, he didn't have very many details and he referenced the documentation on their website...which of course I can't find. And what he did tell me made me realize just how many moving parts are involved and how complex the whole setup really is. Well, this could end up being really cool stuff, but it looks like it's not quite soup yet.

And then there's the issue of whether this is the right way to solve this problem. I'm left with the feeling that this is a really ingenious solution to a problem using the tools we already have but that what we really need are some new tools !

In our case we could theoretically bridge VLANs between our data centers since we have dark fiber. This would simplify things, but we haven't done it yet because of concerns about bridging loops and broadcast storms taking down BOTH of our data centers ! If we could essentially route Ethernet MAC addresses using TRILL from the IETF, or the similar functionality being developed by the IEEE, perhaps that would offer a simpler solution to this problem !

Wednesday, October 21, 2009

What's up with IPv6 ?

I'm in Dearborn, Michigan this week for the NANOG and ARIN meetings. NANOG = North American Network Operators' Group. NANOG is very much like the Internet2 Joint Techs Workshops for the commercial sector. It's where network engineers get together to discuss cool new things they're doing. And, like most of these things, it's a lot about social networking - a chance to meet face-to-face with the people you email and IM with every day. ARIN = American Registry for Internet Numbers. ARIN is the non-profit that is responsible for handing out Internet number resources - primarily IP addresses.

IPv6 is a huge topic of discussion this week. Yahoo presented on their IPv6 roll-out which they completed last week. Comcast just presented on their deployment. Google has IPv6 deployed as well. I saw a news story last week that the number of ISPs requesting IPv6 addresses from ARIN has gone way up. In fact, in the last quarter (last month maybe) ARIN received more requests for IPv6 addresses than IPv4 addresses for the first time ever. It seems that IPv6 is *finally* getting some traction. My sense is that this is the real deal and IPv6 is really going to happen now.

It's funny though to see all the hype around IPv6 in the commercial sector. We rolled out IPv6 on the Internet2 network in 2000 and had IPv6 enabled on every data jack at IU around 2001. WRT IPv6, attending a NANOG in 2009 is much like attending an Internet2 Joint Techs Workshop in 2000 or 2001.

Monday, September 28, 2009

Duct work !

I remember the first time I was in a meeting about the deployment of a computer system and there were plumbers at the meeting ! Now there's more plumbing under the raised floors than anything else. Well, last week I got to work with the guys from the sheet metal shop while they fabricated duct work for our Cisco Nexus 7018 switches. This turns the side-to-side airflow into front-to-back airflow. The sheet metal shop did a great job on very short notice !!






-- Post From My iPhone

Monday, September 21, 2009

Video Killed the Radio Star

Okay, now I'm sure you're thinking, what the heck does that great Buggles hit from 1979 have to do with networking ? Or you may be cursing me because you'll be hearing "oh, uh oh" in that annoying high-pitched female voice ALL DAY !!

Regardless, after years of anticipation (7 to be exact), on Friday, Sept. 11th, the IEEE finally ratified the 802.11n standard. Of course, quite a few enterprises, including many university campuses, have been deploying 802.11n since at least 2007 when the Wi-Fi Alliance started certifying equipment to draft 2 of the standard. But long before the standard was ratified and even before there were many enterprise deployments, there was no shortage of articles heralding the end of wired Ethernet. I can't count the number of times I've been asked if we would stop pulling wiring into new buildings and go 100% wireless. My emphatic response has always been "No, wireless will be hugely popular, but wires are not going away any time soon".

So when I received an email notification from The Burton Group last week about a report entitled "802.11n The End of Ethernet", I was pretty sure what I would find inside the report. Still, I knew there was a good chance I would have to field questions about the report, so I thought I better check it out. What I found is that the report basically supported what I've been saying, although that may not be apparent on the surface.

One key thing to keep in mind is that network usage and requirements at a research university are NOT the same as at your typical business. For example, the report points out that 802.11n will probably not have sufficient bandwidth for "large" file transfers. But how do they define "large" ? The report defines "moderate" file sizes as 2-8 MB, so presumably anything larger than 8-10MB or so would be considered "large". This is probably accurate for a corporate network where you typically have relatively small connections (1-10 Mbps) to the Internet. At IU we have a 10 Gbps (that's a 'G') connection to the Internet and it's quite common for people to download very large (100MB+) files from the Internet. It's also common for people to load very large (100MB+) installs such as Microsoft Office or Adobe Photoshop over the local network. The last time I downloaded Microsoft Office from IUWare (MacBook Pro on a Gigabit Ethernet data jack), I got well over 400 Mbps and it only took about 15-20 seconds to download ! Never mind the researchers who want to upload and download files that are 50-100 GBs and larger, or IPTV with streams of 6-8 Mbps per user !
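Just to put the back-of-the-envelope math in one place, here's a quick sketch - the file sizes and link speeds below are round, illustrative numbers, not measurements:

    # Rough transfer-time estimate: seconds = size in bits / throughput in bits per second.
    # Sizes and speeds are illustrative round numbers, not measured values.
    def transfer_seconds(size_mb: float, mbps: float) -> float:
        return (size_mb * 8) / mbps

    for size_mb in (8, 100, 1000, 50000):    # "moderate" file, big download, Office-sized install, research data set
        for mbps in (10, 130, 400):          # small corporate uplink, typical 802.11n, observed wired GigE throughput
            print(f"{size_mb:>6} MB at {mbps:>4} Mbps: {transfer_seconds(size_mb, mbps):9.1f} seconds")

A 1 GB install at 400 Mbps is the 20-second download I saw; the same file over a 10 Mbps uplink is more like 13 minutes.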

Typical, real-world performance for 802.11n is around 120-150 Mbps. But, keep in mind, this is half-duplex, shared bandwidth for each Access Point (AP), so performance for individual users can vary greatly depending on how many users are in the same area and what they are doing. At a recent Internet2 workshop in Indianapolis where we supplied 802.11n wireless, I often saw 50+ Mbps on downloads over 802.11n, but sometimes performance dropped down to around 10-15 Mbps. And if you're further away from the AP with lower signal strength, you could see even lower throughput.

Another important factor is that 802.11 uses unlicensed spectrum and therefore is subject to interference. Microwaves, baby monitors, cordless phones - there are many sources of potential interference. In a corporate environment, it might be easier to prevent sources of interference, but at a university, especially in student residences, it is quite difficult. I've been told that most students in our dorms connect their game systems to the wired network, even though they have wireless capabilities, because they have experienced drops in wireless connectivity that interrupted online games at inopportune moments. A 30-second wireless drop-out while your neighbor heats up some leftover pizza at 3am may not seem like a big deal, unless you've been playing an online game for the last 8 hours and are just about to win when the connection drops !

The third important factor, IMO, is the use of IP for what I'll generically call "appliances". Cash registers, card readers, security cameras, building automation systems, parking gates, exercise equipment...the list goes on and on and they all use wired connections. If the use of wired Ethernet for PCs decreases, it's possible the increase in wired connections for these "appliances" will more than make up for it !

IMO networking is not a fixed-size pie that is divided between wired and wireless such that when one slice gets bigger the other slice gets smaller. The pie is getting much bigger all the time - it just so happens that going forward, growth in the wireless slice will probably dwarf growth in the wired slice !

So, just as radio is still alive and well almost 30 years after the introduction of the music video, I suspect wired Ethernet will be alive and well many years from now.

Monday, April 13, 2009

Why Load-Balancers are Cool !

I suppose the term "load-balancer" is out of date and has been replaced by the term "Application Delivery Controller", but regardless of what you call them, they are pretty powerful and can do a lot of cool things ! Sysadmin types have known this for years, but as a network guy who just recently started digging into them, I'm a bit geeked about what you can do with these.

The background here is that we use load-balancers from both Zeus and F5 depending on the application. In preparing for the move to our new data center, we're testing some new F5 hardware and software and reconsidering how these things get connected into the network.

One goal we have is to enable failover between our data centers in Indianapolis and Bloomington (see my previous post on this). We had been looking at DNS based solutions (Global Server Load-Balancers), but for a number of reasons Route Health Injection (RHI) is a much better option for us. A couple of weeks ago we got together with our Messaging team to set up and test RHI. Without too much manual reading and just a little bit of poking around, we were able to get RHI working within about 15 minutes and boy was it slick. We injected a /32 BGP route for a DNS Virtual IP from our F5's at Indy and Bloomington and weighted the routing preferences so the Bloomington path was preferred. DNS queries resolved on the Bloomington server until we shut down 'named' on the Bloomington server. Within a few seconds, queries to the same IP address were resolved by the server in Indy. Turned 'named' back up in Bloomington, and queries went back to Bloomington. One problem solved !
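If you're curious what's conceptually going on, here's a rough Python sketch of the RHI idea. This is NOT how the F5 implements it - the announce/withdraw calls are just placeholders for a real BGP speaker, and the addresses are made-up examples:

    #!/usr/bin/env python3
    # Conceptual sketch of Route Health Injection: keep announcing a /32 for the virtual IP
    # only while the local service passes its health check. A real appliance would speak
    # BGP or OSPF to the upstream routers; here announce/withdraw just print.
    import socket
    import time

    VIRTUAL_IP = "192.0.2.53/32"        # made-up VIP (documentation address space)
    LOCAL_SERVER = ("127.0.0.1", 53)    # hypothetical local DNS server behind the appliance

    def dns_is_healthy(host, port, timeout=2.0):
        # Crude health check: can we open a TCP connection to the DNS port ?
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def announce(prefix):
        print(f"announce {prefix} to upstream routers")    # placeholder for a BGP UPDATE

    def withdraw(prefix):
        print(f"withdraw {prefix} from upstream routers")  # placeholder for a BGP withdraw

    announced = False
    while True:
        healthy = dns_is_healthy(*LOCAL_SERVER)
        if healthy and not announced:
            announce(VIRTUAL_IP)
            announced = True
        elif not healthy and announced:
            withdraw(VIRTUAL_IP)
            announced = False
        time.sleep(5)

Run something like this next to the server in each data center, make the Bloomington announcement more preferred, and you get exactly the active-passive behavior we tested.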

Operationally this points out how load-balancers are both network and server ! Server-wise they do things like SSL-offload so your SSL certs actually live on the load-balancer --- so your server admins probably want to manage these. Network-wise, they're now running BGP routing with your core routers and the routing configuration on the F5 (based on Zebra) looks a lot like Cisco's IOS --- so your network admins probably want to have some control of these functions.

Now, what if I want to add IPv6 support to those DNS servers ? Well, I could go and enable IPv6 on all my DNS servers, but with a load-balancer, I could just enable IPv6 on the load-balancers and have them translate between v6 and v4. After all, the load-balancer is essentially acting like an application-layer proxy server. In under 2 minutes I added a new Virtual IP (IPv6 in this case) and associated it with the pool of DNS servers we already configured in our test F5s and, without touching the servers, I was resolving DNS queries over IPv6 transport ! According to their documentation, Zeus supports this IPv6 functionality as well. So, instead of hampering IPv6 deployment, as is the case with many network appliances such as firewalls and IDPs, these load-balancers are actually making it easy to support IPv6 !
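To make the proxy idea a little more concrete, here's a toy Python sketch - again, not what the F5 is doing internally; the backend addresses are made up, it only handles TCP, and it listens on an unprivileged port. It accepts connections on an IPv6 socket and relays the bytes to IPv4-only backends, so the servers never see a v6 packet:

    #!/usr/bin/env python3
    # Toy v6-to-v4 proxy: listen on an IPv6 "virtual IP" and relay each connection to an
    # IPv4-only backend, round-robin. Purely illustrative - no UDP, health checks, etc.
    import asyncio
    import itertools

    BACKENDS = itertools.cycle([("192.0.2.10", 53), ("192.0.2.11", 53)])  # made-up IPv4 pool (TCP DNS)

    async def pump(reader, writer):
        # Copy bytes in one direction until the sender closes.
        try:
            while data := await reader.read(4096):
                writer.write(data)
                await writer.drain()
        finally:
            writer.close()

    async def handle_client(client_reader, client_writer):
        host, port = next(BACKENDS)
        backend_reader, backend_writer = await asyncio.open_connection(host, port)
        await asyncio.gather(pump(client_reader, backend_writer),
                             pump(backend_reader, client_writer),
                             return_exceptions=True)

    async def main():
        # "::" binds the IPv6 wildcard; port 5300 avoids needing root for port 53.
        server = await asyncio.start_server(handle_client, host="::", port=5300)
        async with server:
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())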

Tuesday, February 24, 2009

The IP Timemachine

I started working when I was 15 years old. It was at the trucking company my dad worked for. I finished all the manual labor they gave me by the middle of the summer, so I got sent into the office to do data entry. That got me started on computers and the rest is history.

I bet you're wondering what this has to do with networking? Well, I remember clearly taking my timecard to the time clock to punch in and out every day. That machine was built like a tank that would last forever ! You know the kind I'm talking about - the big grey metal box with the clock on the front and the big button on the top !

Well, I'm guessing they must look a bit different these days since I found out today that time clocks are getting connected to our IP network ! Time clocks !

Here's the list of devices I know are connected to our network (off the top of my head):

Phones, cellphones, security cameras, heating and air conditioning systems, electric meters, door locks, parking gates, cash registers, laundry machines, fire alarms, MRI machines, game systems, TVs, digital signs, clocks, and probably many more I'm not aware of.

Crazy stuff !




-- Post From My iPhone

Solving Cross Data Center Redundancy

First things first. Yes, I realize it's been almost 3 months since my last post...shame on me ! The good news is that we've been quite busy working on lots of new things, so I have plenty of material to keep me writing for a while !

I'd like to start with a topic I've been thinking about a lot lately (today in particular) that I think many people are interested in: how do you provide automatic, transparent fail-over between servers located in different data centers ? Ever since the I-Light fiber between Indianapolis and Bloomington and the ICTC building were completed, we've been receiving requests to enable server redundancy between the two campuses. Seems easy enough, so why haven't we done this yet ?

There are really 3 main options available:

(1) Microsoft Network Load-Balancing or similar solutions. These solutions require the 2 servers to be connected to the same broadcast domain. They usually work by assigning a "shared" MAC or IP address to the two servers along with various tricks for getting the router to direct traffic destined for a single IP address to 2 different servers. Some of these packages also include software that handles the server synchronization (eg synchronizing databases, config files, etc).

(2) Global Server Load Balancing (GSLB). These are DNS based solutions whereby the GSLB DNS server is the authoritative DNS server for a domain and returns different A records based on the IP address of the client (or rather the client's recursive DNS server) and the "health" of the servers (see the sketch after this list). In many cases, "servers" are actually virtual IP addresses on a local load-balancing appliance.

(3) Route Health Injection. These solutions involve a local load-balancing appliance that "injects" a /32 route via BGP or OSPF into the network for the virtual IP address of a server. Typically you have a load-balancing appliance in each data center that injects a /32 for the server's virtual IP address. The key is the virtual IP addresses are the *SAME* IP address in both data centers. It's NOT the same broadcast domain, just the same IP address and the actual servers are typically on private IP addresses "behind" the load-balancing appliances. You can achieve an active-passive configuration by setting the routing metrics so that the announcement at one data center is preferred over the other. *OR* you can set equal route metrics and clients will follow the path to the "closest" data center based on network paths -- this is referred to as "anycast".
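Here's the toy GSLB sketch I promised above for option #2. The site addresses, health flags and the "closeness" rule are all made up, and real products use configurable health monitors and topology/geo databases, but it shows the basic decision: hand back an A record for the closest healthy site.

    #!/usr/bin/env python3
    # Toy GSLB decision: pick which A record to return based on the querying resolver's
    # address and each site's health. All values below are made-up examples.
    import ipaddress

    SITES = {
        "bloomington":  {"vip": "192.0.2.10", "healthy": True, "prefers": ipaddress.ip_network("10.0.0.0/9")},
        "indianapolis": {"vip": "192.0.2.20", "healthy": True, "prefers": ipaddress.ip_network("10.128.0.0/9")},
    }

    def choose_a_record(resolver_ip):
        client = ipaddress.ip_address(resolver_ip)
        healthy = [s for s in SITES.values() if s["healthy"]]
        if not healthy:
            raise RuntimeError("no healthy sites")
        # Prefer the site "closest" to the client's resolver, otherwise any healthy site.
        for site in healthy:
            if client in site["prefers"]:
                return site["vip"]
        return healthy[0]["vip"]

    print(choose_a_record("10.200.1.5"))        # -> 192.0.2.20 (Indianapolis is "closer")
    SITES["indianapolis"]["healthy"] = False    # simulate the Indy servers failing their health check
    print(choose_a_record("10.200.1.5"))        # -> 192.0.2.10 (falls back to Bloomington)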

So you're thinking "these all sound like good options, surely there must be some gotchas?"....

The issue with option #1 is that you have to extend a broadcast domain between the two data centers - in our case between Indianapolis and Bloomington. As I think I covered in an earlier post, "broadcast domain" == "failure domain". Many types of failures are contained within a single broadcast domain and by extending broadcast domains across multiple facilities, you increase the risk of a single failure bringing down multiple systems. Especially in a university environment where management of servers is very decentralized, this can become very problematic. I can recount numerous occasions where someone made a change (ie did something bad) that created a failure (eg loop, broadcast storm, etc) and all the users in multiple buildings were affected because a VLAN had been "plumbed" through multiple buildings for whatever reason. However, these solutions are typically very inexpensive (often free), so they are very attractive to system owners/administrators.

There are 2 main issues with option #2. First, in order to provide reasonably fast failover, you have to reduce the TTL on the DNS records to a relatively small value (eg 60 seconds). If you have a very large number of clients querying a small set of recursive DNS servers, you may significantly increase the load on your recursive DNS servers. The other issue is with clients that ignore the DNS TTL and cache the records for an extended period of time. GSLB solutions are also significantly more expensive than option #1 solutions. One big advantage of GSLB is that the servers can literally be anywhere on the Internet.
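To put rough numbers on that first issue (the client count is purely illustrative): with caching, each active client re-queries its recursive server roughly once per TTL, so the added query rate scales like clients divided by TTL.

    # Back-of-the-envelope: added queries/sec for one record ~= active clients / TTL.
    # Client count is an illustrative guess, not a measurement.
    clients = 50000
    for ttl in (86400, 3600, 300, 60):
        print(f"TTL {ttl:>6}s -> ~{clients / ttl:7.1f} queries/sec")

Dropping the TTL from a day to 60 seconds takes that one record from well under 1 query/sec to hundreds of queries/sec.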

Option 3 is actually quite attractive in many ways. One downside is that the servers must reside behind a local load-balancing appliance. That's not entirely true - you could install routing software on the servers themselves - but with many different groups managing servers, that raises concerns about who is injecting routes into your routing protocols. The need for load-balancing appliances significantly increases the cost of the solution and limits where the servers can be located. In order to reduce costs you could place multiple systems behind a single load-balancing appliance (assuming there's sufficient capacity), but that raises the issue of who manages the appliance. Some load-balancers offer virtualization options that allow different groups to manage different portions of the configuration, so there are solutions to this.

We are currently exploring both the Global Server Load-Balancing and Route Health Injection options in the hope of developing a service that provides automatic, transparent (to the clients) failover between at least the two UITS data centers and possibly (with GSLB) between any two locations.