Monday, April 28, 2008

That Other Big Project !

I've alluded to our other big summer project a couple of times, but now I'm finally going to give you the low-down on it !

When I returned from vacation in early January, I fully anticipated that the core of our network would remain fairly static until the summer of 2009. This is when, according to the 10-year network plan we had been developing (a topic for another post), we would do a "fork-lift" upgrade of the core. Since that was only 16 months away, I set to work developing requirements, scheduling initial discussions with vendors, etc. As part of this process, I also started meeting with various departments and groups to get a little better handle on what they would need - in addition to the UITS projects I already knew the network would have to support such as VoIP and IPTV. What I learned over the course of January and early February led to a slight change of plans :)

One of the important things I learned is that there are several departments looking to take advantage of the high-capacity, high-bandwidth networked storage systems we've deployed over the last 2 years. Our MDSS tape storage system now has 24 10GbE connected "front-end" servers, each capable of moving data in and out of the system at around 3-4Gbps. The Data Capacitor is a disk-based storage system that also has 24 10GbE connected servers, each capable of moving data between the IP network and disk at around 7Gbps. I met or spoke with at least 4 departments during January that were all looking to move large data sets between their buildings and these central storage systems at very high bandwidths.

Our current architecture aggregates all the 1GbE connections from the buildings into layer-2 ethernet switches, applies 802.1Q VLAN tags and trunks all those VLANs over 10GbE links to the routers. This architecture provides a lot of flexibility and works fine for large numbers of 1GbE connected buildings using a few hundred Mbps of bandwidth each, but not so well for a dozen 10GbE buildings bursting up to 3-4Gbps each. In addition, those layer-2 ethernet switches are also out of empty ports and modules.
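To put rough numbers on that mismatch, here's a back-of-the-envelope sketch. The building count and burst rates come from the paragraph above. The single 10GbE trunk is purely an assumption for illustration, since the actual number of trunks to the routers isn't given here.

```python
# Back-of-the-envelope: burst demand from 10GbE buildings vs. trunk capacity.
# ASSUMPTION: one 10GbE trunk between the layer-2 switch and the router.
TRUNK_GBPS = 10
BUILDINGS = 12          # "a dozen 10GbE buildings"
BURST_GBPS = (3, 4)     # each bursting up to 3-4 Gbps


def oversubscription(buildings, burst_gbps, trunk_gbps):
    """Ratio of worst-case simultaneous demand to trunk capacity."""
    return buildings * burst_gbps / trunk_gbps


low = oversubscription(BUILDINGS, BURST_GBPS[0], TRUNK_GBPS)
high = oversubscription(BUILDINGS, BURST_GBPS[1], TRUNK_GBPS)
print(f"Worst-case demand is {low:.1f}x to {high:.1f}x one 10GbE trunk")
```

Not every building will burst at the same moment, so treat this as an upper bound rather than a prediction.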

The solutions to this are:

(1) Move the layer-3 routing function onto the aggregation switches that terminate the fiber connections to the buildings. This removes the bottleneck between the layer-2 aggregation switches and the layer-3 routing switches. It also frees up quite a few 4-port 10GbE modules that can be reused to support 10GbE connections to buildings.

(2) Upgrade the 16-port 1GbE modules to 24 or 48 port 1GbE modules. This frees up slots in the aggregation switches (now layer-3 switches) to install the 4-port 10GbE modules to support 10GbE connections to buildings.

The other big thing that I learned about in January was PCI-DSS !! For those of you who haven't heard about PCI-DSS, that stands for Payment Card Industry Data Security Standard. Think HIPAA on steroids for merchants that accept credit/debit cards :) PCI-DSS has a laundry list of network and process requirements that must be met in order to be compliant.

As I dug deeper into what it would take to support the PCI-DSS requirements, it became clear (to me at least) that MPLS Layer-3 VPNs were the way to go. We had already been discussing MPLS VPNs for a while and several other universities have already deployed MPLS VPNs to solve problems like this. The general problem is that there are many different groups (or groups of systems) that each have unique network requirements and that have users/machines spread across many different buildings on campus. In addition to PCI-DSS compliant systems, you have building control systems (e.g. HVAC, security cameras, door access systems, etc), IP phones, and departments such as the School of Medicine and Auxiliary Services that support users/systems across many buildings. In a nutshell, MPLS Layer-3 VPNs allow you to "group" these systems into separate virtual routers, each of which can have different network services and policies (firewall, NAT, IPS, etc).
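For the curious, here's a toy model of the "separate virtual routers" idea. It's only a sketch of the concept, not how any router implements it. The VPN names, prefixes, building labels, and service sets are all hypothetical. The point is that each virtual router keeps its own routing table and policy set, so the same destination address can resolve differently depending on which VPN you're in.

```python
import ipaddress


class VirtualRouter:
    """One VPN's private routing table plus the services applied to it."""

    def __init__(self, name, services):
        self.name = name
        self.services = services      # e.g. {"firewall", "ips"}
        self.routes = {}              # ip_network -> next-hop label

    def add_route(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, address):
        """Longest-prefix match restricted to this virtual router's routes."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


# HYPOTHETICAL VPNs, prefixes, and building names, for illustration only.
pci = VirtualRouter("pci", {"firewall", "ips"})
controls = VirtualRouter("building-controls", {"firewall"})
pci.add_route("10.1.0.0/16", "building-A")
pci.add_route("10.1.5.0/24", "building-B")
controls.add_route("10.1.0.0/16", "building-C")

print(pci.lookup("10.1.5.20"))        # matched only against the pci table
print(controls.lookup("10.1.5.20"))   # same address, different table
print(controls.lookup("192.0.2.1"))   # no route in this VPN
```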

The Cast of Characters

I thought I should introduce some of the people working on the project so, when I say Jason did this or Dwight did that, you'll know who I'm talking about !

There's a core engineering group that is working through the myriad of engineering issues involved in getting the project from the RFP stage to the full-scale deployment phase. This is by no means a complete list of people working on the project !! A deployment of this scale involves *many* people from all parts of the organization !!

Ed Furia, who rose to fame as part of the video group, is the project manager for the wireless project. Ed's also involved in a lot of the engineering work, especially related to WPA2 Enterprise. Ed's done an excellent job of getting up to speed on the project in just a few weeks !

Jason Mueller, who hails from Iowa and the University of Iowa, started his tenure at IU just 5 short (or long) weeks ago. Jason has seriously "mad" wireless skills - (How's that for modern pre-teen lingo!) - and really great experience from deploying Iowa's wireless network.

Dwight Hazen and Charlie Escue round out the group and have loads of experience and great ideas !

Tuesday, April 22, 2008

Equipment is rolling in...

It's always an exciting part of a project when those emails titled "package at the dock" start rolling into my inbox :)

Last Friday a small box with about 80 LX SFPs arrived. These are for the upgrade of the core switches we're working on (I promise I'll put up a post describing what we're doing there soon). Yesterday I got a "package at the dock" email that said there were 16 boxes from Cisco - WOO HOO !!! This got my hopes up that the new Cisco 6500 interface cards we're waiting on had started showing up early - which would be awesome ! Hans was nice enough to run over to the dock for me (remember, I'm hanging out in northern Virginia) only to find it was just the daughter cards that attach to the interface cards :-( So we'll enter them into the inventory DB, put them in the storage room, and wait *patiently* for the cards they mate to. The new server hardware to upgrade the RADIUS servers showed up yesterday too !

Keep it coming !

Monday, April 21, 2008

Update from the road

I know you're all dying to hear what's been going on...and a LOT has happened since my last post on Thursday morning. I'm at the Internet2 Member's Meeting in Arlington, VA this week and will try to make use of what free time I have to check in again.

We met with the Messaging team last week to discuss the impact of the wireless project on DHCP and ADS. The biggest issue perhaps is the need to configure DHCP option 189 on the subnets the APs are on. Option 189 can pass up to 3 IP addresses to the APs, which is how the APs figure out which controllers to associate with. The APs hold no configuration through reboot. Each time they boot, they learn the IP addresses of the primary and backup controllers via DHCP option 189 and contact the controller to get their configuration.
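To make the option 189 idea a bit more concrete, here's a small sketch of packing controller addresses into a DHCP option. The outer layout (one byte of option code, one byte of length, then the data) is the standard DHCP option format. The payload layout, back-to-back 4-byte IPv4 addresses, is my ASSUMPTION for illustration and may not match what the vendor actually expects. The function names and addresses are made up too.

```python
import ipaddress

OPTION_CODE = 189        # option number mentioned above
MAX_CONTROLLERS = 3      # "up to 3 IP addresses"


def encode_option(addresses):
    """Pack controller IPs as code + length + data.

    ASSUMPTION: the data is just the IPv4 addresses, 4 bytes each.
    """
    if not 1 <= len(addresses) <= MAX_CONTROLLERS:
        raise ValueError("expected 1 to 3 controller addresses")
    data = b"".join(ipaddress.IPv4Address(a).packed for a in addresses)
    return bytes([OPTION_CODE, len(data)]) + data


def decode_option(raw):
    """Reverse of encode_option, roughly what an AP would do at boot."""
    code, length = raw[0], raw[1]
    if code != OPTION_CODE or length % 4 or len(raw) != 2 + length:
        raise ValueError("not a well-formed option under this sketch's layout")
    data = raw[2:]
    return [str(ipaddress.IPv4Address(data[i:i + 4])) for i in range(0, length, 4)]


# Hypothetical primary and backup controllers (documentation addresses).
encoded = encode_option(["192.0.2.10", "192.0.2.11"])
print(encoded.hex())
print(decode_option(encoded))
```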

The engineering team met on Friday morning. The primary topic was nailing down a tentative schedule, especially for the early part of the deployment. We plan to allow 1-2 weeks of testing after the equipment arrives. Then we want to deploy around 200 APs and let them "burn-in" for a couple of weeks before starting deployment in earnest. In addition to testing, these first 200 APs (about 7 buildings) will give us a chance to document and verify the deployment procedures, so we can move quickly and smoothly with the remaining buildings. We're tentatively scheduling the first 7 buildings during the week of May 12th with full-scale deployment starting the first week of June.

Thursday, April 17, 2008

But what about NAT ?

Okay, I know all the smart kids out there are screaming, "But why
aren't you using NAT?" The short answer is *time* - or the lack
thereof. The HP WESMs (Wireless Edge Services Modules) do have NAT
support built-in. We could also use an external firewall placed in
front of the WESMs to perform NAT - or rather PAT since we really want
all the wireless users to share a small pool of public IPs. We also
realize that, as the number of simultaneous wireless users grows
extremely large (say more than 16,000) and as our overall pool of
unused IP blocks dwindles, we will absolutely need to consider NAT on
wireless in order to conserve public IPv4 addresses.

HOWEVER ! We also need to deploy a few thousand APs in the next 2-3
months *AND* roll-out WPA2 Enterprise *AND* roll-out a new guest
access portal. Oh, and we have this other little project to
completely overhaul the core of the network and deploy MPLS VPNs
before August (I'll dive into that project in future posts). SO,
since we have an unused /16 block at our disposal, we think that's the
best course of action. We won't allow incoming TCP connections (no
wirelessly connected servers) and wireless clients are transient by
nature, so switching to NAT later on should be fairly painless - well,
for users at least :)

Where have all the IPs gone....

...to all the wireless users, of course ! Several weeks ago I went
looking for IP subnets to assign to our new wireless SSIDs. What I
discovered is that we do NOT have the vast amounts of unused IP space
we once thought.

But first things first... the first step was to move our IP allocation
documentation from spreadsheets and flat files into a database.
Fortunately, we already developed a nice IP allocation database for
our support of networks like Internet2 and NLR, so we had a database
ready to go. Now that all our IP allocations for all our campuses
are documented in a single place, we can look at overall IP
utilization, delegate authorization to allocate addresses from
specific IP blocks, and do better planning of our IP allocations.
This will become very important as our IPv4 address space becomes more
scarce !

Once I started looking at this, I found that, especially in
Bloomington, we don't have a whole lot of unused subnets and especially
not contiguous subnets. And it turns out a LOT of these are eaten up
by wireless users !

According to our monitoring software, we are seeing about 5,000
simultaneous wireless users in Bloomington these days. However, our
DHCP lease timers are in the 90-120 minute range [see note below].
So if someone uses wireless for 10 minutes and then shuts their
laptop, their IP address is reserved for another 80-110 minutes.
This means we actually have about 10,000 total host IP addresses
assigned to our wireless subnets ! That's 1/6th of a whole /16 or
legacy Class B block. But it gets worse !! Since users must use
VPN to get full access to wireless, most of these users are also
consuming an IP in the VPN address pool. So we have several thousand
more IPs assigned to those pools for a total of nearly 16,000 host IPs
assigned for wireless users. That's 1/4th of an entire /16 or Class
B !!!
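
If you want to check my math, here's a quick sketch using Python's
standard ipaddress module. The 10,000 and 16,000 figures are the ones
quoted above. The 10.0.0.0/16 network is just a placeholder, since any
/16 has the same number of addresses.

```python
import ipaddress

# Placeholder /16; only its size matters here.
slash16 = ipaddress.ip_network("10.0.0.0/16")
total = slash16.num_addresses


def fraction_of_slash16(host_ips):
    """Share of a /16 consumed by the given number of host IPs."""
    return host_ips / total


for label, hosts in [("wireless subnets", 10_000),
                     ("wireless + VPN pools", 16_000)]:
    print(f"{label}: {hosts} IPs = {fraction_of_slash16(hosts):.1%} of a /16")
```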

Note: On DHCP lease timers, we'd love to decrease them, but some VPN
clients don't renew their lease properly while a VPN connection is
up. They send their DHCP packets over the VPN tunnel instead of to
their local subnet, so when the lease expires they lose their network
connection until the VPN tunnel drops and they renew the lease over
their local subnet. We used to have shorter lease times, but many
users complained that their VPN connections kept dropping in the
middle of meetings and they would have to reconnect. This won't be an
issue on the new WPA2 Enterprise SSID !

Even with shorter lease times on the WPA2 Enterprise network, given
the level of growth we're seeing in wireless usage and all the new
wireless clients from the expansion into the dorms, we think we need
to at least allocate 16,000 host IPs to the new wireless network.
Since we can't reclaim the IP space from the current wireless network
until users transition to the new one, we need to come up with a new
/18 block of IPs. The *ONLY* block we can take this from is
140.182.0.0/16 which is the last unused Class B network we have.
Since we've never used this block, we need to give ample warning to
all system administrators in case they have host firewalls that need to
be updated. And THAT, my friends, is at the top of my to-do list for
today !
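
For anyone who wants to see what a /18 out of that block looks like,
here's a short sketch with Python's ipaddress module. It simply lists
the four possible /18s inside 140.182.0.0/16. Which one we'll actually
use isn't decided in this post, so don't read anything into the order.

```python
import ipaddress

block = ipaddress.ip_network("140.182.0.0/16")

# A /16 splits into exactly four /18s of 16,384 addresses each.
candidates = list(block.subnets(new_prefix=18))
for net in candidates:
    print(net, net.num_addresses)
```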

Wednesday, April 16, 2008

Thick and Thin

Back in the good old days, when you had to carry around a PCMCIA card in order to use Wifi, and even before the term "Wifi" was coined, Wireless Access Points (WAPs or just APs) provided all the functionality of 802.11 wireless in a single device. Each AP minded its own business and did its own thing - communicating with clients over radio frequencies, encrypting and decrypting packets if necessary, and passing those packets onto the wired network. This was all fine and good when Wifi hotspots were - well, just that - "spots" - individual, isolated locations. But as companies and universities started deploying very large areas of contiguous coverage - sometimes with thousands of APs - some issues surfaced with this model.

For example, with individual autonomous APs, someone has to manually tune the power of radio signals on each AP so that, together, the APs cover an entire area. Also, as clients roamed between APs that had encryption enabled, the client needed to establish a new encryption key with each new AP which would take time and cause a short "outage" during the transition.

But, what if there was a central "controller" that controlled all the APs and knew everything all the APs knew ? Then the controller could tell each AP how strong its radio signal needs to be in order to "fill" an area. And the encryption key could reside on the controller instead of the APs so that as clients roam between APs they don't need to renegotiate an encryption key.

Thus the terms "thick" or "fat" APs and "thin" APs were born ! With these new "controller-based" systems, the functionality of the traditional AP is split across the AP (or in HP speak Radio Ports) and one or more central controllers. This architecture provides a number of advantages in addition to the ones I mentioned and nearly all the enterprise-class wireless systems on the market today utilize this model.

What's this all about ?

Well, we really have a few main goals associated with upgrading the wireless network.

1) Replace the old wireless hardware that is now 6+ years old with a "modern" system that is much more capable. If you stay tuned, I'll cover what a lot of those new capabilities are...

2) Expand coverage area especially in the Bloomington Halls of Residence, but also in other areas of the IUB and IUPUI campuses. Our current wireless deployment in the IUB Halls only covers common areas such as lounges. We will be expanding to cover nearly every square inch of the Halls of Residence (ok, don't quote me on that every square inch thing) - that means coverage in every student room as well as all common areas. To give you an idea of the scale, we currently have a total of about 1,200 Access Points (APs) for the entire Bloomington campus. We will be adding 2,700 new APs *just* in the Halls of Residence ! That's a LOT of wifi !

3) Deploy WPA2 Enterprise to replace VPN as the mechanism for accessing wireless securely. I'll have one or more posts dedicated to discussing what WPA2 Enterprise is and how it works, but in a nutshell it provides an authenticated and encrypted wireless connection in a way that is MUCH more user-friendly than VPN. I've been using this in our test environment for several months and I can tell you, once you use WPA2 Enterprise, you'll never want to use VPN to connect to wireless again !

Let the fun begin !

After MANY months of working on the RFP for IU's next-generation wireless network, we FINALLY made an award this past Monday !! Of course that means now the REAL work starts !

Starting very soon now (soon = about 3-4 weeks from today) we will start deploying a new, improved and much larger wireless network using HP ProCurve's ZL wireless system. I know, I know - you probably have all sorts of questions like "why are we doing this ?" and "what do I get out of it ?". Hang tight ! I'm well over 30 and new to this newfangled blogging thing, but I'll try to get everyone up to speed over the next week or so. So stay tuned and you'll learn everything you wanted to know about wireless and probably a few things you didn't want to know !

If you want the Reader's Digest version, hang on for a few more days and I'll post a link to the video podcast we just shot about an hour ago over at the Wells Library. I think I covered all the topics I had in my outline, so this should give you a reasonably decent overview.