At my day job, we have recently started rolling out NexentaStor 3 for our VM image storage as a trial. If all goes well, our long-term plan is to eventually migrate all storage from NetApp to NexentaStor. As we started rolling out our NexentaStor trial, one missing feature we quickly ran into is the lack of IPMP (IP Multipathing) support. The network configuration interface they provide can currently configure aggregated interfaces with the LACP protocol, but there is no mechanism to configure IPMP to aggregate interfaces across multiple switches. We were able to work out an approach to configure IPMP manually, and received Nexenta’s blessing to use it in our environment. (Important note: if you are going to try this on a licensed copy of NexentaStor, please check with your support team to ensure that they are ok with you making these changes.)
Updated 2010-Sep-16 — Added information on how to add static routes to configure IPMP ping targets
Server hardware configuration
First of all, I should detail what we are trying to configure. Our production machines are quite similar to the SuperMicro build I documented earlier, with a few varying specs. Here’s what is in it:
- 2x Intel E5620 CPUs
- 48G memory (6x 8G modules)
- 10x 1TB Seagate Constellation ES SAS drives
- 6x Intel 32gb X25-E drives
- 2x Crucial 128gb RealSSD drives
- 2x Intel Gigabit ET quad-port NICs
The machine has an 8TB license, with 2 of the disks configured as hot spares. The six Intel SSDs are configured as three mirrored log devices, and the two RealSSDs are configured as cache devices.
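As a hedged sketch of that SSD layout (the pool name ‘tank’ and the c#t#d# device names below are placeholders, not our actual devices), building it from the command line would look roughly like this:

```shell
# Three mirrored log (slog) pairs from the six X25-E SSDs
zpool add tank log mirror c1t10d0 c1t11d0
zpool add tank log mirror c1t12d0 c1t13d0
zpool add tank log mirror c1t14d0 c1t15d0

# Two RealSSDs as L2ARC cache devices (cache vdevs cannot be mirrored)
zpool add tank cache c1t16d0 c1t17d0
```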
The only major caveat that I’ve hit with this configuration is that the ipmp interfaces will not be viewable via the Nexenta utilities. You can still see all the underlying interfaces; just not the ipmp ones. It’s mostly cosmetic, but is distracting and annoying.
Of course, YMMV – this worked for me, but no guarantees that it will work for you! :)
Here is the network configuration we desire:
- LACP Aggregate #1 – 4 gigabit links to Switch-1
- LACP Aggregate #2 – 4 gigabit links to Switch-2
- IPMP Interface #1 (balancing LACP1 and LACP2) – Native VLAN on NICs; access to our management/backend network
- IPMP Interface #2 (balancing LACP1 and LACP2) – VLAN ID 100; VM storage network
- IPMP Interface #3 (balancing LACP1 and LACP2) – VLAN ID 200; NFS storage network
General configuration steps
As far as I know, you cannot do VLAN tagging on top of an IPMP interface on Solaris, which means that we need to create the VLAN interfaces on each of the aggregate interfaces, and then create three separate IPMP interfaces – one per VLAN. Here are the basic configuration steps:
- Via NMC: Create individual LACP aggregates (aggr1 and aggr2), with igbX interfaces as members.
- Via NMC: Create VLAN interfaces ‘100’ and ‘200’ on top of both ‘aggr1’ and ‘aggr2’. This will create the interfaces ‘aggr100001’, ‘aggr100002’, ‘aggr200001’, and ‘aggr200002’.
- Via NMC: Configure an IP address from within each VLAN on each of these six interfaces. This will allow IPMP to use ICMP probes in addition to link detection to try to find failed links.
- Via the console: Configure the three IPMP interfaces, and add the six aggr interfaces to the proper IPMP groups.
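For reference, a rough raw-CLI equivalent of the first three NMC steps on an OpenSolaris-based build looks something like the sketch below. Treat it as illustrative only – interface names match this article, but the exact syntax can vary between builds, and on NexentaStor you should use NMC where it can do the job:

```shell
# 1. Create the two LACP aggregates from the quad-port NICs
dladm create-aggr -L active -l igb2 -l igb3 -l igb4 -l igb5 aggr1
dladm create-aggr -L active -l igb6 -l igb7 -l igb8 -l igb9 aggr2

# 2. Plumb the VLAN interfaces on top of each aggregate
#    (legacy VLAN naming: VLAN-id * 1000 + instance, e.g. aggr100001)
ifconfig aggr100001 plumb
ifconfig aggr100002 plumb
ifconfig aggr200001 plumb
ifconfig aggr200002 plumb

# 3. Assign per-interface test addresses so in.mpathd can use ICMP
#    probes in addition to link-state detection
ifconfig aggr100001 10.100.100.2 netmask 255.255.255.0 up
ifconfig aggr100002 10.100.100.3 netmask 255.255.255.0 up
```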
NMC – Create LACP aggregates
This assumes that whatever interface you configured during installation is *not* one of the interfaces you want in the aggregate. If it is, you will need to be on the system console (via IPMI, hopefully!) and destroy that interface first. Here is an example of how to create the aggregates (the output below is from NMC, showing how it ends up configured):
nmc@nexenta:/$ setup network aggregation create
Links to aggregate : igb2,igb3,igb4,igb5
LACP mode : active
LINK    POLICY  ADDRPOLICY  LACPACTIVITY  LACPTIMER  FLAGS
aggr1   L3,L4   auto        active        short      -----
nmc@nexenta:/$ setup network aggregation create
Links to aggregate : igb6,igb7,igb8,igb9
LACP mode : active
LINK    POLICY  ADDRPOLICY  LACPACTIVITY  LACPTIMER  FLAGS
aggr2   L3,L4   auto        active        short      -----
NMC – Create vlan interfaces on each aggregate interface
nmc@nexenta:/$ setup network interface aggr1 vlan create
VLAN Id : 100
aggr100001: flags=201000842 mtu 9000 index 39
        inet 0.0.0.0 netmask 0
        ether 0:1b:21:6c:3c:de
nmc@nexenta:/$ setup network interface aggr2 vlan create
VLAN Id : 100
aggr100002: flags=201000842 mtu 9000 index 40
        inet 0.0.0.0 netmask 0
        ether 0:1b:21:6c:3d:de
nmc@nexenta:/$ setup network interface aggr1 vlan create
VLAN Id : 200
aggr200001: flags=201000842 mtu 9000 index 41
        inet 0.0.0.0 netmask 0
        ether 0:1b:21:6c:3e:de
nmc@nexenta:/$ setup network interface aggr2 vlan create
VLAN Id : 200
aggr200002: flags=201000842 mtu 9000 index 42
        inet 0.0.0.0 netmask 0
        ether 0:1b:21:6c:3f:de
NMC – Assign IP addresses
This assumes the following IP ranges:
Native VLAN: 10.100.0.0/24
VLAN 100: 10.100.100.0/24
VLAN 200: 10.100.200.0/24
It also assumes that aggregate 1 is assigned .2 within each /24, and aggregate 2 is assigned .3. The shared IPMP interface will be assigned .1.
nmc@nexenta:/$ setup network interface vlan aggr1 static
aggr1 IP address: 10.100.0.2
aggr1 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr1 as 10.100.0.2/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr2 static
aggr2 IP address: 10.100.0.3
aggr2 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr2 as 10.100.0.3/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr100001 static
aggr100001 IP address: 10.100.100.2
aggr100001 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr100001 as 10.100.100.2/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr100002 static
aggr100002 IP address: 10.100.100.3
aggr100002 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr100002 as 10.100.100.3/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr200001 static
aggr200001 IP address: 10.100.200.2
aggr200001 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr200001 as 10.100.200.2/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr200002 static
aggr200002 IP address: 10.100.200.3
aggr200002 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr200002 as 10.100.200.3/255.255.255.0 ... OK.
Console – Set up IPMP
First, we need to get into expert mode.
nmc@nexenta:/$ options expert_mode=1
nmc@nexenta:/$ !bash
You are about to enter the Unix ("raw") shell and execute low-level Unix command(s).

CAUTION: NexentaStor appliance is not a general purpose operating system: managing the appliance via Unix shell is NOT recommended.

This management console (NMC) is the command-line interface (CLI) of the appliance, specifically designed for all command-line interactions.

Using Unix shell without authorization of your support provider may not be supported and MAY VOID your license agreement.

To display the agreement, please use 'show appliance license agreement'.

Proceed anyway? (type No to return to the management console) Yes
root@nexenta:/volumes#
The next step is to set up the hostname files for each of the IPMP interfaces. I will name the interfaces as follows:
- ipmp0 – IPMP interface for aggr1 and aggr2
- ipmp100000 – IPMP interface for aggr100001 and aggr100002
- ipmp200000 – IPMP interface for aggr200001 and aggr200002
These files also set the IP address that we would like the system to assign to each IPMP interface.
root@nexenta:/etc# cat hostname.ipmp0
ipmp group ipmp0 10.100.0.1 netmask 255.255.255.0 up
root@nexenta:/etc# cat hostname.ipmp100000
ipmp group ipmp100000 10.100.100.1 netmask 255.255.255.0 up
root@nexenta:/etc# cat hostname.ipmp200000
ipmp group ipmp200000 10.100.200.1 netmask 255.255.255.0 up
Next, the aggregate interfaces need to be assigned to their IPMP groups in their hostname.* files. We need to add ‘group <ipmp-interface> -failover’ to each of these files, before the ‘up’ at the end. Here are the files after the changes are made:
root@nexenta:/etc# for i in /etc/hostname.aggr* ; do echo $i ; cat $i ; done
/etc/hostname.aggr1
10.100.0.2 netmask 255.255.255.0 mtu 9000 broadcast + group ipmp0 -failover up
/etc/hostname.aggr2
10.100.0.3 netmask 255.255.255.0 mtu 9000 broadcast + group ipmp0 -failover up
/etc/hostname.aggr100001
10.100.100.2 netmask 255.255.255.0 broadcast + group ipmp100000 -failover up
/etc/hostname.aggr100002
10.100.100.3 netmask 255.255.255.0 broadcast + group ipmp100000 -failover up
/etc/hostname.aggr200001
10.100.200.2 netmask 255.255.255.0 broadcast + group ipmp200000 -failover up
/etc/hostname.aggr200002
10.100.200.3 netmask 255.255.255.0 broadcast + group ipmp200000 -failover up
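If you have many of these files, the edit can be scripted rather than done by hand. Here is a sketch, run against scratch copies in /tmp so you can inspect the result first; it assumes a GNU sed (which NexentaStor’s userland provides) and the group names from this article:

```shell
# Demonstrate the edit on scratch copies, not the real /etc files.
mkdir -p /tmp/ipmp-demo && cd /tmp/ipmp-demo
echo '10.100.0.2 netmask 255.255.255.0 mtu 9000 broadcast + up' > hostname.aggr1
echo '10.100.0.3 netmask 255.255.255.0 mtu 9000 broadcast + up' > hostname.aggr2

# Insert "group ipmp0 -failover" before the trailing "up",
# keeping a .bak copy of each original file.
for f in hostname.aggr1 hostname.aggr2; do
  sed -i.bak 's/ up$/ group ipmp0 -failover up/' "$f"
done

cat hostname.aggr1
```

Once you are happy with what it produces, the same loop can be pointed at the real /etc/hostname.aggr* files (with a matching group name per VLAN pair).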
Now that all the interface configs are in place, we can apply them. Here is the easiest way I found; if anyone knows a better way, I’d love to hear it!
# svcadm disable svc:/network/physical:default
# for i in aggr1 aggr2 aggr100001 aggr100002 aggr200001 aggr200002 ipmp0 ipmp100000 ipmp200000 ; do ifconfig $i unplumb ; done
# svcadm enable svc:/network/physical:default
At this point, all of your interfaces should be up, and all the IP addresses should be pingable. Make sure that you can ping both the individual interface IPs and the IPMP IPs. You should be able to use the ‘ipmpstat’ command to see information about your groups, e.g.:
root@nexenta:/volumes# ipmpstat -a
ADDRESS        STATE  GROUP       INBOUND     OUTBOUND
nexenta-vl100  up     ipmp100000  aggr100001  aggr100002 aggr100001
nexenta-vl200  up     ipmp200000  aggr200001  aggr200002 aggr200001
nexenta        up     ipmp0       aggr1       aggr2 aggr1
Note that this configuration provides failover and outbound load balancing, but it does not provide inbound load balancing. If you would like inbound load balancing, you need to configure an IP alias on each of the ipmp interfaces, and then mix up which IP you use from the host that is connecting to your Nexenta machine (or use multipathing if it’s iSCSI!)
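For example, adding a second address to the ipmp0 group could look like the sketch below (the .4 address is an assumption; pick any free IP in the subnet):

```shell
# Add an extra data address to the ipmp0 group interface at runtime:
ifconfig ipmp0 addif 10.100.0.4 netmask 255.255.255.0 up

# To make it persist across reboots, append a matching line
# to /etc/hostname.ipmp0:
#   addif 10.100.0.4 netmask 255.255.255.0 up
```

Clients can then be spread across 10.100.0.1 and 10.100.0.4, which IPMP will bind to different underlying aggregates.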
One last thing – once everything is configured, you will probably want to define your own ping targets. You can view the ones that IPMP picked automatically by running ‘ipmpstat -t’. In our configuration, on one VLAN, the two Nexenta nodes picked each other as targets. When we took machine two down (intentionally), machine one marked that interface down; then, when we booted machine two back up, it could not reach machine one’s interface, and marked its own interface on that VLAN down. Nice race condition. Oddly, mpathd (the daemon that does the probing) does not use a configuration file for ping targets, but instead relies on host routes. What we’ve done is add routes to the individual IP addresses that we would like it to monitor, using the NMC command ‘setup network routes add’ and specifying the IP address to monitor as both the ‘Destination’ and the ‘Gateway’. We picked four to five stable hosts on each VLAN (routers, Xen domain-0s and the like), and added them on both Nexenta hosts. This gives more consistent results, since multiple core machines would have to go down before an interface would be incorrectly disabled.
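From the raw shell, the equivalent host routes can be added with the standard Solaris ‘route’ command. The addresses below are placeholders for your own stable probe targets:

```shell
# Persistent (-p) host routes that point at the target address itself,
# which makes in.mpathd use these hosts as probe targets.
# 10.100.100.11 and .12 are hypothetical stable hosts on VLAN 100.
route -p add -host 10.100.100.11 10.100.100.11
route -p add -host 10.100.100.12 10.100.100.12

# Verify which targets each interface is now probing:
ipmpstat -t
```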
I hope this helps! Please feel free to leave a comment if you run into any trouble getting it working.
Very informative! Can’t wait to read about the performance tests with a few servers in use.
Shouldn’t “…ifconfig $i unplumb…” actually be “…ifconfig $i plumb…” ??? Of course, I am probably wrong, cause it is Friday past 3pm, I might be in my own cloud. ;-)
Things have been pretty good so far! I’ve pounded the snot out of these a few times, haven’t been able to get them to really sweat. ;) The only real issue we’ve had is that I initially thought that the LACP config that Nexenta offered actually set up IPMP, so I had it set up with 8 ports on two separate switches with LACP turned off.. that essentially just randomly moves the MAC around, etc.. when we were only testing with one server, it worked ok, but once we added more to the mix, things went *BOOM*, and I felt really, really dumb! :)
Plumb vs Unplumb – unplumb is actually correct.. I am not actually sure what ‘disabling’ the svc:/network/physical:default is supposed to do, but it doesn’t unplumb the interfaces.. and enabling the service won’t bring things up properly unless the interfaces are not plumbed. Hence the unplumb in the middle. Solaris networking still confuses the crap outta me. ;)
Which SuperMicro model did you go with, and which controller? I’m having all sorts of issues with a SC847E26-JBOD, LSI 9200-8e SAS2 controller and Seagate Constellation ES 2TB SAS2 disks. In both Solaris and Linux I get weird SAS errors. SuperMicro claims that the LSI2008 chipset is incompatible with their dual expander setup, even though I’ve only attached one SAS link, and according to them and their “compatibility matrix” only internal SAS2 RAID controllers are qualified and supported; how that’s supposed to work with a JBOD I am yet to figure out (well basically, SuperMicro f*cked up and won’t own up to it).
There’s more to this story and I plan to write a blog post about my findings; so far, I unfortunately can’t recommend SuperMicro to anyone, at least not their SAS2 JBODs.
Great Blog post. We are also working on deploying Nexenta at my current employer. We went down the LACP route and it’s worked well so far. I was actually looking for a solution to a FC Multipathing issue when I came across your site and just wanted to thank you for posting this information!
Any specific reason for using “6x Intel 32gb X25-E drives” for ZIL?
I’m fairly new to all this, but as I understand it, ZIL should be mirrored and that means he’s likely running 3x32GB (96GB) which should be relative to the 48GB of RAM and the amount and rate of synchronous writes he expects. In theory, that setup ensures that he can handle a SERIOUS amount of sustained, synchronous writes and with 10x1TB SAS drives, that makes sense to me. If anything one might call it a bit overkill, but then again it may be a bit of “futureproofing” in case the storage is grown. Just my 2 cents. Love to hear from the author so I can confirm if I’m learning anything.
I’m running in a virtualized environment and trying to figure out the smartest way to get some form of real trunking in place (so I can break the 1Gbe barrier w/o 1+Gbe equipment). IPMP is something I’d like to look at next but since you have LAG working here, can you tell me if you’ve observed with NFS that you can exceed 1Gbe rates (by even a consistently measurable amount, not looking for 2Gb) with your setup? I ask because you clearly have the I/O to feed it and with this network setup, if I understand how LAG is implemented in Solaris combined with LAG/LACP on a switch, it seems possible.
Any one stream (usually the combination src and dest mac addr) will be limited to the speed of one of the links that make up the channel/aggregate.
Great post, but I have issues with my interfaces disappearing from the ipmp group after reboot. It all works great until reboot.
Thanks in advance