[Or “my quest for the ultimate home-brew storage array.”] At my day job, we use a variety of storage solutions based on the type of data we’re hosting. Over the last year, we have started to deploy SuperMicro-based hardware with OpenSolaris and ZFS for storage of some classes of data. The systems we have built previously have not had any strict performance requirements, and were built with SuperMicro’s SC846E2 chassis, which supports 24 total SAS/SATA drives, with an integrated SAS expander in the backplane to support multipath to SAS drives. We’re building out a new system that we hope to be able to promote to tier-1 for some “less critical data”, so we wanted better drive density and more performance. We landed on the relatively new SuperMicro SC847 chassis, which supports 36 total 3.5″ drives (24 front and 12 rear) in a 4U enclosure. While researching this product, I didn’t find many reviews and detailed pictures of the chassis, so I figured I’d take some pictures while building the system and post them for the benefit of anyone else interested in such a solution.
[2010-05-19 Some observations on power consumption appended to the bottom of the post.]
[2010-05-20 Updated notes a bit to clarify that I am not doing multilane or SAS – thanks for reminding me to clarify that Mike.]
[2011-12-20 Replacing references of ‘port multiplier’ with ‘SAS Expander’ to reflect the actual technology in use.. thanks commenters Erik and Aaron for reminding me that port multiplier is not a generic term, and sorry it took me so long to fix the terminology!]
In the systems we’ve built so far, we’ve only deployed SATA drives, since OpenSolaris can still get us decent performance with SSD for read and write cache. This means that in the 4U cases we’ve used with integrated SAS expanders, we have only used one of the two SFF-8087 connectors on the backplane; this works fine, but limits the total throughput of all drives in the system to four 3gbit/s channels (on this chassis, 6 drives would be on each 3gbit/s channel.) On our most recent build, we built it with the intention of using it both for “nearline”-class storage, and as a test platform to see if we can get the performance we need to store VM images. As part of this decision, we decided to go with a backplane that supports full throughput to each drive. We also decided to use SATA drives for the storage disks, versus 7200rpm SAS drives (which would support multipath, but with the backplane we’re using it doesn’t matter), or faster SAS disks (as the SSD caches should give us all the speed we need.) For redundancy, our plan is to use replication between appliances versus running multi-head stacked to the same storage shelves; for an example of a multi-head/multi-shelf setup, see this build by the local geek Mike Horwath of ipHouse.
When purchasing a SuperMicro chassis with a SAS backplane, there are a few things you should be aware of..
- There are different models of the chassis that include different style backplanes:
- ‘A’ style (IE – SC847A) – This chassis includes backplanes that allow direct access to each drive (no SAS expander) via SFF-8087 connectors. In the SC847 case, the front backplane has 6 SFF-8087 connectors, and the rear backplane has 3 SFF-8087 connectors. This allows full bandwidth to every drive, and minimizes the number of cables as much as possible. Downside, of course, is that you need enough controllers to provide 9 SFF-8087 connectors!
- ‘TQ’ style – not available for the SC847 cases, but in the SC846 chassis an example part number would be ‘SC846TQ‘. This backplane provides an individual SATA connector for each drive — in other words, you will need 24 SATA cables, and 24 SATA ports to connect them to. This will be a bit of a mess cable-wise.. with the SFF-8087 option, I don’t know why anyone would still be interested in this – if you have a reason, please comment! This is quite a common option on the 2U chassis – it can actually be difficult to purchase a 2U barebones “SuperServer” that includes SFF-8087 connectors.
- ‘E1’ style (IE – SC847E1) – This chassis includes backplanes with integrated 3gbit/s SAS expander, without multipath support. Each backplane has one SFF-8087 connector, so you only need two SFF-8087 ports in a SC847E1 system. The downside is that you are limited to 3gbit/s per channel – so you’d have a total of 6 drives on each 3gbit/s channel for the front backplane, and 3 drives on each channel for the rear backplane. SuperMicro also has a ‘E16’ option (IE – SC847E16) which is upcoming, and supports SATA3/SAS2, for a total of 6gbit/s per channel.
- ‘E2’ style (IE – SC847E2) – Similar to the SC847E1, this includes a SAS expander on the backplane, but also supports multipath for SAS drives. Each backplane has two SFF-8087 connectors. Same caveats as the E1 apply. They also have a ‘E26’ version coming out soon (IE – SC847E26) which will include SAS2 (6gbit/s) expanders.
I do wish that SuperMicro would offer a “best of both worlds” option – it would be great to be able to get a high amount of bandwidth to each drive, and also support multipath. Maybe something like a SAS2 backplane which only put two or three drives on each channel instead of six drives? If they did two drives per channel with a SAS expander, and supported multipath, it should be possible to get the same amount of total bandwidth to each drive (assuming active/active multipath), and still keep a reasonable number of total SFF-8087 connectors, plus support multipath with SAS drives, and get the bonus of controller redundancy. If anyone knows of an alternate vendor or of plans at SuperMicro to offer this, by all means, comment!
- ‘UB’ option (IE, SC847A-R1400UB) – this option supports SuperMicro’s proprietary UIO expansion cards. It uses a proprietary riser card to mount the cards horizontally, and will support 4 full-height cards and 3 low-profile cards in the SC847. They get the card density by mounting the components for one (or more) UIO cards on the opposite side of the PCB than you usually see – the connector itself is still PCI-E x8, but the bracket and components are all on the opposite side. I have not ordered a chassis that uses UIO recently, so I’m not sure if the sample part number would include riser cards or not. Note that you will need to purchase a SuperMicro board that supports UIO for this chassis.
- ‘LPB’ option (IE, SC847A-R1400LPB) – this option supports 7 low-profile expansion slots. If you do not have any need for full-height cards, this gives you the maximum number of high-speed slots. This is the option you will need to go with if you want to use a motherboard from a vendor other than SuperMicro.
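The bandwidth trade-off between the direct-attach and expander backplanes above is easy to quantify. Here’s a back-of-the-envelope sketch using nominal link rates (real-world throughput is lower once 8b/10b encoding and protocol overhead are accounted for):

```python
# Per-drive bandwidth implied by the SC847 backplane options (front backplane).
# Nominal link rates only; actual throughput will be lower.

def per_drive_gbit(lanes, gbit_per_lane, drives):
    """Nominal shared bandwidth per drive, in Gbit/s."""
    return lanes * gbit_per_lane / drives

# 'A' style: every drive gets a dedicated 3gbit/s lane (direct attach).
direct = per_drive_gbit(lanes=1, gbit_per_lane=3.0, drives=1)

# 'E1'/'E2' style: one SFF-8087 connector (4 lanes x 3gbit/s) feeds all
# 24 front drives, i.e. 6 drives share each lane.
expander = per_drive_gbit(lanes=4, gbit_per_lane=3.0, drives=24)

print(f"direct-attach: {direct} Gbit/s per drive")  # 3.0
print(f"expander:      {expander} Gbit/s per drive")  # 0.5
```

Swap in 6.0 for `gbit_per_lane` to see what the upcoming E16/E26 (SAS2) backplanes would give: 1gbit/s per front drive, still well short of direct attach.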
For the system I’m building, we went with the following components:
- SuperMicro SC847A-R1400LPB chassis – 36-bay chassis with backplanes that offer direct access to each drive via SFF-8087 connectors. 7 low-profile expansion slots on the motherboard tray.
- SuperMicro X8DTH-6F motherboard – Intel 5520 chipset; supports Intel’s 5500- and 5600- series Xeon CPUs. Has an integrated LSI 2008 SAS2 controller, which supports 8 channels via two SFF-8087 ports. 7 PCI-E 2.0 x8 slots. 12 total memory slots. IPMI with KVMoIP integrated. Two Gig-E network ports based on Intel’s newest 82576 controller. This board is great.. but what would make it perfect for me would be a version of the board that had 18 memory slots and 4 integrated Gig-E ports instead of two. Ah well, can’t have it all!
- 2x Intel E5620 Westmere processors
- 24gb DDR3 memory; PC3-10600, registered/ecc.
- 4x LSI 9211-8i PCI-E SAS-2 HBA – 2 SFF-8087 ports on each controller; same chipset (LSI 2008) as the onboard controllers. This gives me a total of 10 SFF-8087 SAS2 ports, which is one more than needed to support all the drive bays. I should also note that we haven’t had any problems with the LSI2008-based controllers dropping offline with timeouts under OpenSolaris; with our other systems, we started with LSI 3081E-R controllers, and had no end of systems failing due to bug ID 6894775 in OpenSolaris, which as far as I’m able to tell has not yet been resolved. Swapping the controllers out with 9211-8i’s solved all the issues we were having.
- Variety of SuperMicro and 3ware SFF-8087 cables in various lengths to reach the ports on the backplanes from the controller locations.
- 2x Seagate 750gb SATA hard drives for boot disks.
- 18x Hitachi 2TB SATA hard drives for data disks.
- 2x Intel 32gb X25-E SATA-2 SSD’s; used in ZFS for a mirrored ZFS Intent Log (ZIL); write cache. (Note: 2.5″ drives; needs a SuperMicro MCP-220-00043-0N adapter to mount in the hot-swap bays.)
- 1x Crucial RealSSD C300 128gb SSD; used in ZFS for a L2ARC read cache. (Also a 2.5″ drive; see note above.)
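The port math behind the controller selection works out cleanly: each SFF-8087 connector carries four lanes, and with the ‘A’ backplane each lane maps directly to one drive bay. A quick sanity check:

```python
# Sanity-check: SFF-8087 ports vs. drive bays for this build.

onboard_ports = 2         # integrated LSI 2008 on the X8DTH-6F
hba_ports = 4 * 2         # four LSI 9211-8i cards, 2 ports each
total_ports = onboard_ports + hba_ports

bays = 36
ports_needed = bays // 4  # 4 drives per SFF-8087 connector on the 'A' backplane

print(total_ports, ports_needed)  # 10 9
```

The tenth port goes unused, which leaves a little flexibility in how the cables get routed.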
We purchased the system from CDW, with our own customer-specific pricing. I’m not allowed to share what we paid, but for your reference, I’ve whipped up a shopping cart at Provantage with (essentially) the same components. There is no special pricing here; this is just the pricing that their web site listed as of May 8 2010 at 11:18am central time. Note: I have no affiliation with Provantage. I have ordered from them previously, and enjoyed their service, but cannot guarantee you will have a good experience there. The prices here may or may not be valid if you go to order. You may be able to get better pricing by talking to a customer service rep there. I also had to change a few components for parts that Provantage did not have available – namely some of the various lengths of SFF-8087 cables. I erred on the side of ‘long’, so it should work, but I haven’t built a system with those exact cables, so can’t guarantee anything.
As you can see, the total price for this system came out at just under $8500 before shipping, or $8717.14 shipped. Not bad at all for a high-performance storage array with 18 2tb data drives and the ability to add 13 more.
If we do decide that this is the route to go for our VM image storage, the config would be similar to above, with the following changes at minimum:
- More memory (probably 48gb) using 8gb modules to leave room for more expansion without having to replace modules.
- Switch from desktop HDDs to enterprise or nearline HDDs (6gbit/s SAS if they are economical); probably also go with lower capacity drives, as our VMs would not require the same amount of total storage, and NexentaStor is priced by the terabyte of raw storage.
- Add more (either 4x or 6x total, still used in pairs of 2) X25-E’s for ZIL/SLOG, possibly also go with 64gb instead of 32gb. (More total drives should mean more total throughput for synchronous writes. If Seagate Pulsars are available, also consider those.)
- Add additional RealSSD C300’s for cache drives; the more the better.
- Add additional network capacity in the form of PCI-E NIC cards – either 2x 4-port Gig-E or 2x 10-GigE. This will allow us to make better use of IPMP and LACP to both distribute our network load among our core switches and use more than 2gbit total bandwidth.
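As a rough upper bound, the two NIC options work out as follows. These are line rates only, and a single LACP flow still rides one physical link, so the aggregates only apply across many simultaneous connections:

```python
# Rough aggregate network bandwidth for the two upgrade options above.
# Line rates only; LACP distributes flows, not packets, across links.

onboard_gige = 2                          # the X8DTH-6F's two onboard ports

option_a = (onboard_gige + 2 * 4) * 1     # plus two quad-port Gig-E cards
option_b = onboard_gige * 1 + 2 * 10      # plus two 10-GigE cards

print(option_a, option_b)  # 10 22 (Gbit/s aggregate)
```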
In any case, on to some pictures of the chassis and build.
Chassis in shipping box – includes good quality rackmount rails and the expected box of screws, power cables, etc. First SuperMicro chassis I’ve ordered that is palletized.
Front of the chassis – 24 drive bays up front.
Rear of the chassis – 12 drive bays, and a tray for the motherboard above them. Also shows the air shroud to direct airflow over the CPUs; the only part of the chassis that feels cheap at all.. but it serves its purpose just fine.
System with the motherboard tray removed. Note that as far as the mounting is concerned, the tray is pretty much the same as a standard SuperMicro 2U system. You’ll need to order heatsinks, cards, etc. that would work in a 2U.
View of the system from the back with the motherboard and four front fans removed. You can see a bit of the front backplane in the upper right; two of the SFF-8087 connectors are visible. All cable routing goes underneath the fans; there is plenty of room under the motherboard for cable slack. You can also see the connectors that the power supplies slide into on the upper left hand corner, and a pile of extra power cables that are unneeded for my configuration underneath that.
Another shot of the front backplane. You can see five of the six SFF-8087 connectors (the other is on the right-hand side of the backplane, which is not visible.) Also note the fans that I’ve removed to get better access to the backplane.
One of the power connectors that the fans slide into (white four-pin connector near the center of the picture); the SFF-8087 connector that is not visible in the picture above is highlighted in red.
Motherboard tray before installing the motherboard. This tray uses a different style screw system than I’ve seen before; instead of having threaded holes that you screw standoffs into, they have standoffs coming up off the bottom (one highlighted in blue), which you screw an adapter onto (highlighted in red) which the motherboard rests on and is secured to.
A partial view of the rear backplane on the system; also the bundle of extra power cables and the ribbon cable connected to the front panel.
Labels on one of the power supplies. This system includes a pair of ‘PWS-1K41P-1R’ power supplies, which output 1400W at 220V or 1100W at 120V.
Motherboard installed on tray, with the four LSI SAS HBAs in their boxes.
One of the two Intel E5620 ‘Westmere’ Xeon processors set in motherboard but not secured yet.
Both processors and 24gb of memory installed. No heatsinks yet.
Motherboard tray complete and ready to be installed in the system. Heatsinks and LSI controllers have been installed. Note the two SFF-8087 connectors integrated on the motherboard, and eight more on the four controllers.
Prep work on the rear backplane; the chassis shipped with the power cables pre-wired; I connected the SFF-8087 cable.
Motherboard tray installed back in the system; SFF-8087 cables connected to three of the four LSI controllers. I ended up moving one controller over for ease of cabling – notice the gap in the middle of the four controllers.
(Note: The pictures of the finished system below this point were taken on 5/7/2010; thanks to my coworker Colleen for letting me borrow her camera since I #natefail‘d to bring mine!)
The seven cooling fans to keep this system running nice and cool.
HBAs with all cables connected.
Finished system build with the top off. One power supply is slightly pulled out since I only have a single power cable plugged in.. if you have one cable plugged in but both power supplies installed, alas, the alarm buzzer is loud.
Front hard drive lights after system is finished – note that we don’t have every drive bay populated yet.
Rear drive lights while system is running.
The build-out on the system went fine for the most part; the only problem I ran into is that the motherboard did not have a BIOS installed which supported the relatively new Westmere processors. Fortunately I had a Nehalem E5520 I could borrow from another system to get the BIOS upgraded.. I wish the BIOS recovery procedure would work for unsupported processors, but ah well. I was pleased with the way the motherboard tray slides out; it makes it easy to get the cabling tucked underneath and routed so that they will not interfere with airflow. There also seems to be plenty of airflow to keep the 36 drives cooled.
I currently have NexentaStor 3.0 running on the system; we have not yet landed on what operating system we will run on this long-term.. but it will likely either be NexentaCore or NexentaStor. If we deploy this solution for our VM images (with some upgrades as mentioned above), we will almost certainly use NexentaStor and the VMDC plugin, but we’ll cross that bridge if we get there!
Here’s the disk configuration I have running at the moment with NexentaStor:
- ‘syspool’: Mirrored ZFS zpool with 2x750gb Seagate drives.
- ‘NateVol1’: ZFS zpool with..
- 2 RaidZ3 arrays with 8 2TB disks each
- 2 2TB disks set as spares
- 2 32gb Intel X25-E SSDs as a mirrored log device
- 1 128gb Crucial RealSSD C300 as a cache device
..and the obligatory screenshot of the data volume config:
This nets 18T usable space, and would allow for a simultaneous failure of any three data disks before there is any risk of data loss. (Each of the sub-arrays in ‘NateVol1’ has 3 parity disks – so I could also lose 3 disks from each of the sub-arrays without any issues.)
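The 18T figure follows directly from the layout: drives are marketed in decimal terabytes while ZFS reports capacity in binary units, so ten data disks of 2TB land at roughly 18.2 TiB:

```python
# Usable capacity of 'NateVol1': two 8-disk RAID-Z3 vdevs (3 parity each),
# 2TB (decimal) drives, with capacity reported by ZFS in binary TiB.
TB = 10**12
TIB = 2**40

vdevs = 2
disks_per_vdev = 8
parity_per_vdev = 3
disk_bytes = 2 * TB

data_disks = vdevs * (disks_per_vdev - parity_per_vdev)
usable_tib = data_disks * disk_bytes / TIB

print(f"{data_disks} data disks -> {usable_tib:.1f} TiB usable")  # 18.2 TiB
```

The two hot spares and the SSD log/cache devices don’t contribute to this figure.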
Again, this system only has two Gig-E NICs at the moment.. I’ve done I/O tests with NFS across one NIC and iSCSI across the other NIC, and can max out the bandwidth on both cards simultaneously with multiple runs of Bonnie++ 1.96 without the system breaking a sweat. I like! I should also note that this is with both deduplication and compression enabled.
Another note – before putting this into production, I did some simple “amp clamp” power usage tests on the box, with one power supply unplugged. The other power supply was plugged into 120V. While idling, it consumed 3.3A, and while running multiple copies of Bonnie in the ZFS storage pool (with all active disks lighting up nicely), it consumed 4.1A. Not bad at all for the amount of disk in this machine! I’d estimate that if the 13 additional drive bays were occupied with 2TB disks, and all those disks were active, the machine would consume about 5.5A – maybe slightly more. When we racked it up at the data center (in one of our legacy racks that is still 120V), the power usage bumped up by 3.2A combined across the A+B power, which matches nicely with my clamped readings. I’m very impressed – under 500 watts while running full out.. wow.
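For reference, converting those clamp readings to apparent power is just volts times amps. Note this gives volt-amps, which is an upper bound on real watts depending on the power supply’s power factor:

```python
# Clamp-meter readings converted to apparent power on a 120V circuit.
# VA is an upper bound on real watts (depends on the PSU's power factor).
volts = 120

idle_va = volts * 3.3  # idle: ~396 VA
load_va = volts * 4.1  # Bonnie++ load, 18 data disks active: ~492 VA
est_va = volts * 5.5   # estimate with all 36 bays populated and active

print(round(idle_va), round(load_va), round(est_va))  # 396 492 660
```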
I will update this post once we decide on a final configuration “for real” and put this into production, but so far I’d highly recommend this configuration! If you’ve used the SC847 chassis, I’d love to hear what you’ve thought. I’d also love to try out the 45-bay storage expansion version of this chassis at some point – talk about some dense storage! :)