Tag Archives: storage

Thinking out loud: Hyper converged storages missing link

Introduction:

In general, I’m not a huge fan of hyper converged infrastructure.  To me, its more “hype” than substance at the moment.  It was born out of web scale infrastructure like Google, Facebook, etc. and IMO, that is still the area where it’s better suited.    The only enterprise layer where I see HCI being a good fit is VDI, other than that, almost every other enterprise workload would be better suited on new school shared storage.  I could probably go into a ton of reasons why I personally see shared storage still being the preferred architecture for enterprises, but instead I’ll focus on one area that if adopted might change my view (slightly).  You see, there is a balance between the best and good enough.  Shared storage IMO is the best, but HCI could be good enough.

What’s missing?

What is the missing link (pun intended)?  IMO, its external / independent DAS.  Can’t see where this is going?  Follow along on why I think external DAS will make hyper converged storage good enough for almost anyone’s environment.

Scaling Deep:  Right now the average server tops out at 24 2.5” drives and less for 3.5” drives.  In a lot of larger shops, that would mean running more hosts in order to meet your storage requirements, and that will come at the cost of paying for more CPU, memory and licensing then you should have to.  Just imagine a typical 1ru r630 + a 2u 60 drive JBOD!  That’s a lot more storage that you can fit under a single host, and it would only consume one more rack unit than a typical r730.  Add to this, theoretically speaking, the number of drives you could add to a single host would go beyond a single JBOD.  A quad port SAS HBA could have four 60 drive enclosures attached, and that’s a single HBA.

Storage independence:  Having the storage outside the server also makes that storage infinitely more flexible.  This is even true when you’re building vendor homogeneous solutions.  Take Dell for example.  Typically speaking their enclosures are movable between different server generations.  Currently with the storage stuck in the chassis, it gets really messy (support wise) and in many cases not doable, to move the storage from one chassis to another, especially if you’re talking about going from an older generation server to a newer generation.

Adding to this, depending on your confidence, white boxing also allows you to cut a server vendor out of the costliest part of the solution, which is the disks themselves.  Going with an enclosure from someone like RAID inc. or DataOn, Quanta QCT, Seagate, etc.  Add in a generic LSI (sorry Avago, oh sorry again, Broadcom) HBA, and now you have a solution that is likely good enough supportability wise.  JBODs tend to be pretty dumb and reliable, which just leaves the LSI card (well known established vendor) and your SSD / HDD.

Why do you want to move the storage anyway?  Simple, I’d bet a nice steak dinner that you want to upgrade or replace your compute long before you need to replace your storage. If you’re simply replacing your compute (not adding a new node but swapping it) then moving a SAS card + DAS is far more efficient than rebuying the storage, or moving the internal storage into a new host (remember warranty gets messy).  Simply vacate the host like you would with internal storage, shutdown, rip the hba out, swap server, put existing HBA back in, done.

If you’re adding a new host, depending on your storage, you may have the option of buying another enclosure and spreading the disks you have evenly across all hosts again.  So if for example, you had 50 disks in 4 hosts (total 200 disks) and you add a fifth host.  One option could be you simply remove 10 disks from each current node and place them in the new node. Your only additional cost was JBOD enclosure, and you now continue to keep your current investment in disks (with flash, that would be the expensive part).

Mix and match 3.5 / 2.5 drives:  Right now with internal storage, you are either running a 3.5” chassis, which doesn’t hold a lot of drives, but CAN support 2.5” drives with a sled.  Or you are running a 2.5” chassis which guarantees no 3.5” drives.  External DAS could mean one of two options:

  1. Use a denser 3.5” JBOD (say 60 disks) and use 2.5” sleds when you need to.
  2. Use one JBOD for 3.5” drives and a different one for 2.5” drives.

Again it comes down to flexibility.

Performance upgrades:  Now this is a big “it depends”.  Hypothetically if there were no SW imposed bottlenecks (which there are), one of your likely bottlenecks (with all flash at least) are going to be either how many drives you have per SAS lane, or how many drives you have per SAS card.  For example, if your SAS card is PCIe 3.0 internally, but the PCIe bus is 4.0, there’s a chance you could upgrade your server to a newer / better storage controller card.   More so, even if you were stuck on PCIe 3 (as an example).  There would be nothing stopping you from slicing your JBOD in half, and using two HBA to double your throughput.  Before you even go there, yes I do know the 730xd has an option for two RAID cards, glad you brought that up.  Guess what, with external DAS, you’re only limited by your budget, the number of PCIe slots you have and the constrains of your HCI vendor.  I for example could have 4 SAS cards, and 2 JBODs partially filled and each sliced in half.  You don’t have that flexibility with internal storage.

With the case of white boxing your storage, this also means to the extent of the HCL, you can run what you want.  So if you want to use all Intel dc3700’s, you can.  Heck, they’re even starting to make JBOF (just a bunch of flash) enclosures for NVMe, which again, would be REALLY fast.

Conclusion:

I say external DAS support is the missing link because it is what would allow HCI to offer similar scaling flexibility that exists in SAN/NAS.  I still think the HCI industry is at least 3 – 5 years out from matching the performance, scalability and features we’ve come to expect in enterprise storage, but external storage support would knock a big hole in a large facet of the scalability win with SAN/NAS.

Review: 2.5 years with Nimble Storage

Disclaimer: I’m not getting paid for this review, nor have I been asked to do this by anyone.  These views are my own,  and not my employers, and they’re opinions not facts .

Intro:

To begin with, as you can tell, I’ve been running Nimble Storage for a few years at this point, and I felt like it was time to provide a review of both the good and bad.  When I was looking at storage a few years ago, it was hard to find reviews of vendors, they were very short, non-informative, clearly paid for, or posts by obvious fan boys.

Ultimately Nimble won us over against the various  storage lines listed below.  Its not a super huge list as there was only so much time and budget that I had to work with .  There were other vendors I was interested in but the cost would have been prohibitive, or the solution would have been too complex.  At the time, Tintri and Tegile never showed up in my search results, but ultimately Tintri wouldn’t have worked (and still doesn’t) and Tegile is just not something I’m  super impressed with.

  • NetApp
  • X-IO
  • Equallogic
  • Compellent
  • Nutanix

After a lot of discussions and research, it basically boiled down to NetApp vs. Nimble Storage, with Nimble obviously winning us over.  While I made the recommendation with a high degree of trepidation and even after a month with the storage, wondered if I totally made an expensive mistake, I’m happy to say, it was and is still is a great storage decision.  I’m not going into why I chose Nimble over NetApp, perhaps some other time, for now this post is about Nimble, so let’s dig into it.

When I’m thinking about storage, the following are the high level area’s that I’m concerned about.  This is going to be the basic outline of the review.

  • Performance / Capacity ratios
  • Ease of use
  • Reliability
  • Customer support
  • Scaling
  • Value
  • Design
  • Continued innovation

Finally, for your reference, we’re running 5 of their 460’s, which is between their cs300 and cs500 platforms and these are hybrid arrays.

Performance / Capacity Ratios

Good performance like a lot of things is in the eye of the beholder.  When I think of what defines storage as being fast, its IOPS, throughput and latency.  Depending on your workload, more of one than the other may be more important to you, or maybe you just need something that can do ok with all of those factors, but not awesome in any one area.  To me, Nimble falls in the general purpose array, it doesn’t do any one thing great, but it does a lot of things very well.

Below you’ll find a break down of our workloads and capacity consumers.

IO breakdown (estimates):

  • MS SQL (50% of our total IO)
    • 75% OLTP
    • 25% OLAP
  • MS Exchange (30% of total IO)
  • Generic servers (15% of total IO)
  • VDI (5% of total IO)

Capacity consuming apps:

  • SQL (40TB after compression)
  • File server (35TB after compression)
  • Generic VM’s (16TB after compression)
  • Exchange (8TB after compression)

Compression?  yeah, Nimble’s got compression…

Nimble’s probably telling you that compression is better than dedupe, they even have all kinds of great marketing literature to back it up.  The reality like anything is, it all depends.  I will start by saying if you need a general purpose array, and can only get one or the other, there’s only one case where I would choose dedupe over compression, which is data sets mostly consisting of operating system and application installer data.  The biggest example of that would be VDI, but basically where ever you find your data being mostly consistent of the same data over and over.  Dedupe will always reduce better than compression in these cases.    Everything else, you’re likely better off with compression.    At this point, compression is pretty much a commodity, but if you’re still not a believer, below you can see my numbers.  Basically, Nimble (and everyone else using compression) delivers on what they promise.

  • SQL: Compresses very well, right now I’m averaging 3x.  That said, there is a TON of white space in some of my SQL volumes.  The reality is, I normally get a minimum of 1.5x and usually end up more along the 2x range.
  • Exchange 2007: Well this isn’t quite as impressive, but anything is better than nothing,   1.3x is about what we’re looking at.  Still not bad…
  • Generic VM’s: We’re getting about 1.6x, so again, pretty darn good.
  • Windows File Servers: For us its not entirely fair to just use the general average, we have a TON of media files that are pre-compressed.  What I’ll say is our generic user / department file server gets about 1.6 – 1.8 reduction.

Show me the performance…

Ok, so great, we can store a lot of data, but how fast can we access it?  In general, pretty darn fast…

The first thing I did when we got the arrays was fire up IOMeter, and tried trashing the array with a 100% random read 8k IO profile (500GB file), and you know what, the array sucked.  I mean I was getting like 1,200 IOPS, really high latency and was utterly disappointed almost instantly.    In hind sight, that test was unrealistic and unfair to some extent.  Nimble’s caching algorithm is based on random in, random out, and IOmeter was sequential in (ignored) and then attempting random out.  For me, what was more bothersome at the time, and still is to some degree is it took FOREVER before the cache hit ratio got high enough that I was starting to get killer performance.    Its actually pretty simple to figure out how long it would take a cold dataset like that to completely heat up, divide (524288000k/9600) or 15 hours.  The 524288000 is 500GB converted to KB.  The 9600 is 8k * 1200IOPS to figure out the approximate throughput at 8k.

So you’re probably think all kinds of doom and gloom and how could I recommend Nimble with such a long theoretical warm up time?  Well let’s dig into why:

  • That’s a synthetic test and a worst case test.  That’s 500GBs of 100% random, non-compressed data.  If that data was compressed for example to 250GB, it would “only” take 7.5 hours to copy into cache.
  • On average only 10% – 20% of you total dataset is actually hot.  If that file was compressed to 250GB, worst case you’re probably looking at 50GB that’s hot, and more realistic 25GB.
  • That was data that was written 100% sequential and then being read 100% random.  Its not a normal data pattern.
  • That time is how long it takes for 100% of the data to get a 100% cache hit.  The reality is, its not too long before you’re starting to get cache hits and that 1,200 IOPS starts looking a lot higher (depending on your model).

There are a few example cases where that IO pattern is realistic:

  • TempDB: When we were looking at Fusion IO cards , the primary workload that folks used them for in SQL was TempDB.  TempDB can be such a varied workload that its really tough to tune for, unless you know your app.  Having a sequential in, random out in TempDB is a very realistic scenario. 
  • Storage Migrations:  Whether you use Hyper-V or VMware, when you migrate storage, that storage is going to be cold all over again with Nimble.  Storage migrations tend to be sequential write.
  • Restoring backup data:  Most restores tend to be sequential in nature.  With SQL, if you’re restoring a DB, that DB is going to be cold.

if you recall, I highlighted that my IOmeter test was unrealistic  except in a few circumstances, and one of those realistic circumstances can be TempDB, and that’s a big “it depends”.    But what if you did have such a circumstance?  Well any good array should have some knobs to turn and Nimble is no different.  Nimble now has two ways to solves this:

  • Cache Pinning: This feature was released in NOS 2.3, basically volumes that are pinned run out of flash.  You’ll never have a cache miss.
  • Aggressive caching: Nimble had this from day one, and it was reserved for cases like this.  Basically when this is turned on, (volume or performance policy granularity TMK), Nimble caches any IO coming in or going out.  While it doesn’t guarantee 100% cache hit ratios, in the case of TempDB, its highly likely the data will have a very high cache hit ratio.

Performance woes:

That said, Nimble suffers the same issues that any hybrid array does, which is a cache miss will make it fall on its face, which is further amplified in Nimbles case by having a weak disk subsystem IMO.  If you’re not seeing at least a 90% cache hit ratio, you’re going to start noticing pretty high latency .  While their SW can do a lot to defy physics, random reads from disk is one area they can’t cheat.  When they re-assure you that you’ll be just fine with 12 7k drives, they’re mostly right, but make sure you don’t skimp on your cache.  When they size your array, they’ll likely suggest anywhere between 10% and 20% of your total data set size.  Go with 20% of your data set size or higher, you’ll thank me.  Also, if you plan to do pinning or anything like that, account for that on top of the 20%.  When in doubt, add cache.  Yes its more expensive, but its also still cheaper than buying NetApp, EMC, or any other overpriced dinosaur of an array.

The only other area where I don’t see screaming performance is situations where 50% sequential read + 50% sequential write is going on.  Think of something like copying a table from one DB to another.  I’m not saying its slow, in fact, its probably faster than most, but its not going to hit the numbers you see when its closer to 100% in either direction.  Again, I suspect part of this has to do with the NL-SAS drives and only having 12 of them.  Even with coalesced writes, they still have to commit at some point, which means, you have to stop reading data for that to happen, and since sequential data comes off disk by design, you end up with disk contention.

Performance, the numbers…

I touched on it above, but I’ll basically summarize what Nimble’s IO performance spec’s look like in my shop.  Again, remember I’m running their slightly older cs460’s, if these were cs500’s or cs700’s all these numbers (except cache misses) would be much higher.

  • Random Read:
    • Cache hit: Smoking fast (60k IOPS)
    • Cache miss: dog slow (1.2k IOPS)
  • Random Write: fast (36k IOPS)
  • Sequential
    • 100% read: smoking fast (2GBps)
    • 100% write: fast (800MBps – 1GBps)
    • 50%/50%: not bad, not great (500MBps)

Again, its rough numbers, I’ve seen higher number in all the categories, and I’ve seen lower, but these are very realistic numbers I see.

Ease of use:

Honestly the simplest SAN I’ve ever used, or at least mostly.  Carving up volumes, setting up snapshots and replication has all been super easy, and intuitive.  While Nimble provided training, I would content its easy enough that you likely don’t need it.  I’d even go so far as saying you’ll probably think you’re missing something.

Also, growing the HW has been simple as well.  Adding a data shelf or cache shelf has been as simple as a few cables and clicking “activate” in the GUI.

Why do I say mostly?  Well if you care about not wasting cache, and optimizing performance, you do need to adapt your environment a bit.  Things like transaction logs vs DB, SQL vs Exchange, they all should have separate volume types.  Depending on your SAN, this is either common place, or completely new.  I came from an Equallogic shop, where all you did was carve up volumes.  With Nimble you can do that too, but you’re not maximizing your investment, nor would you be maximizing your performance.

Troubleshooting performance can take a bit of storage knowledge in general (can’t fault Nimble for that per say) and also a good understanding of Nimble its self.  That being said, I don’t think they do as good of a job as they could in presenting performance data in a way that would make it easier to pin down the problem.  From the time I purchased Nimble till now, everything I’ve been requesting is being siloed in this tool they call “Infosite”, and the important data that you need to troubleshoot performance  in many ways is still kept under lock and key by them, or is buried in a CLI.  Yeah, you can see IOPS, latency, throughput and cache hits, but you need to do a lot of correlations.  For example, they have a line graph showing total read / write IOPS, but they don’t tell you in the line graph whether it was random or sequential.  So when you see high latency, you now need to correlate that with the cache hits and throughput to make a guess as to whether the latency was due to a cache miss, or if it was a high queue depth sequential workload.  Add to that, you get no view of the CPU, average IO size, or other things that are helpful for troubleshooting performance.  Finally, they role up the performance data so fast, that if you’re out to lunch and there was a performance problem, its hard to find, because the data is average way too quickly.

Reliability:

Besides disk failures (common place) we’ve had two controller failures.    Probably not super normal, but none the less, not a big deal.  Nimble failed over seamlessly, and replacing them was super simple.

Customer Support:

I find that their claim of having engineers staffing support to be mostly true.  By in large, their support is  responsive, very knowledgeable and if they don’t know the answer, they find it out.  Its not always perfect, but certainly better than other vendors I’ve worked with.

Scaling:

I think Nimble scales fantastically so long as you have the budget.  At first when they didn’t have data or cache shelves, I would have said they have some limits, but now a days, with their scale in any direction, its hard not to say that they can’t adapt to your needs.

That said, there is one area where I’m personally very disappointed in their scaling, which is going up from an older generation to a newer generation controllers.  In our case, running the cs460’s requires a disruptive upgrade to go to the cs500’s or cs700’s.  They’ll tell me its non-disruptive if I move my volumes to a new pool, but that first assumes I have SAN groups, and second assumes I have the performance and capacity to do that.  So I would say this is mostly true, but not always.

Value / Design:

The hard parts of Nimble…

If we just take face value, and compare them based on performance and capacity to their competitors, they’re a great value.  If you open up the black box though and start really looking at the HW you’re getting, you start to realize Nimble’s margins are made up in their HW.    A few examples…

  • Using Intel sc3500’s (or comparable) with SAS interposers instead of something like an STEC or HTST SAS based SSD.
  • Supermicro HW instead of something rebranded from Dell or HP.  The build quality of Supermicro just doesn’t compare to the others.  Again, I’ve had two controller failures in 2 years.
  • Crappy rail system.  I know its kind of petty, but honestly they have some of the worst rails I’ve seen next to maybe Dell’s EQL 6550 series.  Tooless kits have kind of been a thing for many years now, it would be nice to see Nimble work on this
  • Lack of cable management, seriously, they have nothing…

Other things that bug me about their HW design…

Its tough to understand how to power off / on certain controllers without looking in the manual.  Again, not something you’re going to be doing a lot, but still it could be better.  Their indicator lights are also slightly mis-leading with a continual blinking amberish orangeish light on their chassis.  The color is initially misleading that perhaps an issue is occurring.

While I like the convince of the twin controller chassis, and understand why they, and many other vendors use it.  I’d really like to see a full sized dual 2u rack mount server chassis.  Not because I like wasting space, but because I suspect it would actually allow them to build a faster array.  Its only slightly more work to unrack a full sized server, and the reality is I’d trade that any day for better performance and scalability (more IO slots).

I would like to see a more space conscious JBOD.  Given that they over subscribe the SAS backplane anyway, they might as well do it while saving space.  Unlike my controller argument, where more space would equal more performance, they’s offering a configuration that chews up more space, with no other value add, except maybe having front facing HDD’s.  I have 60 bay JBODs for backup that fit in 4u.  Would love to see that option for Nimble, that would be 4 times the amount of storage in about the same amount of space.

Its time to talk about the softer side of Nimble….

The web console, to be blunt is a POS.  Its slow, buggy, unstable, and really, I hate using it.  To be fair, I’m bigoted against web consoles in general, but if they’re done well, I can live with them.  Is it usable, sure, but I certainly don’t like living in it.  If I had a magic wand, I would actually do away with the web console on the SAN its self and instead, produce two things:

  • A C# client that mimic’s the architecture of VMware.  VMware honestly had the best management architecture I’ve seen (until they shoved the web console down my throat).  There really is no need for a web site running on the SAN.  The SAN should be locked down to CLI only, with the only web traffic being API calls.  Give me a c# client that I can install on my desktop, and that can connect directly to the SAN or to my next idea below.  I suspect, that Nimble could ultimately display a lot more useful information if this was the case, and things would work much faster.
  • Give me a central console (like vCenter) to centrallly manage my arrays,  I get that you want us to use infosite and while its gotten better, its still not good enough.  I’m not saying do away with info site, but let me have a central, local, fast solution for my arrays.  Heck, if you still want to do a web console option, this would be the perfect place to run it.

The other area I’m not a fan of right now, is their intelligent MPIO.  I mean I like it, but I find its too restrictive.  Being enabled on the entire array or nothing is just too extreme.  I’d much rather see it at the volume level.

Finally, while I love the Windows connection manager, it still needs a lot of work.

  • NCM should be forwards and backwards compatible, at least to some reasonable degree.  Right now its expected that it matches the SAN’s FW version and that’s not realistic.
  • NCM should be able to kick off on demand snaps (in guest) and offer a snapshot browser (meaning show me all snaps of the volume).
  • If Nimble truly want to say they can replace my backup with their snapshots, then make accessing the data off them easier.  For example, if I have a snap of a DB, I should be able to right click that DB, and say (mount a snapshot copy of this DB, with this name) and the Nimble goes off and runs some sort of workflow to make that happen.  Or just let us browse the snaps data almost like a UNC share.

The backup replacement myth…

Nimble will tell you in some cases that they have a combined backup and primary storage solution.  IMO, that’s a load of crap.  Just because you take a snapshot, doesn’t mean you’ve backed up the data.  Even if you replicate that data, it’s still not counting as a backup.  To me, Nimble can say they’ve solved the backup dilemma with their solution when they can do the following:

  • Replicate your data to more than one location
  • Replicate your data to tape every day and send it offsite.
  • Provide an easy straight forward way to restore data out of the snapshots.
  • Truncate transaction logs after a successful backup.
  • Provide a way of replicating the data to non-Nimble solution, so the data can be restored anywhere.  Or provide something like a “Nimble backup / recovery in the cloud” product.

Continued Innovation:

I find Nimble’s innovation to be on the slow side, but steady, which is a good thing.  I’d much rather have a vendor be slow to release something because they’re working on perfecting it.  In the time I’ve been a customer, they’ve released the following features post purchase:

  • Scale out
  • Scale deep
  • External flash expansion
  • Cache Pinning
  • Virtual Machine IOPS break down per volume
  • Intelligent MPIO
  • Cache Pinning
  • QOS
  • RestAPI
  • RBAC
  • Refreshed generation of SANS (faster)
  • Larger and larger cache and disk shelves

Its not a huge list, but I also know what they’re currently working on, and all I can say is, yeah they’re pretty darn innovative.

Conclusion and final thoughts:

Nimble is honestly my favorite general purpose array right now.  Coming from Equallogic, and having looked at much bigger / badder arrays, I honestly find them to be the best bang for the buck out there.  They’re not without faults, but I don’t know an array out there that’s perfect.  If you’re worried they’re not “mature enough”,  I’ll tell you, you having nothing to fear.

That said, its almost 2016 and with flash prices being where they are now, I personally don’t see a very long life for hybrid array going forward, at least not as high performance mid-size to enterprise  storage arrays.  Flash is getting so cheap, its practically not worth the saving you get from a hybrid, compared to the guaranteed performance you get from an all flash.    Hybrids were really filling a niche until all flash became more attainable, and that attainable day is here IMO.  Nimble thankfully has announced that an AFA is in the works, and I think that’s a wise move on their part.  If you have the time, I would honestly wait out your next SAN purchase until their AFA’s are out, I suspect, they’ll be worth the wait.

Backup Storage Part 4a: Windows Storage Spaces Gotcha’s

The pro and the con of a software defined storage is that its not a turn key solution.  Not only do  you have the power to customize your solution, you also have no choice but to design your solution.  With Storage Spaces, we figured most of this stuff out before selecting it as our new backup storage solution.  At the time, there was some documentation on storage spaces, but it was very much a learning process.  Most “how to’s” were demonstrated inside labs, so I found some aspects of the documentation to be useless, but I was able to glean enough information, to at least know what to think about.

So to get started, if you end up here because you want to build a solution like this, I would encourage you to start with this FAQ’s that MS has put together.  A lot of the questions I had were answered here.

I want to go over a number of gotcha’s with WSS before you take the plunge:

  1. If you’re used to and demand a solution that notifies you automatically of HW failures, this solution may not be right for you.  Don’t get me wrong, you can tell that things are going bad, but you’ll need to monitor them yourself.  MS has written a health script, and I myself was also able to put together a very simple health script (I’ll post it once I get my GitHub page up).
  2. WSS only polls HW health once every 30 minutes.  You can’t change this.  That means if you rip an power supply out of your enclosure it will take up to 30 minutes before the enclosure goes into an unhealthy state.  I confirmed this with MS’s product manager.
  3. Disk rebuilds are not automatic, nor are they what I would call intuitive.  You shouldn’t just rip a disk out when its bad, plop a disk in and walk away.  There is a process that must be followed in order to replace a failed disk.  BTW, this process as of now, is all powershell based.
  4. Do NOT cheap out on consumer grade HW.  Stick to the MS HCL found here.  There have been a number of stability problems listed with WSS, and its almost always has to do with not sticking with the HCL and not WSS’s reliability.
  5. This isn’t specific to WSS, but do not plan on mixing SATA and SAS drives on the same SAS backplane.  Either go all SAS or go all SATA, avoid mixing.  For HDD’s specifically, the cost is so negligible between SATA and SAS, I would personally recommend just sticking with SAS, unless you never plan to cluster.
    1. Do NOT use SAS expanders either, again, stop cheaping out, this is your data we’re talking about here.
  6. Do NOT plan on using parity based virtual disks, they’re horrible at writes, as in 90MBps tops.  Use mirroring or nothing.
  7. Do NOT plan on using dedicated hot spares, instead plan on reserving free space in your storage pool.  This is one of many advantages to WSS.  it uses free space and ALL disks to rebuild your data.
    1. If you plan on using the “enclosure awareness”, you need to reserve a drives capacity of free space * the number of enclosures you have.  So if you have 4 enclosures and you’re using 4TB drives, you must reserve 16TB of space per storage pool spanned across those enclosures.
  8. Plan on taking point 7, and also ensuring there’s at least 20% free space in your pool.  I like to plan like this.
    1. Subtract 20% right of the top of your raw capacity.  So if you have 200TB raw, that’s 160TB.
    2. As mentioned in point 7\1.  If you plan to use enclosure awareness, subtract a drive for each enclosure, otherwise, subtract at least one drives worth of capacity.  So in we had 4 enclosures and they had 4TB drives, that would be 160TB – 16TB = 144TB usable before mirroring.
  9. I recommend thick provisioned disk, unless you’re going to be VERY diligent about monitoring your pools free space.  Your storage pool will go OFFLINE if even one disk in the pool runs low on free space.  Thick provisioning prevents this, as the space will only allocate what it can reserve.
  10. Figure out your strip width (number of disks in a span) before building a virtual disk.  Plan this carefully because it can’t be reversed.  This has its own set of gotcha’s.
    1. This determines the increments of disks you need to expand your pool.  If you choose a 2 column mirror, that means you need 4 disks to grow the virtual disk.
    2. Enclosure awareness is also taken into account with columns.
    3. Performance is dependent on the number of columns.  if you have 80 disk, and you create a 2 column mirrored virtual disk, you’ll only have the read performance of 4 disks, and write performance of 2 disks (even with 80 disks).  You will however be able to grow at 4 disk increments.  However, if you create a virtual disk with 38 columns, you’ll have the read performance of 76 disks, and the write performance of 38, but you’ll need 76 disks to grow the pool.  So plan your balance of growth vs. performance.
  11. Find a VAR that has WSS experience to purchase your HW through.  Raid Inc. is who we used, but now other vendors such as Dell have also taken to selling WSS approved solutions.  I still prefer Raid Inc due to pricing, but if you want the warm and fuzzies, its good to know you now can go to Dell.
  12. Adding storage to an existing pool does not rebalance the data across the new disks.  The only way to do this is to create a new virtual disk, move the data over, and remove the original virtual disk.
    1. This is resolved in server 2016.

Those are most of the gotcha’s to consider.  I know it looks big, but I’m pretty confident that every storage vendor you look at, has their own unique list of gotcha’s.  Heck, a number of these are actually very similar to NetApp/ZFS, and other tier 1 storage solutions.

We’ll nerd out in my next post on what HW we ended up getting, what it basically looks like and why.

Backup Storage Part 3c: Cloud Storage

Anyone who’s heard the word “cloud” as it pertains to technology knows its a fairly nebulous term.  About the only thing people seem to grasp is that its means “not in my datacenter or home”.  When we talk about “cloud storage” the term is still fuzzy, but at least we know it has something to do with storage.  For this post, I’m going to be writing about two specific types of storage outlined below.

  • Online / Realtime storage:  This is just a name I’m going to use to describe the type of storage that you’d normally run cloud hosted VMs on.  I like to think of it as cloud SAN.  This type of storage is great for primary backup storage, both because of its performance capability and always being online.  However, do to its price, it may not be the most prudent cloud storage option for long term retention or for secondary copies.
  • Nearline or Archive storage: This storage is a kind of like tape (and depending on the vendor may be tape). You don’t write or read to this storage directly, and instead use either a virtual appliance, or some form of API / SW.  Archive and nearline storage are great options for secondary storage or for hosting secondary copies of your backups.  Other than how you interact with the storage, the only downside tends to be the potential of longer recovery times.

With two great options, what’s not to love?  You have the perfect mix of short term, fast storage for your more recent backups, and your cheap and deep storage for your secondary and extended retention copies.

Initially it sounded great, until we started digging deeper into it.  Ultimately we ended up having to pass on the solution for a number of reasons below.

  1. Cost:
    1. Let’s face it, unless you’re one of the lucky few, like us you have a tough enough time getting budget for things that might actually make your company more productive.  Trying to get your company to invest heavily in something they might never need to use, or rarely need to use is not likely to happen.
    2. This ones a toss up, but if your company prefers capex over opex, cloud is not going to be an easy battle for you.  In our case, capex is preferred, which gave cloud storage a big black eye.
    3. The storage is only one small piece of the cost of cloud options.
      1. Your network pipe is probably not big enough for all the data you need to send.  This mostly depends on your backup solution as a whole (software optimized backups with dedupe and compression may help), but I suspect you don’t have enough to send your weekly full, let alone try to recover a weekly full.  So on top of the never ending cost of cloud storage, you’re going to need to add a more expensive pipe to get the data there, and maybe even pull the data back.
      2. Potentially in addition to the network, or maybe instead of the network, you’ll need to invest in a cloud gateway or some other backup data optimizing software.  Either way, you’re going to be investing even more capex on top of your increasing opex.
  2. DR: Our new DR plan wasn’t finalized yet, which made it really difficult to pick a solution that we weren’t sure would make sense in two years.  For example, if we move our DR solution to the cloud, some of the considerations in point 1 go away, as we’ll be running our secondary site in the cloud anyway.  However, if we decided to stay in a colocation unit, while the cloud technically speaking could still work, it wouldn’t make sense to send our data to the cloud compared to just sending it to our DR site.
  3. Speed:  While point 1/3/1 (Network) if invested in correctly shouldn’t pose a bottleneck, its tough to argue against copying our data to tape from a performance perspective.  We easily saturate 5 LTO6 tape drives in parallel (off our new backup storage solution to be discussed in an upcoming post).  There’s no way that it would be cost effective to get that same level of performance out of cloud storage.
  4. Integration:  While more and more backup vendors are integrating cloud storage API’s, they’re not always free, and they’re not always good.   Veeam as an example, had a cloud solution for their product, but it was terribly inefficient (as I recall).  There was no backup optimization to the cloud.  Veeam was simply copying files for you to the cloud.  Its simple, but not efficient.  This also goes back to point 1/3/2.  We could have worked around this limitation with different backup SW, or cloud gateways, but again, we’re talking about adding cost to an already limited budget.

At the end of the day, I really wanted cloud, but financially speaking it doesn’t make much sense to backup to the cloud.  While the price of the storage its self is quiet reasonable, the cost of the pipe or SW to get the data there is what kills the solution.  I’m not saying it doesn’t make sense for others, but for us, our data was too big and the the cost would have been too high.

Factors that would change my view are below:

  1. if ISP’s prices were dropping at the same rates as cloud vendors, I could see this making cloud more affordable.
  2. If the cloud providers themselves started offering deduplication targets as part of their storage offering I think this would make a big difference.  Perhaps instead of charging what we physically consume (per GB) they could instead charge what we logically consume.  This way they still win, and so do we.

Backup Storage Part 3a: Deduplication Targets

Deduplication and backup kind of go hand in hand, so we couldn’t evaluate backup storage and not check out this segment.  We had two primary goals for a deduplication appliance.

  1. Reduce racks space while enabling us to store more data.  As you know in part 1, we had a lot of rack space being consumed.  While we weren’t hurting for rack space in our primary DC, we were in our DR DC.
  2. We were hoping that something like a deduplication target would finally enable us to get rid of tape and replicate our data to our DR site (instead of sneakernet).

For those of you not particularly versed in deduplicated storage, there are a few things to keep in mind.

  • Backup deduplication and the deduplication you’ll find running on high performance storage arrays are a little different.  Backup deduplication tends to use either variable or much smaller block size comparisons. An example, your primary array might be looking for 32k blocks that are the same, where as deduplication target might be looking for 4k blocks that are the same.  Huge difference in the deduplication potential.  The point is, just because you have deduplication baked into your primary array, does not mean its the same level of dedupliation that’s used in deduplication target.
  • Deduplication targets normally also include compression as well.  Again, its not the same level of compression found in your primary storage array, typically a more aggressive (CPU intensive) compression algorithm.
  • Deduplication targets tend to be in-line dedpulication.  Not all are, but the majority of the ones I looked at were.  There are pros and cons to this that I’ll go into later.
  • In all the appliances I’ve looked at, everyone of them had a primary access meathod of NFS/SMB.  Some of them also offered VTL, but the standard deployment method is them acting as a file share.
  • Not all deduplication targets offer whats referred to as global dedplication.  Depending on the target, you may only deduplicate at the share level.  This can make a big difference in your deduplication rates.  A true global deduplication solution, will deduplicate data across the entire target, which is the most ideal.

Now I’d like to elaborate a bit on the pros and cons of in-line vs post process (post process) deduplication.

Pros of In-Line:

  • As the name implies, data is instantly deduplicated as its being absorbed.
  • You don’t need to worry about maintaining a buffer or landing zone space like post process appliances need.
  • Once an appliance has seen the data (meaning its getting a deduplication hit) writes tend to be REALLY fast since its just metadata updates.  In turn replication speed also goes through the roof.
  • You can start replication almost instantly or in real time depending on the appliance.  Post process can’t do this, because you need to wait for the data to be deduplicated.

Pros of Post Process:

  • Data written isn’t deduplicated right away, which means if you’re doing say a tape backup right afterwards, or a DB verification, you’re not having to rehydrate the data.  Basically they tend to deal with reads a lot better.
  • Some of them actually cache the data (un-deduplicated) so that restores and other actions are fast (even days later).
  • I know this probably sounds redundant, but random disk IO in general is much better on these devices.  A good use case example would be doing a Veeam VM verification.  So not only reads in general, but random writes.

Again, like most comparisons, you can draw the inverse of each devices pros to figure out its cons.  Anyway, on to the devices we looked at.

There were three names that kept coming up in my research, EMC’s DataDomain, ExaGrid and Dell.  Its not that they’re the only players in town, HP, Quantum, Seapaton, and a few others all had appliances.  However, EMC and ExaGrid were well known, and we’re a Dell shop, so we stuck with evaluating these three devices.

Dell DR series appliances (In-line):

After doing a lot of research, discussions, demo’s the whole 9 yards.  It became very clear that Dell wasn’t isn’t the same league as the other solutions we looked at.  I’m not saying I wouldn’t recommend them, nor am I saying I wouldn’t reconsider them, but not yet, and not in its current iteration.  That said, as of this writing, its clear Dell is investing in this platform, so its certainly worth keeping an eye on.

Below are the reasons we weren’t sold on their solution at the time of evaluation.

  • At the time, they had a fairly limited set of certified backup solutions.  We planned to dump SQL straight to these devices, and SQL wasn’t on the supported list.
  • They often compared their performance to EMC, except, they were typical quoting their source side deduplicated protocol, vs. EMC’s raw (unoptimized) throughput.  Meaning it wasn’t an apples to apples comparison.  When you’re planning on transferring 100TB+ of data on a weekly basis and not everything can use source side deduplication, this makes a huge difference.  At the time we were evaluating, Dell was comparing their DR4100 vs. a DD2500.  The reality is, the Dell DR6100 is a better match for the DD2500.  Regardless, we were looking at the DD4200, so we were way above what Dell could provide.
  • They would only back a 10:1 deduplication ratio.  Now this, I don’t have a problem with.  I’d much rather a vendor be honest then claim I can fit the moon in my pocket.
  • They didn’t do multi to multi replication.  Not the end of the world, but also kind of a bummer.  Once you pick a destination, that’s it.
  • Their deduplication was at a share level, not global.  If we wanted one share for our DBA’s and one for us, no shared deduplication.
  • They didn’t support snapshots.  Not the end of the world, but its 2015, snapshots have kind of been a thing for 10+ years now.
  • Their source side deduplication protocol was only really suited to Dell products.   Given that we weren’t planning on going all in with Dell’s backup suite, this was a negative for us.
  • No one, and I mean no one was talking about them on the net.  With EMC or ExaGrid, it wasn’t hard at all to find some comments, even if they were negative.
  • They had a very limited amount of raw data (real usable capacity) that they could offer.  This is a huge negative when you consider that splitting off a new appliance means you just lost half or more of your deduplication potential.
  • There was no real analysis done to determine if they were even a good fit for our data.

ExaGrid (Post process ):

I heard pretty good things about ExaGrid after having a chat with a former EMC storage contact of mine.  If EMC has one competitor in this space, it would be ExaGrid.  Like Dell, we spent time chatting with them, researching what others said, and really just mulling on the whole solution.  Its kind of hard to solely place them in the deduplicaiton segment as they’re also scale out storage to a degree, but I think this is a more appropriate spot for them.

Pros:

  • The post process is a bit of a double edged sword.  One of the pros that I outlined above, is that data is not deduplicated right away.  This means we could use this device as our primary and archive backup storage.
  • The storage scaled out linearly in both performance and capacity.  I really like the idea of not having to forklift upgrade our unit if we grew out of it.
  • They had what I’ll refer to as “backup specialists”.  These were techs that were well versed in the backup software we’d be using with ExaGrid.  In our case SQL and Veeam.  Point being, if we had questions about maximizing our backup app with ExaGrid, they’d have folks that know not just ExaGrid but the application as well.
  • The unit pricing wasn’t simply a “lets get’em in cheap and suck’em dry later”.  Predictable (fair) pricing was part of who they are.

Cons:

  • As I mentioned, post process was a bit of a double edged sword.  One of the big negatives for us, was that their replication engine required waiting until a given file was fully deduplicated before it could begin.  So not only did we have to wait say 8 hours for a 4TB file server backup, but then we had to wait potentially another 8 hours before replication could begin.  Trying to keep any kind of RPO with that kind of variable is tough.
  • While they “scale out” their nodes, they’re not true scale out storage IMO.
    • Rather than pointing a backup target at a single share, and letting the storage figure everything out, we’d have to manually balance which backup’s go to which node.  With the number of backup’s we were talking about and the number of nodes there could be, this sounded like too much of a hassle to me.
    • The landing zone space (un-deduplicated storage) was not scale out, and was instead pinned to the local node.
    • There is no node resiliency.  Meaning if you lose one node, everything is down, or at least for that node.   While I’m not in love with giving up two or three nodes for parity, at least having it as an option would be nice.  IIRC (and could be wrong) this also affected the deduplication part of the storage cluster.
    • Individual nodes didn’t have the best throughput IMO.  While its great that you can aggregate multiple nodes throughput, if I have a single 4TB backup, I need that to go as fast as possible and I can’t break that across multiple nodes.
  • I didn’t like that the landing zone : deduplicaiton zone was manually managed on each node.  This just seemed to me like something that should be automated.

EMC DataDomain (Inline):

All I can say is there’s no wonder they’re the leader in this segment.  Just an absolutely awesome product overall.  As many who know me, I’m not a huge EMC (Expensive Machine Company) fan in general, but there area few areas they do well and this is one of them.

Pros:

  • Snapshots, file retention policies, ACL’s, they have all the basic file servers stuff you’d want and expect.
  • Multi : Multi replication.
  • Very high throughput of non-source (DDBoost) optimized data and even better when it is source optimized.
  • Easy to use (based on demo) and intuitive interface.
  • The ability to store huge amounts of data in a single unit.  At time a head swap may be required, but have the ability to simply swap the head is nice.
  • Source based optimization baked into a lot of non-EMC products, SQL and Veeam in our case.
  • Archive storage as a secondary option for data not accessed frequently.
  • End to end data integrity.  These guys were the only ones that actually bragged about it.  When I asked this question to others, they didn’t exactly instill faith in their data integrity.
  • They actually analyzed all my backup data and gave me reasonably accurate predictions of what my dedupe rate would be and how much storage I’d need.  All in all, I can’t speak highly enough about their whole sales process.  Obviously everyone wants to win, but EMC’s process was very diplomatic, non-pushy and in general a good experience.

Cons:

  • EMC provided some great initial pricing for their devices, but any upgrades would be cost prohibitive.  That said, I at least appreciate that they were up front with the upgrade costs so we knew what we were getting into.  If you go down this path yourself, my suggestion is buy a lot more storage than you need.
  • They treat archive storage and backup storage differently and it needs to be manually separated.  For the price you pay for a solution like this, I’d like to think they could auto tier the data.
  • They license al a carte.  Its not like there’s even a slew of options, I don’t get why they don’t make things all inclusive.  Its easier for the customer and its easier for them.
  • In general, the device is super expensive.  Unless you plan on storing 6+ months of data on the device, I’d bet you could do better with large cheap disks, or even something like disk to tape tiering solution (SpectraLogic Black Pearl).  Add to that, unless your data deduplicates well, you’ll also be paying through the nose for storage.
  • Going off the above statement, if you’re only keeping a few weeks worth of data on disk, you can likely build a faster solution $ for $ than what’s offered by them.
  • No cloud option for replication.  I was specifically told they see AWS as competition, not as a partner. Maybe this will change in the future, but it wasn’t something we would have banked on.

All in all, the deduplication appliances were fun to evaluate.  However, cutting to the chase, we ended up not going with any of these solutions.  As for ROI, these devices are too specialized, and too expensive for what we were looking to accomplish.  I think if you’re looking to get rid of tape (and your employer is on board), EMC DataDomain would be my first stop.  Unfortunately, for our needs, tape was staying in the picture, which meant this storage type was not a good fit.

Next up, scale out storage…