
Wish list: Nimble Storage (2016)

Introduction:

These are just some random things I'd love to see Nimble Storage implement in their solution.  I don't actually expect most of this to happen within a year, but it will be interesting to see how much of it does.  Some items are more realistic than others, of course, but part of the fun here is asking for a few things that are a bit of a stretch.

Hardware:

  • Better than 10G networking: Given that SSDs are becoming the storage of choice, it's more than likely the network that will bottleneck the SAN.  I understand Nimble offers four 10G links per controller, but that's less ideal than two 40G links.  40G is now available in all respectable servers and switches; I see no reason why Nimble shouldn't offer the same.
  • Denser JBODs: I think Nimble has decent capacity per rack unit, but they could do a lot better.  Modern JBODs offer as many as 84 disks in 5U or 60 disks in 4U for 3.5 inch drives, and 60 disks in 2U for 2.5 inch drives.  I would love to see Nimble offer better density.  Just like they do now with SSD cache, I see no reason why the JBOD needs to be filled at time of purchase.  Make 15-disk packs available that are installable by customers, then simply add an "activate" option for the new disk span.
  • Cheap and Deep: Nimble now has a great storage line for most tier 0 – 3 workloads, but I still feel they're missing one additional, important tier: backup.  Yes, you could use their hybrid arrays for backup, but at the current cost structure, and given the rack-unit problem I raised above, I don't feel Nimble's CS series is a prudent use of resources for backup.  Most backup data is already compressed, and random IO tends not to be the constraining factor.  I'd love to see an array designed for high-throughput sequential workloads, with lots of non-compressed usable capacity.  Personally, I'd even be OK if they offered a detuned CS series that disabled caching and compression to ensure the array is used for its designed purpose.  What I'm NOT asking for is yet another dedupe target.
  • More scalable architecture: Just my opinion, but the whole idea of keeping the controllers in the chassis, while great for smaller workloads, doesn't make as much sense for larger storage implementations.  There's a reason EMC, NetApp and others use external controllers: it allows much greater expansion and, theoretically, better performance.  I get the idea of having a series that can go from the smallest to the largest with a simple hot-swappable controller, but that would be doable with external controllers as well, and it would more than likely work across completely different generations.  It would also allow more powerful controllers (quad socket, as an example).  If CPUs are truly the bottleneck, why limit the number of cores and sockets you have access to with a chassis that restricts such things?
  • All NVMe array: I know the tech is still new, but it would be awesome to see an NVMe array in the near future.

Hardware / Software:

  • Storage Tiers: It's either all flash or hybrid today, but why not offer both under the same controller?  Simply rate the controller based on peak IOPS, and then let us pick which storage tier(s) we want to run underneath.  Any spinning disk would go into a hybrid tier and any flash into a flash tier.  I'm not even looking for automated tiering, just straight-up manual tiers.

Software:

  • Monitoring: I would really like more of the metrics that are typical on other arrays exposed via SNMP: queue depth, average IO size, per-disk metrics, CPU, RAM, etc.
  • File Protocols: Now that you have FC + iSCSI, why not add SMB + NFS as well?  I would love to see Tintri-like features in Nimble.
  • Replica destinations: I've been asking for more than one destination for four years, so here it goes again: I would like to replicate a volume to more than one destination.  If I have a traditional failover cluster, I want to replicate it both locally and to DR via snapshots.  Right now, as far as I know, that's not doable.
  • Full Clones: Linked clones are great when you think you're going to get rid of a volume after X period of time, but they really don't make sense long term.  I would love an option to create full clones, or even to promote existing linked clones into full clones.  Maybe I want to get rid of the parent volume and keep the clone; I can't do that right now.
  • Full featured client agent: I would LOVE for a client to be able to perform the following actions (see the sketch after this list).  This would let my DBAs do their own test / dev refreshes without me needing to delegate them access to the Nimble console.
    • Initiate a new snapshot (either at a volume group level or an individual volume)
    • Pick from a library of existing snapshots to mount
    • Clone a volume and mount it automatically
    • Delete a volume / snapshot / clone.
    • Swap a volume group (this would be a great workflow feature for UAT refreshes).
  • Virtual Appliance GA'ed:  How about letting me have access to a crippled virtual appliance so I can test things that might be a little risky to try in prod?  Everyone in IT knows it's not good practice to test in prod, and yet with storage, we have no choice but to do exactly that.
  • Better Volume Management:  If there's one model everyone should try to duplicate when it comes to management, it's Microsoft's ACL propagation and inheritance structure.  I'm tired of going into individual volumes to change things like ACLs.  Volume groups help, but they don't eliminate the need.  I would much rather group volumes in folders (or, even better, tags) and apply things like protection policies and ACLs globally than contend with scripting static entries.  Additionally, tags IMO are far smarter than folders.  Folders are rigid, tags are flexible; a tag can do everything a folder can do, but the reverse is not true.
  • Update Picker:  I like running the latest version of NOS, but for consistency's sake, I also like having all my SANs on the same revision.  There have been times when we were in the middle of a series of SAN upgrades and you released a new minor version.  When I go to download NOS for the SAN I'm updating, it no longer matches the rev on the SANs I've already upgraded.  So ultimately I end up having to call support, request the older rev, and download it manually.
  • Centralized Console:  It would be great if you came out with something like a vCenter Server for Nimble: a central on-premises console I could use to do everything across all my SANs.  This would mitigate the need to use groups for management reasons and instead let me manage every SAN in one place.  I could easily see this console being where updates are pushed out, new volume ACLs are created, performance monitoring is done, etc.
  • Aggressive Caching in GUI:  Offer an option right in the GUI to use default (random only), sequential and random, or pin to cache.
  • Web UI improvements:
    • Switch to a nice and fast HTML5 console.
    • Show random / sequential and write / read in the timeline graph, and show it as a stacked graph instead of a line graph.
    • Don’t show cache misses for sequential IO.
    • Show CPU usage.
    • Show the top 5 volumes (real time, last minute, last 5 minutes, last 15 minutes, last hour and last day) for the following metrics:
      • Cache misses
      • Total IOPS
      • Total throughput
      • CPU usage
  • Use GUIDs, not friendly names:  I would actually like to see Nimble switch to a generic GUID for the volume ID, with a simple friendly name associated with it.  There have been times I wanted to change my naming convention, but doing so would require detaching the volume from the host it's serving.
  • Per Volume / Per host MPIO policy:  Right now it seems the entire array has to be enabled for intelligent MPIO.  I would like an additional option to enable it only for certain initiator groups or certain volumes.
  • Snapshot browser:  I would love a tool that let us open a snapshot through the management interface and browse for files to recover, rather than having to mount the snapshot to an OS.  Even better would be if I could open a VMware snapshot and then open a VMDK as well.
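
To make the client-agent wish above a little more concrete, here's a minimal sketch of what that self-service workflow could look like if it were exposed over a REST API.  Every endpoint, field and name below is hypothetical (this is not a real Nimble API); it's simply the shape of the workflow I'd want to hand my DBAs.

```python
import requests

# Hypothetical API -- none of these endpoints or fields are real Nimble calls.
BASE = "https://san.example.local/api/v1"
HEADERS = {"Authorization": "Bearer <delegated-dba-token>"}

def snapshot_volume(volume: str, name: str) -> str:
    """Initiate a new snapshot of a volume (or volume group)."""
    r = requests.post(f"{BASE}/volumes/{volume}/snapshots",
                      json={"name": name}, headers=HEADERS)
    r.raise_for_status()
    return r.json()["snapshot_id"]

def clone_and_mount(volume: str, snapshot_id: str, initiator_group: str) -> str:
    """Clone a snapshot and ACL it to a host, e.g. for a test/dev refresh."""
    r = requests.post(f"{BASE}/volumes/{volume}/clones",
                      json={"snapshot": snapshot_id, "acl": [initiator_group]},
                      headers=HEADERS)
    r.raise_for_status()
    return r.json()["clone_id"]

# A DBA refreshing UAT without ever touching the Nimble console:
snap = snapshot_volume("sql-prod-data", "pre-uat-refresh")
clone = clone_and_mount("sql-prod-data", snap, "uat-sql-hosts")
print(f"presented clone {clone} from snapshot {snap}")
```

The point isn't the specific API; it's that a scoped token plus a handful of verbs (snapshot, clone, mount, delete, swap) is all it would take to delegate refreshes safely.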

Backup Storage Part 3a: Deduplication Targets

Deduplication and backup go hand in hand, so we couldn't evaluate backup storage without checking out this segment.  We had two primary goals for a deduplication appliance.

  1. Reduce rack space while enabling us to store more data.  As you saw in part 1, we had a lot of rack space being consumed.  While we weren't hurting for rack space in our primary DC, we were in our DR DC.
  2. We were hoping that something like a deduplication target would finally enable us to get rid of tape and replicate our data to our DR site (instead of sneakernet).

For those of you not particularly versed in deduplicated storage, there are a few things to keep in mind.

  • Backup deduplication and the deduplication you'll find on high-performance storage arrays are a little different.  Backup deduplication tends to use variable or much smaller block-size comparisons.  For example, your primary array might be looking for 32K blocks that are the same, whereas a deduplication target might be looking for 4K blocks that are the same; that's a huge difference in deduplication potential (see the toy example after this list).  The point is, just because you have deduplication baked into your primary array does not mean it's the same level of deduplication used in a deduplication target.
  • Deduplication targets normally include compression as well.  Again, it's not the same compression found in your primary storage array; it's typically a more aggressive (CPU-intensive) algorithm.
  • Deduplication targets tend to be in-line.  Not all are, but the majority of the ones I looked at were.  There are pros and cons to this that I'll go into later.
  • Every appliance I looked at had NFS/SMB as its primary access method.  Some also offered VTL, but the standard deployment is for them to act as a file share.
  • Not all deduplication targets offer what's referred to as global deduplication.  Depending on the target, you may only deduplicate at the share level, which can make a big difference in your deduplication rates.  A true global deduplication solution deduplicates data across the entire target, which is ideal.
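
To illustrate the block-size point from the first bullet, here's a toy experiment.  It's a sketch, not how any real appliance works (real targets typically use variable-size, content-defined chunking): it just hashes fixed-size blocks and counts the unique ones.  The same data barely deduplicates at 32K blocks but reduces roughly 8:1 at 4K blocks, because a small scattered change only "dirties" the small block it lands in.

```python
import hashlib
import os

def unique_blocks(data: bytes, block_size: int) -> int:
    """Count unique fixed-size blocks -- a crude stand-in for a dedup index."""
    seen = set()
    for off in range(0, len(data), block_size):
        seen.add(hashlib.sha256(data[off:off + block_size]).digest())
    return len(seen)

# 8 MiB of highly repetitive data: a 4 KiB pattern repeated 2048 times,
# with one byte flipped at a different offset in every 32 KiB region.
data = bytearray(os.urandom(4096) * 2048)
for i, off in enumerate(range(0, len(data), 32768)):
    data[off + i] ^= 0xFF

for bs in (32768, 4096):
    total = len(data) // bs
    uniq = unique_blocks(bytes(data), bs)
    print(f"{bs // 1024:>2}K blocks: {uniq:>4}/{total} unique "
          f"-> {total / uniq:.1f}:1 reduction")
```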

Now I'd like to elaborate a bit on the pros and cons of in-line vs. post-process deduplication.  A quick back-of-the-envelope example follows the two lists.

Pros of In-Line:

  • As the name implies, data is instantly deduplicated as it's being absorbed.
  • You don’t need to worry about maintaining a buffer or landing zone space like post process appliances need.
  • Once an appliance has seen the data (meaning it's getting deduplication hits), writes tend to be REALLY fast since they're just metadata updates.  In turn, replication speed also goes through the roof.
  • You can start replication almost instantly or in real time depending on the appliance.  Post process can’t do this, because you need to wait for the data to be deduplicated.

Pros of Post Process:

  • Data written isn't deduplicated right away, which means if you're doing, say, a tape backup right afterwards, or a DB verification, you're not having to rehydrate the data.  Basically, they tend to deal with reads a lot better.
  • Some of them actually cache the data (un-deduplicated) so that restores and other actions are fast (even days later).
  • I know this probably sounds redundant, but random disk IO in general is much better on these devices; not only reads, but random writes too.  A good use case example would be a Veeam VM verification.

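A quick back-of-the-envelope shows why this matters for replication and RPO.  The rates below are pure assumptions for illustration; the point is that an inline target can replicate deduplicated blocks as they land, while a post-process target has to finish both the ingest and the dedup pass before a large file can even start replicating.

```python
# Illustrative numbers only: a 4 TB file server backup ingested at an
# assumed 140 MB/s, followed (post process) by a dedup pass at the same rate.
backup_mb = 4 * 1_000_000
ingest_rate = 140   # MB/s, assumed
dedup_rate = 140    # MB/s, assumed

ingest_h = backup_mb / ingest_rate / 3600   # ~8 hours
dedup_h = backup_mb / dedup_rate / 3600     # ~8 more hours

print(f"inline:       replication trails the ingest, finishing ~{ingest_h:.0f}h in")
print(f"post process: replication can't start until ~{ingest_h + dedup_h:.0f}h in")
```
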
Again, like most comparisons, you can draw the inverse of each device's pros to figure out its cons.  Anyway, on to the devices we looked at.

There were three names that kept coming up in my research: EMC's DataDomain, ExaGrid and Dell.  It's not that they're the only players in town; HP, Quantum, Sepaton, and a few others all had appliances.  However, EMC and ExaGrid were well known, and we're a Dell shop, so we stuck with evaluating these three.

Dell DR series appliances (In-line):

After doing a lot of research, discussions, demos, the whole nine yards, it became very clear that Dell wasn't in the same league as the other solutions we looked at.  I'm not saying I wouldn't recommend them, nor that I wouldn't reconsider them, but not yet, and not in their current iteration.  That said, as of this writing it's clear Dell is investing in this platform, so it's certainly worth keeping an eye on.

Below are the reasons we weren’t sold on their solution at the time of evaluation.

  • At the time, they had a fairly limited set of certified backup solutions.  We planned to dump SQL straight to these devices, and SQL wasn’t on the supported list.
  • They often compared their performance to EMC, except they were typically quoting their source-side deduplicated protocol vs. EMC's raw (unoptimized) throughput, so it wasn't an apples-to-apples comparison.  When you're planning to transfer 100TB+ of data on a weekly basis and not everything can use source-side deduplication, this makes a huge difference.  At the time we were evaluating, Dell was comparing their DR4100 to a DD2500; in reality, the Dell DR6100 is a better match for the DD2500.  Regardless, we were looking at the DD4200, so we were way above what Dell could provide.
  • They would only back a 10:1 deduplication ratio.  Now this, I don't have a problem with.  I'd much rather a vendor be honest than claim I can fit the moon in my pocket.
  • They didn't do multi-to-multi replication.  Not the end of the world, but kind of a bummer.  Once you pick a destination, that's it.
  • Their deduplication was at the share level, not global.  If we wanted one share for our DBAs and one for us, there'd be no shared deduplication.
  • They didn't support snapshots.  Not the end of the world, but it's 2015; snapshots have kind of been a thing for 10+ years now.
  • Their source-side deduplication protocol was only really suited to Dell products.  Given that we weren't planning on going all-in with Dell's backup suite, this was a negative for us.
  • No one, and I mean no one, was talking about them on the net.  With EMC or ExaGrid, it wasn't hard at all to find comments, even if some were negative.
  • They had a very limited amount of raw (real usable) capacity on offer.  That's a huge negative when you consider that splitting off a new appliance means you just lost half or more of your deduplication potential.
  • There was no real analysis done to determine if they were even a good fit for our data.

ExaGrid (Post process):

I heard pretty good things about ExaGrid after chatting with a former EMC storage contact of mine.  If EMC has one competitor in this space, it's ExaGrid.  Like with Dell, we spent time chatting with them, researching what others said, and really just mulling over the whole solution.  It's kind of hard to place them solely in the deduplication segment, as they're also scale-out storage to a degree, but I think this is the more appropriate spot for them.

Pros:

  • The post process approach is a bit of a double-edged sword.  One of the pros I outlined above is that data is not deduplicated right away, which meant we could use this device as both our primary and archive backup storage.
  • The storage scaled out linearly in both performance and capacity.  I really liked the idea of not having to forklift-upgrade our unit if we outgrew it.
  • They had what I'll refer to as "backup specialists": techs well versed in the backup software we'd be using with ExaGrid, in our case SQL and Veeam.  Point being, if we had questions about maximizing our backup app with ExaGrid, they'd have folks who knew not just ExaGrid but the application as well.
  • The unit pricing wasn't a "let's get 'em in cheap and suck 'em dry later" model.  Predictable (fair) pricing was part of who they were.

Cons:

  • As I mentioned, post process is a bit of a double-edged sword.  One of the big negatives for us was that their replication engine required waiting until a given file was fully deduplicated before replication could begin.  So not only did we have to wait, say, 8 hours for a 4TB file server backup, we then had to wait potentially another 8 hours before replication could start.  Keeping any kind of RPO with that kind of variable is tough.
  • While they "scale out" their nodes, they're not true scale-out storage, IMO.
    • Rather than pointing a backup target at a single share and letting the storage figure everything out, we'd have to manually balance which backups go to which node.  With the number of backups we were talking about, and the number of nodes there could be, this sounded like too much of a hassle.
    • The landing zone (un-deduplicated storage) was not scale-out; it was pinned to the local node.
    • There's no node resiliency, meaning if you lose one node, everything on that node is down.  While I'm not in love with giving up two or three nodes for parity, at least having the option would be nice.  IIRC (and I could be wrong) this also affected the deduplicated part of the storage cluster.
    • Individual nodes didn't have the best throughput, IMO.  While it's great that you can aggregate throughput across multiple nodes, if I have a single 4TB backup, I need it to go as fast as possible, and I can't break it across multiple nodes.
  • I didn't like that the landing zone : deduplication zone split was manually managed on each node.  That just seems like something that should be automated.

EMC DataDomain (Inline):

All I can say is it's no wonder they're the leader in this segment: just an absolutely awesome product overall.  As many who know me know, I'm not a huge EMC (Expensive Machine Company) fan in general, but there are a few areas they do well, and this is one of them.

Pros:

  • Snapshots, file retention policies, ACLs: they have all the basic file server stuff you'd want and expect.
  • Multi-to-multi replication.
  • Very high throughput for data that isn't source (DDBoost) optimized, and even better when it is.
  • Easy to use (based on demo) and intuitive interface.
  • The ability to store huge amounts of data in a single unit.  At some point a head swap may be required, but having the ability to simply swap the head is nice.
  • Source-based optimization baked into a lot of non-EMC products, SQL and Veeam in our case.
  • Archive storage as a secondary option for data not accessed frequently.
  • End-to-end data integrity.  These guys were the only ones who actually bragged about it.  When I asked others this question, they didn't exactly instill faith in their data integrity.
  • They actually analyzed all my backup data and gave me reasonably accurate predictions of what my dedupe rate would be and how much storage I’d need.  All in all, I can’t speak highly enough about their whole sales process.  Obviously everyone wants to win, but EMC’s process was very diplomatic, non-pushy and in general a good experience.

Cons:

  • EMC provided some great initial pricing for their devices, but any upgrades would be cost prohibitive.  That said, I at least appreciate that they were up front with the upgrade costs so we knew what we were getting into.  If you go down this path yourself, my suggestion is buy a lot more storage than you need.
  • They treat archive storage and backup storage differently, and the two need to be manually separated.  For the price you pay for a solution like this, I'd like to think they could auto-tier the data.
  • They license à la carte.  It's not like there's even a slew of options; I don't get why they don't make things all-inclusive.  It's easier for the customer and it's easier for them.
  • In general, the device is super expensive.  Unless you plan on storing 6+ months of data on it, I'd bet you could do better with large cheap disks, or even a disk-to-tape tiering solution (SpectraLogic Black Pearl).  Add to that, unless your data deduplicates well, you'll be paying through the nose for storage.
  • Going off the above, if you're only keeping a few weeks' worth of data on disk, you can likely build a faster solution dollar for dollar than what's offered by them.
  • No cloud option for replication.  I was specifically told they see AWS as competition, not as a partner. Maybe this will change in the future, but it wasn’t something we would have banked on.

All in all, the deduplication appliances were fun to evaluate.  However, cutting to the chase, we ended up not going with any of these solutions.  As for ROI, these devices are too specialized, and too expensive for what we were looking to accomplish.  I think if you’re looking to get rid of tape (and your employer is on board), EMC DataDomain would be my first stop.  Unfortunately, for our needs, tape was staying in the picture, which meant this storage type was not a good fit.

Next up, scale out storage…

Backup Storage Part 2: What we wanted

In part 1, I went over our legacy backup storage solution and why it needed to change.  In this section, I'm going to outline what we were looking for in our next storage refresh.

Also, just to give you a little more context: while evaluating storage, we were also looking at new backup solutions: CommVault, Veeam, EMC (Avamar and Networker), NetVault, Microsoft DPM, and a handful of cloud solutions.  Point being, there were a lot of moving parts and a lot of things to consider.  I'll dive more into this in the coming sections, but I wanted you to know it was more than just a simple storage refresh.

The core of what we were seeking is outlined below.

  1. Capacity: We wanted capacity, LOTS of capacity.  88TB of storage wasn't cutting it.
  2. Scalability: Just as important as capacity, and obviously related, we needed the ability to scale the capacity and performance.  It didn’t always have to be easy, but we needed it to be doable.
  3. Performance: We weren't looking for 100K IOPS, but we were looking for multiple GBps of throughput; that's bytes, with an uppercase B.
  4. Reliable/Resilient: The storage needed to be reasonably reliable; we weren't looking for five 9s or anything crazy like that.  However, we didn't want to be down for a day at a time, so something with resiliency built in, or a really great on-site warranty, was highly desired.
  5. Easy to use: There are tons of solutions out there, but not all of them are easy to use.
  6. Affordable: Again, there are lots of solutions out there, but not all of them are affordable for backup storage.
  7. Enterprise support: Sort of related to easy to use, but not the same: we needed something with a support contract.  When the stuff hits the fan, we needed someone to fall back on.

At a deeper level, we also wanted to evaluate a few storage architectures.  Each one meets aspects of our core requirements.

  1. Deduplication Targets:  We weren't as concerned with deduplication's ability to store lots of redundant data on few disks (storage is cheap), but we were interested in the side effect of really efficient replication (in theory).
  2. Scale out storage:  What we were looking for from this architecture was the ability to scale limitlessly.
  3. Cloud Storage: We liked the idea of our backups being off site (get rid of tape).  Also, in theory, it too was easy to scale (for us).
  4. Traditional SAN / NAS:  Not much worth explaining here, other than that this would be a reliable fallback if the other architectures didn't pan out.

With all that out there, it was clear we had a lot of conflicting criteria.  We all know something can't be fast, reliable and affordable all at once.  It was apparent that some concessions would need to be made, but we hadn't yet figured out what those were.  However, after evaluating a lot of vendors and solutions and passing quotes along to management, things started to become much clearer toward the end.  That is something I'll go over in my upcoming sections; there's too much information to force it all into one, so I'll be breaking it out further.

-Eric

Backup Storage Part 1: Why it needed to change

Introduction:

This is going to be a multi-part series where I walk you through the whole process we took in evaluating, implementing and living with a new backup storage solution.  While it's not a perfect solution, given the parameters we had to work within, I think we ended up with some very decent storage.

Backup Storage Part 1: The “Why” it needed to change

Last year my team and I began a project to overhaul our backup solution, and part of that involved researching new storage options.  At the time, we ran what most folks on a budget would: a server with some local DAS.  It was a Dell 2900 (IIRC) with six MD1000s and one MD1200.  The solution was originally designed to manage about 10TB of data and was never really expected to handle what ultimately ended up being much more than that.  It was less than ideal for a lot of reasons, which I'm sharing below.

  • It wasn't just a storage unit; it also ran the backup server (CommVault) on top, and our tape drives were locally attached as well.  Basically a single box handling a lot of data and ALL aspects of managing it.
  • The whole solution was a single point of failure, and many of its sub-components were single points of failure themselves.
  • This solution was 7 years old and had grown organically, one JBOD at a time.  This had a few pitfalls:
    • JBODs were daisy chained off each other, which meant that while you added more capacity and spindles, throughput was ultimately limited to 12Gbps per chain (SAS x4 3G; see the back-of-the-envelope sketch after this list).  We only had two chains for the MD1000s and one chain / JBOD for the MD1200.
    • The JBODs were carved up into independent LUNs, which from CommVault's view was fine (awesome SW), but it left potential IOPS on the table.  As we added JBODs, the IOPS didn't increase linearly, per se.  Sure, the "aggregate" IOPS increased, but a single job was limited to the speed of a 15-disk RAID 6 instead of the potential of, say, a 60-disk RAID 60.
  • The disks at the time were fast (for SATA drives, that is), but compared to modern NL-SAS drives they had much lower throughput capability and density.
  • The PCI bus and the FSB (this server still had an FSB) were overwhelmed.  Remember, this was doing tape AND disk copies.  I know a lot of less seasoned folks think it's hard to overwhelm a PCI bus, but that's not actually true (more on that later), even more so when your PCI bus is version 1.x.
  • This solution consumed a TON of rack space: each JBOD was 3U and we had seven of them (the MD1200 was 2U).  And each drive was only 1TB, so best case with a 15-disk RAID 6 we were looking at 13TB usable.  By today's standards, that TB per RU is terrible even for tier 1 storage, let alone backup.
  • We were using RAID 6 instead of RAID 10.  I know what some of you are thinking: it's backup, so why would you use RAID 10?  Backups are probably every bit as disk-intensive as your production workloads, and likely more so.  On top of that, we force them to do all their work in what's typically a very constrained window of time.  RAID 6, while great for sequential / random reads, does horribly at writes compared to RAID 10 (I'm excluding fancy file systems like CASL from this generalization; see the sketch after this list for the write-penalty math).  Unless you're running one backup at a time, you're likely throwing tons of parallel writes at this storage, and while each stream may be sequential in nature, the aggregate appears random to the storage.  At the end of the day, disk is cheap, and cheap disk (NL-SAS) is even cheaper, so splurge on RAID 10.
    • This was also compounded by the point I made above about one LUN per JBOD.
  • It was locked into what IMO is a less than ideal Dell (LSI) RAID card.  Again, I know what some of you are thinking: "HW RAID is SO much better than SW RAID."  That's a common and pervasive myth.  EMC, NetApp, HP, IBM, etc. are all simple x86 servers with really fancy SW RAID.  SW RAID is fine, so long as the SW doing the RAID is good.  In fact, in many cases it's FAR better than a HW RAID card.  Now, I'm not saying LSI sucks at RAID (they're fine cards), but SW has come a long way, and I really see it as the preferred solution over HW RAID.  I'm not going to go into the whys in this post, but if you're curious, do some research of your own until I have time to write another "vs" article.
  • Using Dell, HP, IBM, etc. for bulk storage is EXPENSIVE compared to what I'll call 2nd tier solutions.  Think of these as somewhere in between Dell and home-brewing your own solution.
    • Add to this that manufacturers only want you running "their" disks in "their" JBODs, which means you're not only stuck paying a lot for a false sense of security, you're also incredibly limited in your storage options.
  • All the HW was approaching EOL.
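
To put some rough numbers behind the chain-throughput and RAID-write bullets above, here's the back-of-the-envelope math.  The SAS figures are standard (3Gbps per lane, 8b/10b encoding); the per-spindle IOPS figure is an assumption for a 7.2K SATA/NL-SAS drive.

```python
# SAS x4 3G chain: 4 lanes * 3 Gbps line rate, 8b/10b encoding (80% efficient).
lanes, gbps_per_lane = 4, 3
chain_mb_s = lanes * gbps_per_lane * 0.8 * 1000 / 8
print(f"per-chain ceiling: ~{chain_mb_s:.0f} MB/s, shared by every JBOD on the chain")

# RAID write penalty: back-end IOs generated per front-end random write.
# RAID 10 = 2 (write both mirrors); RAID 6 = 6 (read+write of data and 2 parity).
disk_iops = 75  # assumed for a single 7.2K spindle
for name, disks, penalty in (("15-disk RAID 6 ", 15, 6),
                             ("14-disk RAID 10", 14, 2)):
    print(f"{name}: ~{disks * disk_iops // penalty} random write IOPS")
```

In other words, every JBOD added to a chain splits a roughly 1.2 GB/s ceiling, and RAID 6 gives up roughly two thirds of the spindles' random write capability compared to RAID 10.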

There are probably a few more reasons why this solution was no longer ideal, but you get the point.  The reality is, our backup solution was changing, and it was a perfect time to re-think our storage.  In part 2, we'll get into what we thought we wanted, what we needed, and what we had budget for.