Backup Storage Part 3a: Deduplication Targets

Deduplication and backup kind of go hand in hand, so we couldn’t evaluate backup storage and not check out this segment.  We had two primary goals for a deduplication appliance.

  1. Reduce rack space while enabling us to store more data.  As you saw in part 1, we had a lot of rack space being consumed.  While we weren’t hurting for rack space in our primary DC, we were in our DR DC.
  2. We were hoping that something like a deduplication target would finally enable us to get rid of tape and replicate our data to our DR site (instead of sneakernet).

For those of you not particularly versed in deduplicated storage, there are a few things to keep in mind.

  • Backup deduplication and the deduplication you’ll find running on high performance storage arrays are a little different.  Backup deduplication tends to use either variable or much smaller block size comparisons.  For example, your primary array might be looking for 32k blocks that are the same, whereas a deduplication target might be looking for 4k blocks that are the same.  That’s a huge difference in deduplication potential (there’s a quick sketch of this right after the list).  The point is, just because you have deduplication baked into your primary array does not mean it’s the same level of deduplication that’s used in a deduplication target.
  • Deduplication targets normally include compression as well.  Again, it’s not the same level of compression found in your primary storage array; it’s typically a more aggressive (CPU intensive) compression algorithm.
  • Deduplication targets tend to do in-line deduplication.  Not all do, but the majority of the ones I looked at did.  There are pros and cons to this that I’ll go into later.
  • Of all the appliances I’ve looked at, every one of them had a primary access method of NFS/SMB.  Some of them also offered VTL, but the standard deployment method is for them to act as a file share.
  • Not all deduplication targets offer what’s referred to as global deduplication.  Depending on the target, you may only deduplicate at the share level.  This can make a big difference in your deduplication rates.  A true global deduplication solution will deduplicate data across the entire target, which is ideal.
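To make the block size point a little more concrete, here’s a minimal sketch of fixed-block deduplication using SHA-256 fingerprints.  To be clear, this isn’t any vendor’s actual algorithm (real appliances use variable-size chunking, compression and far smarter fingerprint indexes); it just illustrates why a 4k chunk size tends to find more duplicate data than a 32k chunk size when a file only changes a little between backups.

```python
import hashlib
import random

def dedupe_ratio(data: bytes, chunk_size: int) -> float:
    """Naive fixed-block dedup: fingerprint each chunk, count unique chunks."""
    fingerprints = set()
    total = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprints.add(hashlib.sha256(chunk).hexdigest())
        total += 1
    return total / len(fingerprints)

# Simulate two weekly fulls of the same 1 MiB file: the second copy has one
# byte flipped every 40 KiB (small scattered changes, like a typical file).
rng = random.Random(0)
backup1 = rng.randbytes(1024 * 1024)
backup2 = bytearray(backup1)
for offset in range(0, len(backup2), 40 * 1024):
    backup2[offset] ^= 0xFF
stream = backup1 + bytes(backup2)

print(f"4k chunks : {dedupe_ratio(stream, 4 * 1024):.2f}:1")   # ~1.8:1
print(f"32k chunks: {dedupe_ratio(stream, 32 * 1024):.2f}:1")  # ~1.1:1
```

The smaller chunk size isolates each change to a 4k region, so the untouched chunks around it still dedupe; at 32k, one flipped byte “dirties” an entire 32k chunk.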

Now I’d like to elaborate a bit on the pros and cons of in-line vs. post process deduplication.

Pros of In-Line:

  • As the name implies, data is deduplicated instantly as it’s being ingested.
  • You don’t need to worry about maintaining a buffer or landing zone space like post process appliances need.
  • Once an appliance has seen the data (meaning it’s getting deduplication hits), writes tend to be REALLY fast since it’s just metadata updates.  In turn, replication speed also goes through the roof.
  • You can start replication almost instantly or in real time depending on the appliance.  Post process can’t do this, because you need to wait for the data to be deduplicated.

Pros of Post Process:

  • Data written isn’t deduplicated right away, which means if you’re doing say a tape backup right afterwards, or a DB verification, you’re not having to rehydrate the data.  Basically they tend to deal with reads a lot better.
  • Some of them actually cache the data (un-deduplicated) so that restores and other actions are fast (even days later).
  • I know this probably sounds redundant, but random disk IO in general is much better on these devices.  A good use case example would be doing a Veeam VM verification.  So they handle not only reads in general, but random writes as well.

Again, like most comparisons, you can draw the inverse of each device’s pros to figure out its cons.  Anyway, on to the devices we looked at.

There were three names that kept coming up in my research: EMC’s DataDomain, ExaGrid and Dell.  It’s not that they’re the only players in town; HP, Quantum, Sepaton, and a few others all had appliances.  However, EMC and ExaGrid were well known, and we’re a Dell shop, so we stuck with evaluating these three.

Dell DR series appliances (In-line):

After doing a lot of research, discussions, demos, the whole 9 yards, it became very clear that Dell wasn’t in the same league as the other solutions we looked at.  I’m not saying I wouldn’t recommend them, nor am I saying I wouldn’t reconsider them, but not yet, and not in its current iteration.  That said, as of this writing, it’s clear Dell is investing in this platform, so it’s certainly worth keeping an eye on.

Below are the reasons we weren’t sold on their solution at the time of evaluation.

  • At the time, they had a fairly limited set of certified backup solutions.  We planned to dump SQL straight to these devices, and SQL wasn’t on the supported list.
  • They often compared their performance to EMC, except they were typically quoting their source side deduplicated protocol vs. EMC’s raw (unoptimized) throughput.  Meaning it wasn’t an apples to apples comparison.  When you’re planning on transferring 100TB+ of data on a weekly basis and not everything can use source side deduplication, this makes a huge difference.  At the time we were evaluating, Dell was comparing their DR4100 vs. a DD2500.  The reality is, the Dell DR6100 is a better match for the DD2500.  Regardless, we were looking at the DD4200, so we were way above what Dell could provide.
  • They would only stand behind a 10:1 deduplication ratio.  Now this, I don’t have a problem with.  I’d much rather a vendor be honest than claim I can fit the moon in my pocket.
  • They didn’t do many-to-many replication.  Not the end of the world, but also kind of a bummer.  Once you pick a destination, that’s it.
  • Their deduplication was at a share level, not global.  If we wanted one share for our DBAs and one for us, there was no shared deduplication between them.
  • They didn’t support snapshots.  Not the end of the world, but it’s 2015; snapshots have kind of been a thing for 10+ years now.
  • Their source side deduplication protocol was only really suited to Dell products.   Given that we weren’t planning on going all in with Dell’s backup suite, this was a negative for us.
  • No one, and I mean no one was talking about them on the net.  With EMC or ExaGrid, it wasn’t hard at all to find some comments, even if they were negative.
  • They had a very limited amount of raw capacity (real usable capacity) that they could offer.  This is a huge negative when you consider that splitting off a new appliance means you just lost half or more of your deduplication potential.
  • There was no real analysis done to determine if they were even a good fit for our data.

ExaGrid (Post process):

I heard pretty good things about ExaGrid after having a chat with a former EMC storage contact of mine.  If EMC has one competitor in this space, it would be ExaGrid.  Like Dell, we spent time chatting with them, researching what others said, and really just mulling over the whole solution.  It’s kind of hard to place them solely in the deduplication segment as they’re also scale out storage to a degree, but I think this is a more appropriate spot for them.

Pros:

  • The post process approach is a bit of a double edged sword.  One of the pros that I outlined above is that data is not deduplicated right away.  This means we could use this device as both our primary and archive backup storage.
  • The storage scaled out linearly in both performance and capacity.  I really liked the idea of not having to forklift upgrade our unit if we grew out of it.
  • They had what I’ll refer to as “backup specialists”.  These were techs that were well versed in the backup software we’d be using with ExaGrid.  In our case SQL and Veeam.  Point being, if we had questions about maximizing our backup app with ExaGrid, they’d have folks that know not just ExaGrid but the application as well.
  • The unit pricing wasn’t simply a “let’s get ’em in cheap and suck ’em dry later” play.  Predictable (fair) pricing was part of who they were.

Cons:

  • As I mentioned, post process was a bit of a double edged sword.  One of the big negatives for us, was that their replication engine required waiting until a given file was fully deduplicated before it could begin.  So not only did we have to wait say 8 hours for a 4TB file server backup, but then we had to wait potentially another 8 hours before replication could begin.  Trying to keep any kind of RPO with that kind of variable is tough.
  • While they “scale out” their nodes, they’re not true scale out storage IMO.
    • Rather than pointing a backup target at a single share and letting the storage figure everything out, we’d have to manually balance which backups go to which node.  With the number of backups we were talking about and the number of nodes there could be, this sounded like too much of a hassle to me.
    • The landing zone space (un-deduplicated storage) was not scale out, and was instead pinned to the local node.
    • There is no node resiliency, meaning if you lose one node, everything is down, or at least everything on that node.  While I’m not in love with giving up two or three nodes for parity, at least having it as an option would be nice.  IIRC (and I could be wrong) this also affected the deduplication part of the storage cluster.
    • Individual nodes didn’t have the best throughput IMO.  While it’s great that you can aggregate multiple nodes’ throughput, if I have a single 4TB backup, I need that to go as fast as possible and I can’t break it across multiple nodes.
  • I didn’t like that the landing zone : deduplication zone ratio was manually managed on each node.  This just seemed to me like something that should be automated.

EMC DataDomain (In-line):

All I can say is there’s no wonder they’re the leader in this segment.  Just an absolutely awesome product overall.  As many who know me can attest, I’m not a huge EMC (Expensive Machine Company) fan in general, but there are a few areas they do well, and this is one of them.

Pros:

  • Snapshots, file retention policies, ACLs: they have all the basic file server stuff you’d want and expect.
  • Many-to-many replication.
  • Very high throughput of non-source (DDBoost) optimized data and even better when it is source optimized.
  • Easy to use (based on demo) and intuitive interface.
  • The ability to store huge amounts of data in a single unit.  At times a head swap may be required, but having the ability to simply swap the head is nice.
  • Source based optimization baked into a lot of non-EMC products, SQL and Veeam in our case.
  • Archive storage as a secondary option for data not accessed frequently.
  • End to end data integrity.  These guys were the only ones that actually bragged about it.  When I asked the others this question, their answers didn’t exactly instill faith in their data integrity.
  • They actually analyzed all my backup data and gave me reasonably accurate predictions of what my dedupe rate would be and how much storage I’d need.  All in all, I can’t speak highly enough about their whole sales process.  Obviously everyone wants to win, but EMC’s process was very diplomatic, non-pushy and in general a good experience.

Cons:

  • EMC provided some great initial pricing for their devices, but any upgrades would be cost prohibitive.  That said, I at least appreciate that they were up front with the upgrade costs so we knew what we were getting into.  If you go down this path yourself, my suggestion is buy a lot more storage than you need.
  • They treat archive storage and backup storage differently and it needs to be manually separated.  For the price you pay for a solution like this, I’d like to think they could auto tier the data.
  • They license à la carte.  It’s not like there’s even a slew of options; I don’t get why they don’t make things all inclusive.  It’s easier for the customer and it’s easier for them.
  • In general, the device is super expensive.  Unless you plan on storing 6+ months of data on the device, I’d bet you could do better with large cheap disks, or even something like a disk to tape tiering solution (SpectraLogic BlackPearl).  Add to that, unless your data deduplicates well, you’ll also be paying through the nose for storage.
  • Going off the above statement, if you’re only keeping a few weeks worth of data on disk, you can likely build a faster solution $ for $ than what’s offered by them.
  • No cloud option for replication.  I was specifically told they see AWS as competition, not as a partner. Maybe this will change in the future, but it wasn’t something we would have banked on.

All in all, the deduplication appliances were fun to evaluate.  However, cutting to the chase, we ended up not going with any of these solutions.  As for ROI, these devices are too specialized, and too expensive for what we were looking to accomplish.  I think if you’re looking to get rid of tape (and your employer is on board), EMC DataDomain would be my first stop.  Unfortunately, for our needs, tape was staying in the picture, which meant this storage type was not a good fit.

Next up, scale out storage…

VMware vSAN in my environment? not yet…

When VMware first announced vSAN, the premise of the solution was just pure awesomeness.  After all, VMware has the best hypervisor (fact, not opinion), and they’re in the process of honing what I think will be a Cisco butt kicking software defined network solution.  Basically vSAN was the only missing piece to building a software defined datacenter.  However, like NSX, I just don’t think vSAN is at a point where it can replace my tried and true shared storage solution.  That’s not a knock against hyperconvergence (although I have my reservations about the architecture as a whole), rather VMware’s current implementation.

Before getting into my current reservations with vSAN, I wanted to highlight one awesome feature I love about it.  It’s not the kernel integration, it’s not that it’s from VMware; no, it’s simply that it’s a truly software only solution.  I really dig this part of vSAN.  There are plenty of things I don’t like about vSAN, but this is one area they got right and one that I wish other vendors would follow.  Seriously, it’s 2015 and we STILL buy our storage / network as appliances.  It sucks IMO, and I’m sick of being tethered to some vendor’s crappy HW platform (I’m looking at you Nimble Storage).  With vSAN, so long as the SW doesn’t get in the way (and it does to a degree), you can build a solution on your terms.  Want an all Intel platform? Go for it! Want FusionIO (SanDisk)? Go for it! Want 6TB drives? Go for it! Want the latest 18 core procs from Intel?  Go for it!  Do you want Dell, HP, IBM, Cisco, Quanta, or Supermicro servers?  Take your pick…  This is the way solutions in this day and age should be, or at least have the option to be.

Cons of vSAN in my opinion are plentiful.

  • Lack of tiers:  One very simple thing VMware could do that would ease my adoption of their solution is allow me to have two different tiers of storage, one that’s all flash and one that’s hybrid.  This way my file servers can go in a hybrid pool, and my SQL servers can go into the all flash pool.  I’m not even looking for automation, just static / manual tiers.
  • In-line compression: I would love to have a deduplication + compression solution, but if VMware simply offered in-line compression, that would make a world of difference in making those expensive flash drives go a little farther.  Not to mention the potential throughput improvements.
  • Disk groups:  This is one area I just don’t get.  Why do I need to have disk groups?  The architecture, from my view, just seems unneeded.  Here is what I WISH vSAN actually looked like:
    • Disks come in three classifications:
      • Write Cache: I want a dedicated write cache; mixing read / write cache on the same device is wasteful.  As it stands, I need to run a massive / expensive cache drive.  It needs to be massive (in a hybrid design) so that it can actually have enough capacity to cache my working set, and it needs to be expensive because it needs to deliver good write performance and have decent write endurance.  Just imagine what vSAN’s write performance would be if I could use an NVRAM based write cache.
      • Read Cache: No need to hash this out, but I’d want a dedicated read cache.
      • Data Disk:  Pretty self explanatory, and this could be either SSD or HDD.
    • Let me pool these devices rather than “grouping” them.  Create one simple rule, you get 35 disks per host, do with them as you please.
    • Let me create sub-vSANs out of this pool of disks, AND let me make decisions like “this vSAN only runs on hosts 1-5, and this other vSAN runs on hosts 6-10”.  I would love to institute some form of true separation for environments that have multiple nodes.  For example, for two Exchange servers, I’d like to make sure their disks are never on the same host’s storage, ever, even the redundant parts.  Maybe this feature already exists.  I’d still want the VMs to be able to float to any compute node (so long as they’re not on the same node).  This is also where I could see a vSAN being created for hybrid pools AND SSD only pools.
    • I know it’s probably complex and CPU intensive, but at least provide a parity based option.  Copies are great for resiliency, but man do they consume a lot of capacity.  Then again, see my point about in-line compression.
  • Replication at the vSAN level: Don’t make me fire up some appliance for this feature; this should be baked into the code and as simple as “right click the VM and pick a replica destination”.  Obviously you’d want groups and policies, and all that good stuff, but you get my point.
  • Standalone option: I would actually consider vSAN (now) if it wasn’t converged.  I know that must sound like blasphemy, but I’d really love an option to build a scale out storage solution that anything could use, including VMware.  Having it converged is a really cool option, but I’d also like the opposite.
  • Easier to setup:  It’s not that it looks hard, but when you have third parties or enthusiasts creating tools to make your product easier to set up, to me it’s clear you dropped the ball.
  • Real world benchmarks / configurations:  This is one area where, if you’re going to offer a software only solution, you need to work a little harder.  You can’t hide behind the “oh everyone’s environment is different” or the “your mileage may vary”.  On top of that, when your marketing states that you do 45k IOPS in a hybrid configuration and then your marketing engineer releases a blog article showing you doing 80k IOPS http://blogs.vmware.com/storage/2015/03/17/double-vsan-performance/ it tells me that VMware doesn’t know 100% what its own limits are, and that’s a problem.  I’m not saying they need to lab out every single possible scenario, but put together a few examples for each vSAN type (hybrid or all flash).  I realize Dell, HP, and the various other partners are partly to blame here, but then again, it’s not entirely in their best interest for vSAN to succeed.

Overall, I think vSAN is cool, but it’s not at a point where it’s truly an enterprise solution.

 

Backup Storage Part 2: What we wanted

In part 1, I went over our legacy backup storage solution and why it needed to change.  In this section, I’m going to outline what we were looking for in our next storage refresh.

Also, just to give you a little more context, while evaluating storage we were also looking at new backup solutions: CommVault, Veeam, EMC (Avamar and Networker), NetVault, Microsoft DPM, and a handful of cloud solutions.  Point being, there were a lot of moving parts and a lot of things to consider.  I’ll dive more into this in the coming sections, but I wanted to let you know it was more than just a simple storage refresh.

The core of what we were seeking is outlined below.

  1. Capacity: We wanted capacity, LOTS of capacity.  88TB of storage wasn’t cutting it.
  2. Scalability: Just as important as capacity, and obviously related, we needed the ability to scale the capacity and performance.  It didn’t always have to be easy, but we needed it to be doable.
  3. Performance: We weren’t looking for 100K IOps, but we were looking for multiple GBps in throughput, that’s bytes with an uppercase B.
  4. Reliable/Resilient: The storage needed to be reasonably reliable, we weren’t looking for five 9’s or anything crazy like that.  However, we didn’t want to be down for a day at a time, so something with resiliency built in, or a really great on-site warranty was highly desired.
  5. Easy to use: There are tons of solutions out there, but not all of them are easy to use.
  6. Affordable: Again, there’s lots of solutions out there, but not all of them are affordable for backup storage.
  7. Enterprise support: Sort of related to easy to use, but not the same; we needed something with a support contract.  When the stuff hits the fan, we needed someone to fall back on.

At a deeper level, we also wanted to evaluate a few storage architectures.  Each one meets aspects of our core requirements.

  1. Deduplication Targets:  We weren’t as concerned about deduplication’s ability to store lots of redundant data on few disks (storage is cheap), but we were interested in the side effect of really efficient replication (in theory).
  2. Scale out storage:  What we were looking for out of this architecture was the ability to limitlessly scale.
  3. Cloud Storage: We liked the idea of our backups being off site (get rid of tape).  Also, in theory it too was easy to scale (for us).
  4. Traditional SAN / NAS:  Not much worth explaining here, other than that this would be a reliable fallback if the other architectures didn’t pan out.

With all that out there, it was clear we had a lot of conflicting criteria.  We all know that something can’t be fast, reliable and affordable.  It was apparent that some concessions would need to be made, but we hadn’t figured out what those were yet.  However, after evaluating a lot of vendors and solutions and passing quotes along to management, things started to become much clearer towards the end.  That is something I’m going to go over in my upcoming sections.  There is too much information to force it all into one section, so I’ll be breaking it out further.

-Eric

Backup Storage Part 1: Why it needed to change

Introduction:

This is going to be a multi-part series where I walk you through the whole process we took in evaluating, implementing and living with a new backup storage solution.  While it’s not a perfect solution, given the parameters we had to work within, I think we ended up with some very decent storage.

Backup Storage Part 1: The “Why” it needed to change

Last year my team and I began a project to overhaul our backup solution, and part of that solution involved researching some new storage options.  At the time, we ran what most folks on a budget would run, which is simply a server with some local DAS.  It was a Dell 2900 (IIRC) with 6 MD1000’s and 1 MD1200.  The solution was originally designed to manage about 10TB of data, and really was never expected to handle what ultimately ended up being much greater than that.  The solution was less than ideal for a lot of reasons, which I’m sharing below.

  • It wasn’t just a storage unit; it also ran the backup server (CommVault) on top of it, and our tape drives were locally attached as well.  Basically a single box to handle a lot of data and ALL aspects of managing it.
  • The whole solution was a single point of failure and many of its sub-components were single points of failure.
  • This solution was 7 years old, and had been grown organically, one JBOD at a time.  This had a few pitfalls:
    • JBODs were daisy chained off each other.  Which meant that while you added more capacity, and spindles, the throughput was ultimately limited to 12Gbps for each chain (SAS x4 3G).  We only had two chains for the md1000’s and one chain / JBOD for the md1200.
    • The JBODs were carved up into independent LUNs, which from CommVault’s view was fine (awesome SW), but it left potential IOPS on the table.  So as we added JBODs the IOPS didn’t linearly increase per se.  Sure, the “aggregate” IOPS increased, but a single job was limited to the speed of a 15 disk RAID 6, instead of the potential of, say, a 60 disk RAID 60.
  • The disks at the time were fast (for SATA drives that is) but compared to modern NL-SAS drives, much lower throughput capability and density.
  • The PCI bus and the FSB (this server still had a FSB) were overwhelmed.  Remember, this was doing tape AND disk copies.  I know a lot of less seasoned folks don’t think it’s easy to overwhelm a PCI bus, but it absolutely is (more on that later), even more so when your PCI bus is version 1.x.
  • This solution consumed a TON of rack space; each JBOD was 3U and we had 7 of them (the MD1200 was 2U).  And each drive was only 1TB, so best case with a 15 disk RAID 6, we were looking at 13TB usable.  By today’s standards, this TB per RU is terrible even for tier 1 storage, let alone backup.
  • We were using RAID 6 instead of 10.  I know what some of you are thinking: it’s backup, so why would you use RAID 10?  Backups are probably every bit as disk intensive as your production workloads, and likely more so.  On top of that, we force them to do all their work in what’s typically a very constrained window of time.  RAID 6, while great for sequential / random reads, does horribly at writes in comparison to RAID 10 (I’m excluding fancy file systems like CASL from this generalization); there’s a quick back-of-the-napkin sketch of the difference after this list.  Unless you’re running one backup at a time, you’re likely throwing tons of parallel writes at this storage.  And while each stream may be sequential in nature, the aggregation of them looks random to the storage.  At the end of the day, disk is cheap, and cheap disk (NL-SAS) is even cheaper, so splurge on RAID 10.
    • This was also compounded by the point I made above about one LUN per JBOD
  • It was locked into what IMO is a less than ideal Dell (LSI) RAID card.  Again, I know what some of you are thinking: “HW” RAID is SO much better than SW RAID.  It’s a common and pervasive myth.  EMC, NetApp, HP, IBM, etc. are all simple x86 servers with a really fancy SW RAID.  SW RAID is fine, so long as the SW that’s doing the RAID is good.  In fact, SW RAID is not only fine, in many cases it’s FAR better than a HW RAID card.  Now, I’m not saying LSI sucks at RAID, they’re fine cards, but SW has come a long way, and I really see it as the preferred solution over HW RAID.  I’m not going to go into the WHYs in this post, but if you’re curious, do some research of your own until I have time to write another “vs” article.
  • Using Dell, HP, IBM, etc for bulk storage is EXPENSIVE compared to what I’ll call 2nd tier solutions.  Think of this as somewhere in-between Dell and home brewing your own solution.
    • Add on to this, manufacturers only want you running “their” disks in “their” JBODs.  Which means not only are you stuck paying a lot for a false sense of security, you’re also incredibly limited in what options you have for your storage.
  • All the HW was approaching EOL.
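As promised above, here’s the back-of-the-napkin math on RAID 6 vs. RAID 10 random writes.  The write penalties (2 back-end I/Os per host write for RAID 10, 6 for RAID 6) are the standard rule of thumb, and the ~75 random IOPS per 7.2k SATA/NL-SAS drive is just an assumed ballpark figure, not a measurement from our hardware.

```python
def random_write_iops(disks: int, iops_per_disk: int, write_penalty: int) -> int:
    """Rule-of-thumb random write ceiling for a RAID set: every host write
    costs `write_penalty` back-end disk I/Os (RAID 10 = 2, RAID 6 = 6)."""
    return disks * iops_per_disk // write_penalty

disks, iops_per_disk = 15, 75  # one 15 disk LUN, ~75 random IOPS per drive
print("RAID 10:", random_write_iops(disks, iops_per_disk, 2), "write IOPS")  # ~562
print("RAID 6 :", random_write_iops(disks, iops_per_disk, 6), "write IOPS")  # ~187
```

Roughly a 3x difference on the same spindles, which is exactly the gap you feel when a pile of parallel backup streams turns into random writes.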

There are probably a few more reasons why this solution was no longer ideal, but you get the point.  The reality is, our backup solution was changing, and it was a perfect time to re-think our storage.  In part 2, we’ll get into what we thought we wanted, what we needed, and what we had budget for.

Thinking out loud: CLI vs. GUI, a pointless debate

I see this come up occasionally and I really don’t get why it’s always an “x is better than y” debate.  A lot of times folks are talking about what works best for them in their world, and for all intents and purposes stating that if it’s best for them, it’s best for all.  I’d like to challenge this reasoning and make the case that both a CLI and a GUI have their place.

The GUI: It’s pretty and functional

I’m not sure where all this hate on GUIs comes from, but I can tell you that it stereotypically comes from either the *NIX or network engineer crowds.  Thinking about a GUI from their point of view, I can completely understand why they’re not huge fans.  The folks in these crowds spend most of their day buried in a CLI, for the simple reason that no GOOD GUI exists for most of what they’re doing.  KDE? GNOME? Some crappy web interface (this one’s a toss up depending on whose interface)?  I wouldn’t want to live day in and day out with half the GUIs they’re using either.  So what’s the problem with their GUIs?

  • In my admittedly limited experience with Linux GUIs, most of them offer limited functionality.  They can do the basics, but when you really want to get in and configure something, it almost always leads to firing up a console.  If everything in your ecosystem relies on you ending up in the CLI, pretty soon you’re going to skip the GUI.  Even with Apple’s OS X I’ve found this to be the case.  I wanted to change my mouse wheel’s scrolling direction, and that required going into the CLI (seriously).
  • Their GUIs are not always intuitive, and honestly this is supposed to be one of the value adds of a good GUI.
  • Their GUIs sometimes implement sloppy / bad configurations.  One area I recall hearing this a lot about is the Cisco ASA, but I’m sure this occurs on more solutions than Cisco’s firewalls.
  • Not so much a problem of their GUIs, but there’s just a negative stigma with anyone in these crowds using a GUI.  To paraphrase, if you use a GUI, you’re not a good/real admin (a load of crap BTW).
  • Again, not so much a problem of the GUI, but likely due to the above stigma, there really aren’t a lot of how-tos for using a GUI.  You ultimately end up firing up the CLI, because that’s the way things are documented.
  • A lot of GUIs don’t do bulk or automated administration well.  I think this is pretty true across the board.  That said, I have used purpose built automation tools, but they were for very specific use cases.  98% of the time, I go to a CLI for automation / bulk tasks.
  • Depending on the task you’re doing, GUI’s can be slower IF you’re familiar with the CLI (commands and proper syntax).

Clearly there’s a lot of cons for the GUI, but I think a lot of them tend to pertain more to *NIX and network engineers.  It’s not that a GUI by nature is bad, it’s just that their GUIs are bad.  In case you haven’t guessed, I’m a Windows admin, and with that, I enjoy an OS that was specifically designed with a great GUI in mind (even Windows 8 / 2012).  This is really the key: if the GUI is good, then your desire to use a GUI instead of the CLI will increase.  We went over what the problems are with a GUI, so why not go over what’s good about a GOOD GUI?

  • They’re easy and intuitive to use.
  • They present data in a way that’s easier to analyze visually.  This is important and I really think it’s overlooked a lot of the time.  Sure you CAN analyze data in the CLI, but the term “a picture is worth a thousand words” isn’t a hollow phrase.
  • When they’re designed well, single task items (non-bulk) are quick and easy.  It’s debatable whether a CLI is quicker, and there are A LOT of factors that go into that.
  • GUIs by nature allow multi-tasking.  Now I get that you can technically have multiple putty windows open (that’s a GUI BTW), but it’s not quite the same, interactively speaking, as an application.
  • They’re pretty.  I know that’s not exactly a great reason, but let’s all be honest, a pretty GUI is a lot nicer than a cold dark CLI.
  • Options and parameters tend to be fully displayed making it easier to see all the possible options and requirements. (I know Powershell ISE now offers this, very sweet.)
  • Some newer GUIs, like MS’s SQL and Exchange management consoles, even show you the CLI commands so you can use them as a foundation for scripts.  Meaning, GUIs can help you learn the CLI.
  • Certain highly complex tasks are simplified by using a GUI.  Things that may have taken multiple commands or digging deep into an object can be accomplished with a few clicks.

The CLI: Lean, mean automation machine

Like I said, I’m a Windows guy, but at the same time, I have a healthy appreciation and love for the CLI.  I’m lazy by nature and HATE routine tasks.  Speaking of hate, I see a lot of Windows admins doling out an equal amount of hate on the CLI, or more specifically on being forced into Powershell.  I get it; after all, it wasn’t until Powershell came out that Microsoft actually had a decent CLI / scripting language.  There was vbscript, and it worked, but man was it a lot of work (and not really interactive).  Nonetheless, Powershell has been out for what’s got to be close to ten years now and there’s still pushback from some of the Windows crowd.  There’s no need for the resistance (it’s futile); your life will be better with a CLI AND the GUI you know and love.  So let’s go into why you’re still not using a CLI, and why it’s all a bunch of nonsense.  This is going to be targeted mostly at Windows admins, specifically those that are still avoiding the CLI.  It’s kind of redundant to go over the pros / cons of the CLI because they’re mostly the inverse of what I mentioned about a GUI.

  • It’s a PITA to learn and a GUI is easy, or at least that’s what you tell yourself.  The reality is, Powershell is EASY to use and learn.
  • You don’t have time to learn how to script.  After all, you could spend 30 minutes clicking, or spend 4 hours trying to write a script.  Sure, the first time you attempt to write a new script, it might take you 16 hours, but the second script might take you 4 hours, and the third script might take you 5 minutes.  The more you familiarize yourself with all the syntax, commands, functions, methods, etc. the easier it will be to solve the next problem.
  • You’re afraid that your script might purge 500 mailboxes in a single key tap.  This is absolutely a possibility, and you know what, mistakes happen (even GOOD admins make really dumb mistakes).  But that’s why you start out small, and learn how to target.  That’s also why you have backups 🙂
  • You’re afraid you’ll automate yourself out of a job.  That’s not likely to happen.  Someone still needs to maintain the automation logic (it’s never perfect, OR things change), and it frees you up to do more interesting (fun) things.
  • Once you learn one CLI, a lot of it is transferable to other CLIs.  For me, going from Powershell to T-SQL was actually pretty easy.  I’m not a pro with T-SQL, but a lot of the concepts were similar.  I also found that as I learned how to do things in T-SQL, it helped me with other problems in Powershell (see how that works?).  I don’t have a lot of experience with *NIX CLIs, but I’d be willing to bet I could figure it out.

I probably did a bad job of being non-biased, but I really did try.  I truly see a place for both administration methods in IT, and I hope you do too.

Thinking out loud: Core vs. Socket Licensing

I recall when Microsoft changed their SQL licensing model from per socket to per core.  It was not a well received model and it certainly wasn’t following the industry standard.  I’d even say it ranked up there with the VMware vRAM tax debacle.  The big difference being that VMware reversed their licensing model.  To this day, other than perhaps Oracle, I don’t know of any other product / manufacturer using this model.  And you know what, that’s a shame.

I bet you weren’t expecting that were you?  No I don’t own stock in Microsoft, and yes I do like per socket (at times), but I honestly feel like in more cases than not, per core is a better licensing model for everyone.

I didn’t always feel this way; like most of you I was pretty ticked off when Microsoft changed to this model.  After all, cores per socket keep getting denser and denser, and now that we’re finally starting to get an incredible number of cores per socket, here’s Microsoft (and only Microsoft BTW) changing their model.  Clearly it must be driven by greed.

No, I don’t think it’s greed; in fact I think it’s an evolutionary model that was needed for the good of us the consumers and for the manufacturers.  Let’s go into why I feel this way.

To start, I’m going to use my company’s environment as an example.  We have what I would personally consider a respectably sized environment.

In my production site, I currently have the following setup:

  • 4 Dell R720’s with the v2 processors (dual socket, 12 cores per, 768GB of RAM)
    • General purpose VM’s
  • 10 Dell R720’s with the v1 processors (dual socket, 8 cores per, 384GB of RAM)
    • Web infrastructure
  • 7 Dell R820’s with v1 processors (quad socket, 8 cores per, 768GB of RAM)
    • Microsoft SQL only

In my sister companies site we have the following setup:

  • 3 Dell r710’s (dual socket, quad core, 256GB of RAM)
    • Remote office setup

In my DR site, we have the following setup (there are more servers, they’re just not VMware yet).

  • 4 Dell r710’s (dual socket, quad core, 256GB of RAM)
    • DR servers (more to come).

As you can see, beefy hardware and pretty wide array of different configurations for different uses.  All of these have a per socket licensing model, with the one caveat, that my SQL cluster is also licensed per core.

What got me thinking about this whole licensing model to begin with, is that I’d like to refresh my DR site and take the opportunity to re-provision our prod site in a more holistic way.  The biggest problem here is SQL, because it’s per core and everything else is per socket.  Which really limits my options to the following:

  1. Have two separate clusters, one for SQL and one for everything else.  Then I can design my HW and licensing as it makes the most sense with SQL, and also maximize my VM’s per socket with my other cluster.
  2. Take a density hit on the number of VM’s per socket, and run a single cluster with quad socket 8 core procs.
  3. Have my SQL VM’s take a clock rate hit (i.e. make them slower) and adopt a dual socket 16 core setup.

So what does any of this have to do with core vs. socket?  It’s pretty simple actually, and if you don’t get it, go back and re-read my current setup, and go read the design questions I’m juggling.  Really think about it…

Let me highlight a few things about my current infrastructure.  I’m going to attack the principle of per socket licensing, and why it’s a futile model (even if you don’t run anything else that’s licensed per core).

Every server in the infrastructure I laid out above is licensed on a flat per socket model, regardless of the number of cores.  That means my dual socket quad core procs cost me the same amount to license as my dual socket 12 core procs.  They’re doing less work, not able to support the same workloads, yet they cost me the same amount.  Now, let’s look at it from the manufacturer’s view: at the time of pricing, my quad core proc was a pretty fair deal, but now, as cores per socket have increased, their revenue potential decreases.  Depending on which side you’re on, someone is losing out.

Here’s another example of how per socket isn’t fair for us.  Remember my SQL hosts, quad socket 8 core?  Remember how I was saying I was thinking about dual socket 16 core procs, but at the expense of a really slow clock rate?  How is that fair to me?  The reality is, I have the same number of cores, yet if I design my solution based on performance, it costs me twice as much in licensing compared to a design that’s mostly about density.
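To put some (completely made-up) numbers on that, here’s a rough sketch comparing the two SQL host layouts under each model.  The per-socket and per-core prices below are hypothetical placeholders, not Microsoft’s actual list prices; the point is only how the two models behave when the core count stays the same but the socket count changes.

```python
# Hypothetical list prices, purely for illustration.
PRICE_PER_SOCKET = 7000
PRICE_PER_CORE = 1800

def license_cost(sockets: int, cores_per_socket: int) -> dict:
    """Compare what one host costs under a per-socket vs. per-core model."""
    return {
        "per_socket": sockets * PRICE_PER_SOCKET,
        "per_core": sockets * cores_per_socket * PRICE_PER_CORE,
    }

# Same 32 cores per host, two different layouts (the SQL design question above).
print("4 x 8-core :", license_cost(4, 8))    # per-socket cost doubles
print("2 x 16-core:", license_cost(2, 16))   # per-core cost is identical
```

Under per-socket licensing, picking the higher clocked quad socket box doubles the bill for the exact same number of cores; under per-core licensing, both designs cost the same and I can pick hardware on performance merits alone.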

I’d like to share a final example of why I think per socket isn’t relevant any longer.  This should hit home a lot more with SMBs and those that try to convince SMBs to go virtual.  We have a sister company that we manage, and they have all of about 20 servers.  That doesn’t mean virtualization isn’t a good fit for them, so why should they be forced into using something less optimal like Hyper-V?  Yet a simple 3 host dual socket config for them would cost an arm and a leg.  The reason being, VMware is charging them the same price that they’d charge a large enterprise.  You know what would level the playing field?  Per core instead of per socket.

In my opinion, per core would be a win for both the consumers and manufacture.  It provides the most flexibility, the most fair pricing, and it doesn’t force you into building your environment based on simply maximizing your licensing ROI (you do take application performance into consideration, right?)

Finally, if we were to adopt a per core model, there’s one thing I would insist that manufacturers do to be fair.  I’m calling Microsoft out on this, because I think it’s very underhanded of them to put this in their EULA.  Modern BIOSes have the ability to limit the number of cores presented to the OS.  You can effectively turn a 12 core proc into a quad core proc if you want.  Microsoft specifically doesn’t allow a company to use this feature, and still requires that all physically installed cores be licensed.  This is so wrong on their part; it’s not fair, it’s not right, and the reality is, it just makes this model look worse than it really is.  So with that being said, if a manufacturer were to switch to this model, I’d implore them to offer a few different options to allow me to buy more physical cores than I’m licensed for, but still be compliant.

  1. Let me use the BIOS feature to limit the number of cores presented to the OS.  Do you really think I’m going to shut down all my servers before an audit and lower their cores?  Really, what good would that do?  If my server needs 8 cores, it needs 8 cores; lowering it to 4 cores for the audit would likely have a devastating effect on my business, more so than properly licensing the server.
  2. Design your product so that with a simple license key, or license augmentation, you can logically go from “x” cores to “y” cores.  Meaning the product self restricts the number of physical cores used in a system.  If I have dual socket 12 core procs and I’m only licensed for 8 cores, only utilize 8/24 cores (a rough sketch of what that could look like is after this list).
    1. Heck, take NUMA minimums into consideration.  Meaning, if I have a quad socket, 4 cores at a minimum are required, and if I have a dual socket, 2 cores at a minimum are required.
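Here’s a rough sketch of the kind of self-restriction I’m talking about.  This is Linux-only and purely illustrative (it just pins the current process to the first N logical CPUs with os.sched_setaffinity); it’s not how SQL Server or any real product enforces core licensing, and a real implementation would also respect the NUMA minimums mentioned above.

```python
import os

def restrict_to_licensed_cores(licensed_cores: int) -> None:
    """Pin this process to its first `licensed_cores` logical CPUs (Linux only).
    Illustrative only: a real product would tie this to a license key and
    spread the allowed cores across NUMA nodes instead of taking the first N."""
    available = sorted(os.sched_getaffinity(0))
    allowed = set(available[:licensed_cores])
    os.sched_setaffinity(0, allowed)

if __name__ == "__main__":
    # e.g. a dual socket 12 core box (24 cores) with an 8 core license
    restrict_to_licensed_cores(8)
    print("Running on CPUs:", sorted(os.sched_getaffinity(0)))
```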

What are your thoughts?