
VMware DRS default memory load balancing

In general, I find DRS does a fantastic job of keeping VMs happy.  However, in the past, I've seen a number of unexplained situations where hosts in a cluster run out of memory when a VM goes from idle to busy all of a sudden.  In fact, this happened three times to us in a dedicated cluster for our SQL VMs.  What was unexplained wasn't how the host ran out of memory; that was pretty easy to track down.  What was unexplained was why DRS didn't move any of the VMs.  In this cluster we had a lot of VMs with restrictive rules, but there were plenty of VMs on this host that could have been easily relocated to prevent the over-committing of memory.

We were so perplexed by the situation that we called VMware support.  We explained the situation to them and asked what we could do to keep it from happening.  I had the idea of using memory reservations, thinking that maybe DRS wouldn't move VMs to a host that didn't have enough memory to back the configured vRAM.  Turns out memory reservations in VMware aren't exactly "reservations" per se.  They only reserve memory once it's active.  So even if you say "I want to reserve 100% of the configured memory," VMware doesn't actually do that.  It's more of a "once I use it, I don't give it back" behavior. As an aside, there's a new advanced setting in 6.5 that does pre-allocate the memory reservation. Given the confusion around memory reservations, I'd love to see that option exposed in the GUI with a brief comment about how it works.
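
As a side note, I believe the 6.5 setting in question is the per-VM "sched.mem.prealloc" VMX option, though you should verify that against current VMware documentation. A minimal PowerCLI sketch, assuming a VM named "SQL01" (both the name and the option are my assumptions, not something from this post):

#######################################################################################################
#Hedged sketch: reserve a VM's full memory and pre-allocate it at power on

$vm = Get-VM -Name "SQL01"

#Reserve all of the VM's configured memory
$vm | Get-VMResourceConfiguration | Set-VMResourceConfiguration -MemReservationGB $vm.MemoryGB

#Ask ESXi to back the reservation up front (assumed option name; takes effect at the next power cycle)
New-AdvancedSetting -Entity $vm -Name "sched.mem.prealloc" -Value "TRUE" -Confirm:$false
#######################################################################################################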

Regardless, memory reservations weren't a mitigation recommended by support; in fact, they suspected reservations would make things worse. Anyway, after looking through our logs and chatting with their colleagues, the tech basically came to the conclusion that our cluster was overloaded and we needed more memory.  At the time, I was a little skeptical, but I had to concede our cluster was pretty full.  I didn't think it was loaded so badly that DRS couldn't shuffle things around, but we took the answer as is and came up with a different solution: we ended up disabling DRS, since it was causing more issues than it was solving.

Fast forward to this week: we had a completely different cluster with a host on the verge of running out of memory.  It was at 96% while the other nodes were chilling at ~50-60%.  Unlike the above scenario, there was a TON of memory to balance things out, and there weren't super restrictive DRS rules either.  I was poking around DRS, as I recalled there being some DRS enhancements in 6.5, and I noticed one setting: "Load balance based on consumed memory instead of active memory."  And so the light bulb went off.  I Googled the setting to make sure I understood what it meant and came across this article.  It did exactly what I thought it would, which honestly made me wonder why it's not the default.  I ALSO noticed in that article that we probably could have influenced DRS in our SQL cluster to mitigate the overcommit.  A bit of a bummer, but life goes on.

In closing, I think this will be our default tweak whenever we set up a new cluster.  Within a few minutes of enabling that feature, I watched the host that was consuming 96% of its memory vacate a number of VMs, and things looked a lot healthier.  I'm not suggesting that you should do this, but I might suggest that you consider it if you're the type of shop that doesn't like to overcommit memory.
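
If you'd rather script the change than click through the UI, the checkbox maps (as far as I can tell) to the DRS advanced option PercentIdleMBInMemDemand set to 100. A hedged PowerCLI sketch, assuming a cluster named "SQL-Cluster":

#######################################################################################################
#Hedged sketch: make DRS balance on consumed memory instead of active memory

$cluster = Get-Cluster -Name "SQL-Cluster"

#100 = treat all idle consumed memory as demand, i.e. balance on consumed rather than active memory
New-AdvancedSetting -Entity $cluster -Type ClusterDRS -Name "PercentIdleMBInMemDemand" -Value 100 -Confirm:$false
#######################################################################################################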

PowerShell Scripting: Installing SQL / Setting up AlwaysOn Availability Groups

Introduction:

We set up a decent number of SQL servers every year.  Most of the time we're migrating to a newer OS + SQL combo, but sometimes it's for a new product.  As you may have read in my virtualizing SQL post, one of the pros of virtualizing SQL was having this ability. Having seen the amoeba effect virtualization created for other systems, I knew this would soon happen with SQL as well.  As such, I started developing a bunch of script blocks to automate setting up SQL a few years ago.  Over that time, I've added more and more functionality, with the final feature being the ability to set up a SQL AAG via PowerShell.  The only thing I don't have fully automated is an AlwaysOn Failover Cluster, but that's because we don't typically deploy them.

What you'll see in this post is how we go from a bare OS to a functional SQL server.  Most of it via PowerShell, some of it manual.

Also, if you're reading this in an RSS tool, all my script text is going to be messed up unless you view it on the site.  At the very bottom I've included a txt file of my script that you can download, which will probably be a lot easier to read.

Special Thanks:

Before going too far ahead, I want to make a special call out to two folks who have helped me accomplish this as a SysAdmin (non-DBA).  They didn't go out of their way to help me specifically, but their blog posts are gold.

Brent Ozar, if you haven't heard of him, is my go-to blog for SQL.  I've been following him for a long time (Quest days).  While I can't say I always agree with his philosophies on virtualizing SQL / SAN, when it comes to native SQL stuff, he (and his all-star team) are my go-to source of info.  I suspect if you're googling anything SQL related, you've probably seen his blog show up in the top section of results.  The one thing I've always relied on as a SysAdmin is his server setup checklist, part 1 here https://www.brentozar.com/archive/2008/03/sql-server-2005-setup-checklist-part-1-before-the-install/ and part 2 here https://www.brentozar.com/archive/2008/03/sql-server-2005-setup-checklist-part-2-after-the-install/

Newer to my SQL blog roll, and the person whose blog post ultimately helped me set up AlwaysOn Availability Groups, is Edwin M Sarmiento.  I signed up for his failover clustering training, which is worth checking out if you're new to clustering.  Anyway, Googling for "setup AlwaysOn Availability Groups Powershell" led me to his post here https://www.mssqltips.com/sqlservertip/2635/enable-sql-server-2012-alwayson-availability-groups-using-windows-powershell/.  What you'll see in my post borrows a lot from his.  I want to be clear: most of this is Edwin's work, not mine (AAG wise).  What you'll see in my post is that I've modified his script commands to merge into my greater script, and I've also added some bits to set up a SQL 2017 AAG with seeding instead of the older backup / restore option.

Prerequisites:

Like anything, you’ve got to have the prerequisites figured out before you start, so let’s go over what you want to figure out before you get to the scripting part.

  1. We’re going to assume you already have two Windows servers deployed.
    1. You’ll have your local administrator’s setups and all that good standard OS provisioning stuff that you’re all doing… right?
  2. For clustered SQL servers only, you'll want the following completed.
    1. If you've never deployed a cluster before, you'll want an OU in AD set up for your SQL clusters. It could be the same OU as your non-clustered SQL servers too, but I would at least make sure you have a dedicated OU in AD for SQL servers in general.
    2. You'll want an Active Directory group that will be used to delegate "manage computer" rights for that SQL server OU.
    3. When you have that group created, you'll want to make sure you delegate that AD group "full control" rights for computer objects only. You can accomplish this by running the delegation of rights wizard on the SQL server OU and assigning those rights to the AD group you created.  You'll only need to do this once.  Once it's done, any SQL server you set up in this OU will have all the rights it needs.
    4. You'll want to reserve at least two additional IP addresses besides what you have for the SQL nodes: one for the Cluster Core Resource (CCR) and one for the AlwaysOn Availability Group Listener (AAGL).
    5. You'll need a good name for your CCR and AAGL; I'll provide an example that I use further down.
    6. You'll want a server dedicated for use as a file share witness. You'll want to make sure this server is as independent from the cluster nodes you're setting up as possible. For us, the file share lives on a separate VMware cluster and a completely separate SAN.
      1. You’ll want to make sure there is a generic share already setup so it can be consumed by all your clusters.
    7. If you’re virtualizing SQL Clusters, here are my general recommendations.
      1. Disable any auto-migration solutions (VMware's DRS, for example). You will have failovers, and no, you won't be able to prevent them despite what VMware tells you.
        1. For VMware specifically, I set DRS for cluster nodes to partially automated. If you have a cluster of nothing but cluster nodes, just disable DRS.  If you have a cluster of non-clustered and clustered nodes, set this on a per-VM basis.
        2. If you do have DRS enabled, I recommend setting up anti-affinity rules for the cluster nodes. In a power-on situation, this prevents VMware from powering on two SQL nodes on the same host.
      2. Disable HA for clustered nodes. Again, if it's a dedicated cluster, don't waste your time with HA; simply disable it.  If it's a mixed cluster, disable it on a per-VM basis.
      3. DO NOT OVERSUBSCRIBE MEMORY. I'm not telling you to set reservations; I'm telling you to make sure there's enough memory to back every running VM in that cluster.  You shouldn't need reservations because you're not going to be dumb enough to oversubscribe memory.  If you MUST mix SQL with generic VMs, then 100% reserve the memory.  In addition, there is a new advanced feature in ESXi 6.5 that allows you to pre-reserve all memory at VM power on.  My recommendation is to use it.  I've had SQL suffer from memory ballooning; it's not pretty, avoid it at all costs.
      4. I don’t recall the setting name off hand, but it’s in Frank’s new vSphere 6.5 deep dive book. Go pick it up, you’ll need it if you care about running SQL well in VMware.
      5. Do NOT enable hot add for memory or CPU in the VM. With VMware this disables virtual NUMA.  You have a cluster, if you need to change resources around, failover and work on one node at a time.
  3. Create new AD accounts. At a minimum, you'll want one for the SQL Agent and one for all other SQL services.
  4. You'll want to set up your backup share (if you don't have one already) and make sure the Agent and Service accounts have access to that location.
  5. You'll want a GPO for your SQL servers OU that creates firewall rules to allow ports 1433 and 5022 (a local equivalent is sketched right after this list).
  6. You'll want to deploy all your disks / controllers for SQL. Since we use VMware at ASI, we have a very standard configuration.
    1. OS and scratch / generic disks go on controller 0, SCSI ports 0 and 1 respectively.
    2. DB and Index go on controller 1, SCSI ports 0 and 1 respectively.
    3. SQL Log goes on controller 2, SCSI port 0.
    4. TempLog and TempDB go on controller 3, SCSI ports 0 and 1 respectively.
  7. You'll want a SQL config file created. I'll touch on this briefly further down, but it's basically an answer file for an unattended SQL install.
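
For the firewall prerequisite above, the GPO should stay the source of truth, but if you want to stand up (or sanity-check) the equivalent rules locally on a test box, here's a minimal sketch using the built-in NetSecurity cmdlets:

#######################################################################################################
#Local equivalent of the GPO firewall rules (TCP 1433 for SQL, TCP 5022 for the AAG endpoint)

New-NetFirewallRule -DisplayName "SQL Server (TCP-In 1433)" -Direction Inbound -Protocol TCP -LocalPort 1433 -Action Allow

New-NetFirewallRule -DisplayName "SQL AAG Endpoint (TCP-In 5022)" -Direction Inbound -Protocol TCP -LocalPort 5022 -Action Allow
#######################################################################################################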

Once you've completed the above steps, reboot your SQL servers for the changes (GPOs, for example) and rights to take effect immediately.  I like to reboot twice, as sometimes a single reboot doesn't pick everything up.

I know that's a lot to take in and a lot to comb through, but it's all to make the deployment process smooth.  Don't skip ahead if you don't have all this set up.  I've deployed probably 150 SQL servers by now, and these prerequisites help things go smoothly.

Prepping the script:

Any good script is parameter based.  For the most part, you want all your possible variables defined at the top, so that the actual scripting below stays generic.  It makes it easier to customize for each environment.

Keep in mind, my “script” isn’t so much a script as a bunch of script blocks.  You’re not going to simply copy and paste the whole thing and walk away.  I’m not there yet 😊

Here is my parameters block.  It's broken into two main sections: static parameters, which are parameters you'll change by hand, and dynamic parameters, which are the result of merging values or pulling data from the computer system.

Let’s break down what you’re going to be asked.

  1. The first couple of sections cover copying functions, ISOs, and configuration files. You'll need to provide a source and destination for these files.  I like to copy them locally; you'll see the K: drive mentioned in my examples as a destination.
  2. You'll need that SQL config file, which we'll talk about creating right now.
    1. When I created mine, I walked through a SQL install, checked all the boxes I wanted, and configured all the file paths, rights, etc. Before clicking install, there's a spot on the final page where you can grab a config file.  Do that; once you have the config file, make the following changes.  You can read more about it here https://docs.microsoft.com/en-us/sql/database-engine/install-windows/install-sql-server-using-a-configuration-file .
      1. I change the references for the SQL Agent account and SQL Service account to something generic, like below, so we can use a script to do a find / replace. In my script, these are the generic strings I use.
        1. Change the SQL Agent account name to domain\SQLAGENTACCOUNTTOCHANGE
        2. Change the SQL Service account name to domain\SQLSERVICEACCOUNTTOCHANGE

  3. You'll want your SQL service account names and their domain names. They'll be used for various things in our script, including the above example of updating your SQL answer file.
  4. For the clustering bit, you'll want all the names, IPs, etc. I also have test DB names that we'll use for setting up AAGs.  The last SQL server I deployed had four AAGs, hence there are four of everything.
    1. You can see my generic naming convention below. I blogged about that too.
  5. The FSW (file share witness) is only needed if you are setting up a cluster. You'll notice I like to use the cluster name (CCR) as the folder for storing the FSW config.  You'll also see I have different FSWs depending on which cluster and which environment I'm setting up.
  6. Similarly, I do the same for our backup share location. In this case, I'm using node 1 as the final folder name, but your needs may vary.
  7. The rest are what I call dynamic parameters; they pull information either from previous parameters or from the system itself. There are only two things worth noting: I ask you to fill in the passwords for the service account and agent, which is what allows us to install SQL unattended.

#######################################################################################################

#Parameters



#File Path to local GPO function.

$LocalGPOFunctionName = "Add-ECSLocalGPOUserRightAssignment.ps1"

$LocalGPOFunctionFilePath = "\\a1-file-04\SysAdminStuff\SQL Servers\Functions" + "\" + $LocalGPOFunctionName



#File Path to SQL config file

$ConfigFileName = "ConfigurationFile_Template_2017plus.ini"

$ConfigFileSource = "\\a1-file-04\SysAdminStuff\SQL Servers\" + $ConfigFileName

$ConfigFileDestination = "K:\" + $ConfigFileName



#File path to SSMS

$SSMSFileName = "SSMS-Setup-ENU.exe"

$SSMSFileSource = "\\YOURDOMAINHERE.local\Servers\ISG Software\Microsoft\SQL Server Management Studio\17.4\" + $SSMSFileName

$SSMSFileDestination = "K:\" + $SSMSFileName



#File Path to SQL ISO of your choice

$ISOFileName = "SW_DVD9_NTRL_SQL_Svr_Ent_Core_2017_64Bit_English_OEM_VL_X21-56995.ISO"

$ISOFileSource = "\\YOURDOMAINHERE.local\Servers\ISG Software\Microsoft\SQL Enterprise 2017\" + $ISOFileName

$ISOFileDestination = "K:\" + $ISOFileName



#Service Account

$sqlserviceuserNoDomain = "usrsqlpc15svc"

$sqlserviceuser = "YOURDOMAINHERE\" + $sqlserviceuserNoDomain

$sqlagentuserNoDomain = "usrsqlpc15agt"

$sqlagentuser = "YOURDOMAINHERE\" + $sqlagentuserNoDomain





##Clustering

$node1 = "a1-sqlpcn1-15"

$node2 = "a1-sqlpcn2-15"

$clusterip = "192.168.1.19"

$ClusterCNO = "a1-sqlpc-15"

$DB1Name = "Test1"

$DB2Name = "Test2"

$DB3Name = "Test3"

$DB4Name = "Test4"

$SQLAAGEndPointName = "Hadr_endpoint"

$AAG1Name = "a1-sqlpcdg1-15"

$AAG2Name = "a1-sqlpcdg2-15"

$AAG3Name = "a1-sqlpcdg3-15"

$AAG4Name = "a1-sqlpcdg4-15"

$AAG1IP = "192.168.1.20/255.255.255.0"

$AAG2IP = "192.168.1.21/255.255.255.0"

$AAG3IP = "192.168.1.22/255.255.255.0"

$AAG4IP = "192.168.1.23/255.255.255.0"



#File Share Witness



<#
Note 1: FSW is only needed if you're setting up an AAG. Traditional failover clusters don't need this; instead they need a quorum disk. This script does not cover that.

Note 2: There are currently four different file share witness locations. See below.

Odd numbered clusters (for example a1-sqluc-01, 03, 05, etc.) go here:
UAT = \\a1-fswus1-02\Clusters
PRD = \\a1-fswps1-02\Clusters

Even numbered clusters (for example a1-sqluc-02, 04, 06, etc.) go here:
UAT = \\a1-fswus2-02\Clusters
PRD = \\a1-fswps2-02\Clusters
#>

$filesharewitness = "\\a1-fswps1-02\Clusters\$($ClusterCNO)"



#Backup Share



<#
This step is only needed if you're setting up an AAG; otherwise the DBAs will do this. Here are the following paths:

\\YOURDOMAINHERE.local\Backups\SQL Backups 2\DEV
\\YOURDOMAINHERE.local\Backups\SQL Backups 2\PRD
\\YOURDOMAINHERE.local\Backups\SQL Backups 2\STG
\\YOURDOMAINHERE.local\Backups\SQL Backups 2\TST
\\YOURDOMAINHERE.local\Backups\SQL Backups 2\UAT
#>

$BackupSharePath = "\\YOURDOMAINHERE.local\Backups\SQL Backups 2\PRD\$($node1)"



#DB backup names

$TestDB1FullPath = $BackupSharePath + "\" + $DB1Name + ".bak"

$TestDB2FullPath = $BackupSharePath + "\" + $DB2Name + ".bak"

$TestDB3FullPath = $BackupSharePath + "\" + $DB3Name + ".bak"

$TestDB4FullPath = $BackupSharePath + "\" + $DB4Name + ".bak"

$TestLog1FullPath = $BackupSharePath + "\" + $DB1Name + ".log"

$TestLog2FullPath = $BackupSharePath + "\" + $DB2Name + ".log"

$TestLog3FullPath = $BackupSharePath + "\" + $DB3Name + ".log"

$TestLog4FullPath = $BackupSharePath + "\" + $DB4Name + ".log"





#AAG information



#Dynamic parameters



#These passwords will be used to automate the setup of SQL. Note: Read-Host captures them in plain text here, which is what the unattended installer arguments require.

$sqlagentuserpassword = Read-Host -Prompt "Enter your SQL Agent Account Password Here"

While ($sqlagentuserpassword -eq $null -or $sqlagentuserpassword -eq "")

{

$sqlagentuserpassword = Read-Host -Prompt "Enter your SQL Agent Account Password Here"

}



$sqlserviceuserpassword = Read-Host -Prompt "Enter your SQL Service Account Password Here"

While ($sqlserviceuserpassword -eq $null -or $sqlserviceuserpassword -eq "")

{

$sqlserviceuserpassword = Read-Host -Prompt "Enter your SQL Service Account Password Here"

}



#Temp Directory

$TempDirectory = $env:TEMP

$LocalGPOFunctionNameTempPath = $TempDirectory + "\" + $LocalGPOFunctionName



#ComputerName

$ComputerName = $env:COMPUTERNAME



#END Parameters

#######################################################################################################

Setting up the disks:

The next section is all about setting up the disks, partitions, folder structure, and finally permissions.  Now, this is likely going to be different in your environment, but I did want to show how I tackle it in case you are curious.  We deploy all our VMware VMs with a consistent disk configuration, as I described above.  This allows us to know which disk is used for what purpose and ultimately automate the whole setup.

The basic steps are as follows:

  1. We find all disks that are offline. This is our standard procedure when we set up a server.
  2. PowerShell brings them online, and then we loop through them.
    1. Based on which controller and which SCSI port the disk is located on (using WMI to find out), we set it up in a specific way.
      1. We set it up with a GPT partition.
      2. We set up the volume as NTFS, with a specific drive letter and cluster size depending on the type of volume.
        1. Note: I NOW realize that we should be using a 64K cluster size for all SQL files. I wrote this section almost five years ago based on the IO size.  For consistency we're keeping this on existing SQL servers, but we'll be switching to a 64K cluster size for all new systems.
      3. Once we have all the volumes set up, we then proceed to create our folder structure.
      4. Once the folders are in place, we assign the needed NTFS rights to the service accounts. SQL sets up the permissions for the default DB, Log and TempLog/DB, but nothing else.  We have separate disks for index, and we also occasionally use a G: drive to store additional DBs, so my script provides similar rights.  We also make sure to grant the SQL service accounts read-only rights to each drive, and grant them rights to our scratch / generic disk, K:.

#######################################################################################################

#Setting up the physical disks



#Get all offline disks, should be any disks you've created

$OfflineDisks = Get-Disk | Where-Object {$_.OperationalStatus -eq "Offline"}



#Set Disk online

$OfflineDisks | Set-Disk -IsOffline:$false



#Disable read only

$OfflineDisks | Set-Disk -IsReadOnly:$false



#Initialize Disks

$OfflineDisks | Initialize-Disk -PartitionStyle GPT



#Loop through all disks and configure based on their location. This location is based on VMware controller card and controller port locations. In VMware you should have the following SCSI configs:

#0:0 = C: = OS = disk 1

#0:1 = K: = Misc = disk 2

#1:0 = F: = UserDB1 = disk 3

#1:1 = N: = Index1 = disk 4

#2:0 = J: = UserLog1 = disk 5

#3:0 = V: = TempLog = disk 6

#3:1 = W: = TempDB = disk 7



#Microsoft, in their infinite wisdom, continues to change f'ing property values around, so below are the Server 2016 specific SCSI IDs



Foreach ($disk in $OfflineDisks)

{

#This will give us the view of which SCSI card and port we're in.

#Note: property "SCSIPort" actually equals SCSI card in WMI, and "SCSITargetId" equals the SCSI port for the card





#Match the WMI record to this disk by number. Note: a trailing wildcard match like this is fine for a handful of disks, but would mis-match once you have 10 or more.
$WMIDiskInformation = get-wmiobject -Class win32_diskdrive | where-object {$_.DeviceID -like "*$($disk.number)"}





if ($WMIDiskInformation.SCSIPort -eq 2 -and $WMIDiskInformation.SCSITargetId -eq 1)

{

echo "K:"

$disk | New-Partition -UseMaximumSize

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Format-Volume -FileSystem NTFS -AllocationUnitSize 4096 -NewFileSystemLabel "Misc" -Confirm:$false

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Set-Partition -NewDriveLetter k

}



Elseif ($WMIDiskInformation.SCSIPort -eq 3 -and $WMIDiskInformation.SCSITargetId -eq 0)

{

echo "V:"

$disk | New-Partition -UseMaximumSize

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Format-Volume -FileSystem NTFS -AllocationUnitSize 4096 -NewFileSystemLabel "TempLog" -Confirm:$false

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Set-Partition -NewDriveLetter V

}

Elseif ($WMIDiskInformation.SCSIPort -eq 3 -and $WMIDiskInformation.SCSITargetId -eq 1)

{

echo "w:"

$disk | New-Partition -UseMaximumSize

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Format-Volume -FileSystem NTFS -AllocationUnitSize 8192 -NewFileSystemLabel "TempDB" -Confirm:$false

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Set-Partition -NewDriveLetter w

}

Elseif ($WMIDiskInformation.SCSIPort -eq 4 -and $WMIDiskInformation.SCSITargetId -eq 0)

{

echo "F:"

$disk | New-Partition -UseMaximumSize

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Format-Volume -FileSystem NTFS -AllocationUnitSize 8192 -NewFileSystemLabel "UserDB1" -Confirm:$false

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Set-Partition -NewDriveLetter f

}

Elseif ($WMIDiskInformation.SCSIPort -eq 4 -and $WMIDiskInformation.SCSITargetId -eq 1)

{

echo "N:"

$disk | New-Partition -UseMaximumSize

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Format-Volume -FileSystem NTFS -AllocationUnitSize 8192 -NewFileSystemLabel "Index1" -Confirm:$false

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Set-Partition -NewDriveLetter n

}

Elseif ($WMIDiskInformation.SCSIPort -eq 5 -and $WMIDiskInformation.SCSITargetId -eq 0)

{

echo "J:"

$disk | New-Partition -UseMaximumSize

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Format-Volume -FileSystem NTFS -AllocationUnitSize 4096 -NewFileSystemLabel "UserLog1" -Confirm:$false

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Set-Partition -NewDriveLetter j

}

Elseif ($WMIDiskInformation.SCSIPort -eq 4 -and $WMIDiskInformation.SCSITargetId -eq 2)

{

echo "G:"

$disk | New-Partition -UseMaximumSize

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Format-Volume -FileSystem NTFS -AllocationUnitSize 8192 -NewFileSystemLabel "UserDB2" -Confirm:$false

$disk | Get-Partition | Where-Object {$_.type -eq "Basic"} | Set-Partition -NewDriveLetter G

}

}



#END Setting up the physical disks

#######################################################################################################
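
Before moving on to folders and permissions, it doesn't hurt to eyeball the result against the layout table in the comments above:

#######################################################################################################
#Quick sanity check: list the volumes we just created

Get-Volume | Where-Object {$_.DriveLetter} | Sort-Object DriveLetter | Format-Table DriveLetter, FileSystemLabel, FileSystem, Size
#######################################################################################################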



#######################################################################################################

#Setting up the folder paths and permissions



# Note 1: If you have more than the standard F, J, K, N, V and W drives, you'll need to manually set up the folder structure and permissions

# Note 2: As part of the SQL install, SQL will configure the modify permissions for the default DB, Log, TempLog and TempDB folders, which is why I don't configure them.

# Note 3: Keep an eye on the folder permissions; you shouldn't see a lot of errors. If you do, something went wrong.

# Note 4: These are based on default disk locations and default disk requests. A G: drive is caught below as an optional extra; anything beyond that isn't scripted at the moment.



#Create Folder Structure



#F: Drive

If ((Test-Path "F:\MSSQLDB\Data") -eq $false)

{

New-Item -Path "F:\MSSQLDB\Data" -ItemType Container

}

If ((Test-Path "F:\OLAP\Data") -eq $false)

{

New-Item -Path "F:\OLAP\Data" -ItemType Container

}



#J: Drive

If ((Test-Path "J:\MSSQLDB\Log") -eq $false)

{

New-Item -Path "J:\MSSQLDB\Log" -ItemType Container

}

If ((Test-Path "J:\OLAP\Log") -eq $false)

{

New-Item -Path "J:\OLAP\Log" -ItemType Container

}



#K: Drive

If ((Test-Path "K:\Config") -eq $false)

{

New-Item -Path "K:\Config" -ItemType Container

}

If ((Test-Path "K:\MSSQL") -eq $false)

{

New-Item -Path "K:\MSSQL" -ItemType Container

}

If ((Test-Path "K:\MSSQLDB") -eq $false)

{

New-Item -Path "K:\MSSQLDB" -ItemType Container

}

If ((Test-Path "K:\OLAP\config") -eq $false)

{

New-Item -Path "K:\OLAP\config" -ItemType Container

}

If ((Test-Path "K:\Support") -eq $false)

{

New-Item -Path "K:\Support" -ItemType Container

}



#N: Drive

If ((Test-Path "N:\MSSQLDB\Index") -eq $false)

{

New-Item -Path "N:\MSSQLDB\Index" -ItemType Container

}



#V: Drive

If ((Test-Path "V:\MSSQLDB\Log") -eq $false)

{

New-Item -Path "V:\MSSQLDB\Log" -ItemType Container

}



#W: Drive

If ((Test-Path "W:\MSSQLDB\Data") -eq $false)

{

New-Item -Path "W:\MSSQLDB\Data" -ItemType Container

}

If ((Test-Path "W:\OLAP\Temp") -eq $false)

{

New-Item -Path "W:\OLAP\Temp" -ItemType Container

}



#Adding a catch for the G: drive

$GDRive = Get-PSDrive | Where-Object {$_.root -like "G:*"}



If ($GDRive -ne $null)

{

If ((Test-Path "G:\MSSQLDB\Data") -eq $false)

{

New-Item -Path "G:\MSSQLDB\Data" -ItemType Container

}

If ((Test-Path "G:\OLAP\Data") -eq $false)

{

New-Item -Path "G:\OLAP\Data" -ItemType Container

}



}



#Set Permissions



#F: Drive

Start-Process -FilePath "icacls.exe" -ArgumentList "f:\ /remove Everyone /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "f:\ /remove ""Creator Owner"" /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "f:\ /remove BUILTIN\Users /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "f: /grant:rx ""$sqlserviceuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "f: /grant:rx ""$sqlagentuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait



#J: Drive

Start-Process -FilePath "icacls.exe" -ArgumentList "j:\ /remove Everyone /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "j:\ /remove ""Creator Owner"" /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "j:\ /remove BUILTIN\Users /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "j: /grant:rx ""$sqlserviceuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "j: /grant:rx ""$sqlagentuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait



#K: Drive

Start-Process -FilePath "icacls.exe" -ArgumentList "k:\ /remove Everyone /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "k:\ /remove ""Creator Owner"" /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "k:\ /remove BUILTIN\Users /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "k: /grant:rx ""$sqlserviceuser"":(OI)(CI)(M) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "k: /grant:rx ""$sqlagentuser"":(OI)(CI)(M) /C" -NoNewWindow -Wait



#N: Drive

Start-Process -FilePath "icacls.exe" -ArgumentList "n:\ /remove Everyone /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "n:\ /remove ""Creator Owner"" /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "n:\ /remove BUILTIN\Users /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "n: /grant:rx ""$sqlserviceuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "n: /grant:rx ""$sqlagentuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "N:\MSSQLDB\Index /grant:rx ""$sqlserviceuser"":(OI)(CI)(F) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "N:\MSSQLDB\Index /grant:rx ""$sqlagentuser"":(OI)(CI)(F) /C" -NoNewWindow -Wait



#V: Drive

Start-Process -FilePath "icacls.exe" -ArgumentList "v:\ /remove Everyone /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "v:\ /remove ""Creator Owner"" /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "v:\ /remove BUILTIN\Users /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "v: /grant:rx ""$sqlserviceuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "v: /grant:rx ""$sqlagentuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait



#W: Drive

Start-Process -FilePath "icacls.exe" -ArgumentList "w:\ /remove Everyone /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "w:\ /remove ""Creator Owner"" /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "w:\ /remove BUILTIN\Users /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "w: /grant:rx ""$sqlserviceuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "w: /grant:rx ""$sqlagentuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait



#Adding a catch for the G: drive

If ($GDRive -ne $null)

{

Start-Process -FilePath "icacls.exe" -ArgumentList "g:\ /remove Everyone /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "g:\ /remove ""Creator Owner"" /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "g:\ /remove BUILTIN\Users /T" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "g: /grant:rx ""$sqlserviceuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "g: /grant:rx ""$sqlagentuser"":(OI)(CI)(RX) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "g:\MSSQLDB\Data /grant:rx ""$sqlserviceuser"":(OI)(CI)(F) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "g:\MSSQLDB\Data /grant:rx ""$sqlagentuser"":(OI)(CI)(F) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "g:\OLAP\DATA /grant:rx ""$sqlserviceuser"":(OI)(CI)(F) /C" -NoNewWindow -Wait

Start-Process -FilePath "icacls.exe" -ArgumentList "g:\OLAP\DATA /grant:rx ""$sqlagentuser"":(OI)(CI)(F) /C" -NoNewWindow -Wait

}



#END Setting up the folder paths and permissions

#######################################################################################################

At this stage, we have all our disks set up and ready to go.  Next we want to grant the SQL service accounts the "Perform volume maintenance tasks" right.  I don't have my function published yet; I'm still honing it a bit more.  However, you can easily do this step manually (a rough sketch follows), or wait until I release the function, which should be soon.  I did, however, want to show the code.
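
If you'd rather do it manually in the meantime, here's a rough sketch using secedit (my assumption of the simplest manual route, not part of my script): export the local policy, append the accounts to the SeManageVolumePrivilege line, and re-apply just the user rights area.

#######################################################################################################
#Hedged sketch: grant "Perform volume maintenance tasks" manually with secedit

#Export the current local security policy
secedit /export /cfg C:\Windows\Temp\secpol.inf /areas USER_RIGHTS

#Edit C:\Windows\Temp\secpol.inf and append your accounts (or their SIDs) to the line beginning with:
#  SeManageVolumePrivilege =

#Re-apply just the user rights area
secedit /configure /db C:\Windows\Temp\secpol.sdb /cfg C:\Windows\Temp\secpol.inf /areas USER_RIGHTS
#######################################################################################################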



#######################################################################################################


#Add SQL service account and agent to perform volume maintenance tasks local GPO



#NOTE: You can find this setting buried in Computer Configuration\Windows Settings\Security Settings\Local Policies\User Rights Assignment



#Copy our function to the temp directory

Copy-item -Path $LocalGPOFunctionFilePath -Destination $TempDirectory -Force -Confirm:$false



#Function Name to load



#Import function

. $LocalGPOFunctionNameTempPath



#Add the service account

Add-ECSLocalGPOUserRightAssignment -UserOrGroup $sqlserviceuser -UserRightAssignment "SeManageVolumePrivilege"



#Add the agent account

Add-ECSLocalGPOUserRightAssignment -UserOrGroup $sqlagentuser -UserRightAssignment "SeManageVolumePrivilege"



#END Add SQL service account and agent to perform volume maintenance tasks local GPO

#######################################################################################################

Time to install SQL.  Basic steps are as follows:

  1. We copy the SQL ISO to the K: drive.
  2. We copy the SQL config file template to the K: drive.
  3. We copy the SSMS installer to the K: drive.
  4. We modify the config file template to update the service accounts to the correct names for this SQL server.
  5. We mount the ISO and capture the drive letter used (it's automatically assigned).
  6. We install SQL using a few parameters. If it works correctly, you should see the progress without having to answer any questions.
  7. Similarly, we install SSMS. Again, you should see progress, but not have to answer any questions.

#######################################################################################################

#Install SQL



#Copy the ISO to the K: drive

Copy-item -Path $ISOFileSource -Destination $ISOFileDestination -Force -Confirm:$false



#Copy the SQL config file

Copy-item -path $ConfigFileSource -Destination $ConfigFileDestination -Force -Container:$false



#Copy the SSMS to the K:

Copy-item -path $SSMSFileSource -Destination $SSMSFileDestination -Force -Container:$false



#Modify the config file to update it for our SQL accounts

(Get-Content -Path $ConfigFileDestination).Replace("YOURDOMAINHERE\SQLSERVICEACCOUNTTOCHANGE",$sqlserviceuser) | Set-Content -Path $ConfigFileDestination

(Get-Content -Path $ConfigFileDestination).Replace("YOURDOMAINHERE\SQLAGENTACCOUNTTOCHANGE",$sqlagentuser) | Set-Content -Path $ConfigFileDestination



#Mount the ISO

$mountResult = Mount-DiskImage $ISOFileDestination -PassThru

$ISOVolume = $mountResult | Get-Volume



#Define the SQL install path

$SQLInstallPath = $($isovolume.driveletter) + ":\setup.exe"



#Install SQL

Start-Process -FilePath $SQLInstallPath -ArgumentList "/SQLSVCPASSWORD=""$sqlserviceuserpassword"" /AGTSVCPASSWORD=""$sqlagentuserpassword"" /ISSVCPASSWORD=""$sqlserviceuserpassword"" /ASSVCPASSWORD=""$sqlserviceuserpassword"" /ConfigurationFile=$($ConfigFileDestination) /IAcceptSQLServerLicenseTerms" -NoNewWindow -Wait



#Install SSMS

Start-Process -FilePath $SSMSFileDestination -ArgumentList "/passive /norestart" -NoNewWindow -Wait



#END Install SQL

#######################################################################################################
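
If the unattended install went cleanly, a quick way to confirm you actually have a running instance (assuming a default instance, which is what my config file uses):

#######################################################################################################
#Quick check that the default instance and agent are up after the unattended install

Get-Service -Name MSSQLSERVER, SQLSERVERAGENT | Select-Object Name, Status
#######################################################################################################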

Now let's make sure we register the SPNs for our DBAs.  You should probably do this as a best practice too.


#######################################################################################################

#Register SPNs



Start-process -FilePath setspn -ArgumentList "-A MSSQLSvc/$($ComputerName).YOURDOMAINHERE.local:1433 $($sqlserviceuserNoDomain) " -NoNewWindow -Wait -PassThru

Start-process -FilePath setspn -ArgumentList "-A MSSQLSvc/$($ComputerName).YOURDOMAINHERE.local $($sqlserviceuserNoDomain) " -NoNewWindow -Wait -PassThru



#END Register SPNs

#######################################################################################################
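
To sanity-check the registrations, you can list the SPNs tied to the service account afterward:

#######################################################################################################
#Verify the SPNs landed on the service account

Start-process -FilePath setspn -ArgumentList "-L $($sqlserviceuserNoDomain)" -NoNewWindow -Wait -PassThru
#######################################################################################################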

We need SQL Server's PowerShell cmdlets, BUT SQL always installs an outdated version, plus you're stuck waiting on CUs for newer PS modules.  Forget that; let's pull them straight from the PowerShell Gallery, where they're kept up to date more frequently.

***Note:  I banged my head on PowerShell command errors because I didn't have the latest versions, so that's another reason to use the PS Gallery version.  Bugs are fixed.

Make sure you say "Y" to the questions asked about installing this module.


#######################################################################################################

#Auto Load SQL PS Module



#The included SQL module isn't kept up to date, so we're going to install the latest module from the PowerShell Gallery

Install-Module sqlserver -AllowClobber



#Import the module

Import-Module sqlserver -DisableNameChecking



#End Auto Load SQL PS Module

#######################################################################################################

You should now have all sorts of great SQL PS commands.  At this stage, we’re going to finish setting up a generic SQL server (non-clustered).  There are a few things we do.

  1. We enable SQL contained DBs.
  2. We configure the max / min server memory based on Brent Ozar's generic recommendations.
    1. The only time I've had an issue with this is when I've had 4GB or less of memory. Pretty obvious why.
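
As a worked example of the math in the script below: on a VM with 32 GB (32,768 MB) of memory, max server memory gets set to 32,768 - 4,096 = 28,672 MB and min server memory to 32,768 / 2 = 16,384 MB, leaving 4 GB for the OS.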

#######################################################################################################

#Configure SQL, all SQL servers



#SQL Object

$SQLObject = New-Object Microsoft.SqlServer.Management.Smo.Server



#Getting the server information

$server = New-Object Microsoft.SqlServer.Management.Smo.Server $env:ComputerName



#Configure SQL Memory

$MaxMemory = $($server.PhysicalMemory) - 4096

$MinMemory = $($server.PhysicalMemory) / 2



Invoke-Sqlcmd -Database Master -Query "EXEC sp_configure 'Show Advanced Options',1;RECONFIGURE;"

Invoke-Sqlcmd -Database Master -Query "EXEC sp_configure 'max server memory (MB)',$MaxMemory;RECONFIGURE;"

Invoke-Sqlcmd -Database Master -Query "EXEC sp_configure 'min server memory (MB)',$MinMemory;RECONFIGURE;"



#End Configure SQL, all SQL servers

#######################################################################################################

Now at this point you should have a generic SQL server set up and ready to patch / hand off.  If you want to set up an AAG, continue following along.

To get the AAG setup, we need a cluster first.  So let’s install the clustering feature on both nodes.


#######################################################################################################

#Install Cluster feature



Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools



#END Install Cluster feature

#######################################################################################################
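
If you'd rather push this from one console instead of logging into each node, here's a small sketch using PowerShell remoting (assumes WinRM / PS remoting is enabled between your admin box and the nodes):

#######################################################################################################
#Optional: install the feature on both nodes remotely in one shot

Invoke-Command -ComputerName $node1, $node2 -ScriptBlock {

Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

}
#######################################################################################################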

Now that we have the clustering feature installed, you only need to run this next step from NODE 1.  Whenever there is a task going forward that only needs to be done on a single node, I do it on node 1.

Here is what we’re doing.

  1. We're forming a cluster with the two nodes you've defined.
  2. We're setting up the FSW folder, assigning the NTFS rights needed, and then modifying the cluster to use a FSW.
  3. We're setting the cluster's SameSubnetThreshold to 10 missed heartbeats (roughly 10 seconds with the default one-second heartbeat interval). This is the new default, but that wasn't always the case.  Up to you if you want that value.
  4. The next step is to stop the cluster. Listen up here, this is important.  As you'll see in the code comment, you need to take the newly created CCR / CNO and add it to the same AD group you put the cluster nodes into.  This way the cluster has rights to create SQL listeners.  If you don't do this, creating the SQL listeners will fail.  Once you've done that, wait 15 seconds or so before proceeding to the next step.
  5. Finally, we start the cluster nodes back up. They should inherit the rights you assigned in step 4.

#######################################################################################################

#Setup and configure cluster (only one node)



#Configure the cluster (only run once, on either node)

New-Cluster -Name $ClusterCNO -Node $node1,$node2 -StaticAddress $clusterip -nostorage



#Configure file share witness folder

If ((Test-Path $filesharewitness) -eq $false)

{

New-Item -Path $filesharewitness -ItemType Container

}



#Set file share permissions

Start-Process -FilePath "icacls.exe" -ArgumentList """$filesharewitness"" /grant ""YOURDOMAINHERE\$ClusterCNO$"":(OI)(CI)(F) /C" -NoNewWindow -Wait



#Set quorum for the cluster

#Note 1: This is for AAG's only, if you are setting up a traditional cluster, you'll need to setup a disk based quorum

Set-ClusterQuorum -NodeAndFileShareMajority $filesharewitness



#Set the same-subnet heartbeat threshold to 10 (about 10 seconds with the default one-second heartbeat)

(Get-cluster).SameSubnetThreshold=10



#Stop the cluster service so you can give it the correct AD rights

get-cluster -Name $ClusterCNO | Stop-Cluster -Force -Confirm:$false

#!!!!!!!!!!!!!Add the cluster CNO to the same AD group that you added the individual nodes to

#After assigning the rights, start the cluster service

Get-Service -Name ClusSvc -ComputerName $node1 | Start-Service

Get-Service -Name ClusSvc -ComputerName $node2 | Start-Service



#END Setup and configure cluster (only one node)

#######################################################################################################

Run this next step on both nodes again.  This enables AlwaysOn on both SQL instances.


#######################################################################################################

#Enable SQL AAG, AAG only



#Enable SQL Always on (run on both nodes)



#NOTE: This may fail once, wait a minute, then try again.

Enable-SqlAlwaysOn -ServerInstance $($env:ComputerName) -Confirm:$false -NoServiceRestart:$false -Force:$true



#End Enable SQL AAG, AAG only

#######################################################################################################
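
You can confirm the change stuck (the cmdlet restarts the SQL service) by checking the SMO IsHadrEnabled property on each node:

#######################################################################################################
#Confirm AlwaysOn is enabled on this node; should return True

(New-Object Microsoft.SqlServer.Management.Smo.Server $env:ComputerName).IsHadrEnabled
#######################################################################################################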

Now, until mentioned otherwise, we're back to only running these steps from a single node.  Again, I'm going to run them from node 1.  If you only need one AAG, then only run the "1" sections.  If you need two, then run the 1 + 2 sections.  If you're in IT, I probably shouldn't need to explain this to you.

Also, just to state it again: most of these steps were copied from Edwin's blog post, so he gets most of the credit here, not me.  I fill in a few of the blanks that he didn't, like backing up the DBs, seeding settings, etc.

Here is what we’re doing.

  1. We create some test DBs that will be used by the AAGs.
  2. Next we make sure the backup share has our folder created. In my case, the service accounts already have rights to this location, because we took care of that in the prerequisites, and so did you… right?
  3. In order to use the newly created DBs, we need to back them up, so that's what we do next.

#######################################################################################################

#Prepare SQL for new AAG (only one node)



#Create a DB per listener. Typically two, but sometimes more / less



#DB1

$db = New-Object -TypeName Microsoft.SqlServer.Management.Smo.Database($SQLObject, $DB1Name)

$db.Create()



#DB2

$db = New-Object -TypeName Microsoft.SqlServer.Management.Smo.Database($SQLObject, $DB2Name)

$db.Create()



#DB3

$db = New-Object -TypeName Microsoft.SqlServer.Management.Smo.Database($SQLObject, $DB3Name)

$db.Create()



#DB4

$db = New-Object -TypeName Microsoft.SqlServer.Management.Smo.Database($SQLObject, $DB4Name)

$db.Create()



#Confirm, list databases in your current instance

$SQLObject.Databases | Select Name, Status, Owner, CreateDate



#Configure the backup share

If ((Test-Path $BackupSharePath) -eq $false)

{

New-Item -Path $BackupSharePath -ItemType Container

}





#Now we need to backup all of the newly created DB's

Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB1Name -BackupFile $TestDB1FullPath

Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB2Name -BackupFile $TestDB2FullPath

Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB3Name -BackupFile $TestDB3FullPath

Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB4Name -BackupFile $TestDB4FullPath



#END Prepare SQL for new AAG (only one node)

#######################################################################################################

Now, the next steps are where we set up the AAGs. There are two main sections.  The first is for older-style AAGs, as in SQL 2012 / 2014; I keep that section commented out in my working copy.  The second section is for SQL 2017, where we're utilizing SQL's newer seeding technique, which I like a lot better.

Again, a lot of this was copied from Edwin's post; I simply did things like use variables and filled in blanks related to backing up / restoring, etc.

These basic steps are the same for non-seeded and seeded AAGs:

  1. We need to setup an endpoint for the AAG on each SQL node.
  2. We need to start the endpoints
  3. We need to create a SQL login for the SQL service accounts so they can connect to each other.
  4. Then we need to grant them the needed rights.

Next, it depends on whether you're doing a traditional AAG or a seeded AAG.  I'm going to skip the traditional AAG, because Edwin already has a great blog post on that, which I linked above.  You can see my steps below, which include things like backing up / restoring that he leaves out, so cross-reference if you need to.

As for the seeded AAG setup, here's how it's done.

  1. We need to define our replicas. The one parameter that's different between Edwin's post and mine is the seeding parameter.
  2. Now we create the AAGs (on node 1). The additional parameters we have here enable support for DTC and database-level health detection.  These are new features.
  3. Next we join the secondary node to the AAG(s).
  4. Here is where a seeded setup really starts straying from a traditional one. We need to grant the AAG itself rights to "create any database," and we need to do this on both nodes.  ***You can continue running this command from one node though; we're remotely executing these commands.
  5. Now we add the DBs you backed up to node 1, and they should automatically replicate to node 2.
  6. Then we finally create the SQL listeners and we're done.
#######################################################################################################
#Setup the AAG (only one node)

#Create the endpoints for each SQL AAG replica
New-SqlHADREndpoint -Path "SQLSERVER:\SQL\$($node1)\Default" -Name $($SQLAAGEndPointName) -Port 5022 -EncryptionAlgorithm Aes -Encryption Required
New-SqlHADREndpoint -Path "SQLSERVER:\SQL\$($node2)\Default" -Name $($SQLAAGEndPointName) -Port 5022 -EncryptionAlgorithm Aes -Encryption Required

#Start the endpoints for each AAG replica
Set-SqlHADREndpoint -Path "SQLSERVER:\SQL\$($node1)\Default\Endpoints\$($SQLAAGEndPointName)" -State Started
Set-SqlHADREndpoint -Path "SQLSERVER:\SQL\$($node2)\Default\Endpoints\$($SQLAAGEndPointName)" -State Started

#Create a SQL login for the SQL service account so each server can connect to each other.
$createLogin = "CREATE LOGIN [$($sqlserviceuser)] FROM WINDOWS;"
$grantConnectPermissions = "GRANT CONNECT ON ENDPOINT::$($SQLAAGEndPointName) TO [$($sqlserviceuser)];"
Invoke-SqlCmd -ServerInstance $($node1) -Query $createLogin
Invoke-SqlCmd -ServerInstance $($node1) -Query $grantConnectPermissions
Invoke-SqlCmd -ServerInstance $($node2) -Query $createLogin
Invoke-SqlCmd -ServerInstance $($node2) -Query $grantConnectPermissions

#Create replicas

##########This is for non-seeded AAG's (traditional backup / restore AAGs)

#Create the replicas
$primaryReplica = New-SqlAvailabilityReplica -Name $($node1) -EndpointUrl "TCP://$($node1).YOURDOMAINHERE.local:5022" -AvailabilityMode "SynchronousCommit" -FailoverMode 'Automatic' -AsTemplate -Version $SQLObject.version
$secondaryReplica = New-SqlAvailabilityReplica -Name $($node2) -EndpointUrl "TCP://$($node2).YOURDOMAINHERE.local:5022" -AvailabilityMode "SynchronousCommit" -FailoverMode 'Automatic' -AsTemplate -Version $SQLObject.version

#Build the AAGs
New-SqlAvailabilityGroup -InputObject $($node1) -Name $AAG1Name -AvailabilityReplica ($primaryReplica, $secondaryReplica) -Database $DB1Name -DtcSupportEnabled -DatabaseHealthTrigger -ClusterType Wsfc
New-SqlAvailabilityGroup -InputObject $($node1) -Name $AAG2Name -AvailabilityReplica ($primaryReplica, $secondaryReplica) -Database $DB2Name -DtcSupportEnabled -DatabaseHealthTrigger -ClusterType Wsfc
New-SqlAvailabilityGroup -InputObject $($node1) -Name $AAG3Name -AvailabilityReplica ($primaryReplica, $secondaryReplica) -Database $DB3Name -DtcSupportEnabled -DatabaseHealthTrigger -ClusterType Wsfc
New-SqlAvailabilityGroup -InputObject $($node1) -Name $AAG4Name -AvailabilityReplica ($primaryReplica, $secondaryReplica) -Database $DB4Name -DtcSupportEnabled -DatabaseHealthTrigger -ClusterType Wsfc

#Now we need to backup all db's (both full and log)
Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB1Name -BackupFile $TestDB1FullPath
Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB2Name -BackupFile $TestDB2FullPath
Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB3Name -BackupFile $TestDB3FullPath
Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB4Name -BackupFile $TestDB4FullPath

Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB1Name -BackupFile $TestLog1FullPath -BackupAction Log
Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB2Name -BackupFile $TestLog2FullPath -BackupAction Log
Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB3Name -BackupFile $TestLog3FullPath -BackupAction Log
Backup-SqlDatabase -ServerInstance $($env:ComputerName) -database $DB4Name -BackupFile $TestLog4FullPath -BackupAction Log

#Now we need to restore the DB's
Restore-SqlDatabase -Database $DB1Name -BackupFile $TestDB1FullPath -ServerInstance $($node2) -NoRecovery
Restore-SqlDatabase -Database $DB2Name -BackupFile $TestDB2FullPath -ServerInstance $($node2) -NoRecovery
Restore-SqlDatabase -Database $DB3Name -BackupFile $TestDB3FullPath -ServerInstance $($node2) -NoRecovery
Restore-SqlDatabase -Database $DB4Name -BackupFile $TestDB4FullPath -ServerInstance $($node2) -NoRecovery

Restore-SqlDatabase -Database $DB1Name -BackupFile $TestLog1FullPath -ServerInstance $($node2) -NoRecovery -RestoreAction 'Log'
Restore-SqlDatabase -Database $DB2Name -BackupFile $TestLog2FullPath -ServerInstance $($node2) -NoRecovery -RestoreAction 'Log'
Restore-SqlDatabase -Database $DB3Name -BackupFile $TestLog3FullPath -ServerInstance $($node2) -NoRecovery -RestoreAction 'Log'
Restore-SqlDatabase -Database $DB4Name -BackupFile $TestLog4FullPath -ServerInstance $($node2) -NoRecovery -RestoreAction 'Log'

#Now we need to join the secondary nodes copy of the DB to the AAG
Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$($node2)\Default\AvailabilityGroups\$($AAG1Name)" -Database $DB1Name
Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$($node2)\default\AvailabilityGroups\$($AAG2Name)" -Database $DB2Name
Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$($node2)\default\AvailabilityGroups\$($AAG3Name)" -Database $DB3Name
Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$($node2)\default\AvailabilityGroups\$($AAG4Name)" -Database $DB4Name

########## END This is for non-seeded AAG's (traditional backup / restore AAGs)

#This is for seeded DB's, which is what we're doing in SQL 2017+

#Create the replicas
$primaryReplica = New-SqlAvailabilityReplica -Name $($node1) -EndpointUrl "TCP://$($node1).YOURDOMAINHERE.local:5022" -AvailabilityMode "SynchronousCommit" -FailoverMode 'Automatic' -AsTemplate -Version $SQLObject.version -SeedingMode Automatic
$secondaryReplica = New-SqlAvailabilityReplica -Name $($node2) -EndpointUrl "TCP://$($node2).YOURDOMAINHERE.local:5022" -AvailabilityMode "SynchronousCommit" -FailoverMode 'Automatic' -AsTemplate -Version $SQLObject.version -SeedingMode Automatic

#Create the AAG
New-SqlAvailabilityGroup -InputObject $($node1) -Name $AAG1Name -AvailabilityReplica ($primaryReplica, $secondaryReplica) -DtcSupportEnabled -DatabaseHealthTrigger -ClusterType Wsfc
New-SqlAvailabilityGroup -InputObject $($node1) -Name $AAG2Name -AvailabilityReplica ($primaryReplica, $secondaryReplica) -DtcSupportEnabled -DatabaseHealthTrigger -ClusterType Wsfc
New-SqlAvailabilityGroup -InputObject $($node1) -Name $AAG3Name -AvailabilityReplica ($primaryReplica, $secondaryReplica) -DtcSupportEnabled -DatabaseHealthTrigger -ClusterType Wsfc
New-SqlAvailabilityGroup -InputObject $($node1) -Name $AAG4Name -AvailabilityReplica ($primaryReplica, $secondaryReplica) -DtcSupportEnabled -DatabaseHealthTrigger -ClusterType Wsfc

#Join secondary node to the AAG's
Join-SqlAvailabilityGroup -Path "SQLSERVER:\SQL\$($node2)\Default" -Name $AAG1Name
Join-SqlAvailabilityGroup -Path "SQLSERVER:\SQL\$($node2)\Default" -Name $AAG2Name
Join-SqlAvailabilityGroup -Path "SQLSERVER:\SQL\$($node2)\Default" -Name $AAG3Name
Join-SqlAvailabilityGroup -Path "SQLSERVER:\SQL\$($node2)\Default" -Name $AAG4Name

#Grant the AAG the rights to create a DB (only needed for seeding mode)
$AAG1CreateCommand = "ALTER AVAILABILITY GROUP [$($AAG1Name)] GRANT CREATE ANY DATABASE"
$AAG2CreateCommand = "ALTER AVAILABILITY GROUP [$($AAG2Name)] GRANT CREATE ANY DATABASE"
$AAG3CreateCommand = "ALTER AVAILABILITY GROUP [$($AAG3Name)] GRANT CREATE ANY DATABASE"
$AAG4CreateCommand = "ALTER AVAILABILITY GROUP [$($AAG4Name)] GRANT CREATE ANY DATABASE"

Invoke-SqlCmd -ServerInstance $($node1) -Query $AAG1CreateCommand
Invoke-SqlCmd -ServerInstance $($node2) -Query $AAG1CreateCommand

Invoke-SqlCmd -ServerInstance $($node1) -Query $AAG2CreateCommand
Invoke-SqlCmd -ServerInstance $($node2) -Query $AAG2CreateCommand

Invoke-SqlCmd -ServerInstance $($node1) -Query $AAG3CreateCommand
Invoke-SqlCmd -ServerInstance $($node2) -Query $AAG3CreateCommand

Invoke-SqlCmd -ServerInstance $($node1) -Query $AAG4CreateCommand
Invoke-SqlCmd -ServerInstance $($node2) -Query $AAG4CreateCommand

#Now we need to join the primary nodes copy of the DB to the AAG
Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$($node1)\Default\AvailabilityGroups\$($AAG1Name)" -Database $DB1Name
Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$($node1)\default\AvailabilityGroups\$($AAG2Name)" -Database $DB2Name
Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$($node1)\default\AvailabilityGroups\$($AAG3Name)" -Database $DB3Name
Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$($node1)\default\AvailabilityGroups\$($AAG4Name)" -Database $DB4Name

#Now we need to set up the listeners (for both seeded and non-seeded AAGs)
New-SqlAvailabilityGroupListener -Name $($AAG1Name) -staticIP $AAG1IP -Port 1433 -Path "SQLSERVER:\SQL\$($node1)\DEFAULT\AvailabilityGroups\$($AAG1Name)"
New-SqlAvailabilityGroupListener -Name $($AAG2Name) -staticIP $AAG2IP -Port 1433 -Path "SQLSERVER:\SQL\$($node1)\DEFAULT\AvailabilityGroups\$($AAG2Name)"
New-SqlAvailabilityGroupListener -Name $($AAG3Name) -staticIP $AAG3IP -Port 1433 -Path "SQLSERVER:\SQL\$($node1)\DEFAULT\AvailabilityGroups\$($AAG3Name)"
New-SqlAvailabilityGroupListener -Name $($AAG4Name) -staticIP $AAG4IP -Port 1433 -Path "SQLSERVER:\SQL\$($node1)\DEFAULT\AvailabilityGroups\$($AAG4Name)"

#End Setup the AAG (only one node)
#######################################################################################################
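As an aside, if the repetition above bugs you, the four per-AAG blocks can be driven by a single loop. Here's a rough sketch using the same variables as the script above; I haven't run it in this exact form, so treat it as a starting point rather than a drop-in replacement.

#Optional refactor sketch: drive the repeated per-AAG steps with a loop
$AAGMap = @(
     @{Name = $AAG1Name; DB = $DB1Name; IP = $AAG1IP},
     @{Name = $AAG2Name; DB = $DB2Name; IP = $AAG2IP},
     @{Name = $AAG3Name; DB = $DB3Name; IP = $AAG3IP},
     @{Name = $AAG4Name; DB = $DB4Name; IP = $AAG4IP}
     )

Foreach ($AAG in $AAGMap)
     {
     #Join the secondary, grant create rights on both nodes, add the DB, and create the listener
     Join-SqlAvailabilityGroup -Path "SQLSERVER:\SQL\$($node2)\Default" -Name $AAG.Name
     Invoke-SqlCmd -ServerInstance $($node1) -Query "ALTER AVAILABILITY GROUP [$($AAG.Name)] GRANT CREATE ANY DATABASE"
     Invoke-SqlCmd -ServerInstance $($node2) -Query "ALTER AVAILABILITY GROUP [$($AAG.Name)] GRANT CREATE ANY DATABASE"
     Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$($node1)\Default\AvailabilityGroups\$($AAG.Name)" -Database $AAG.DB
     New-SqlAvailabilityGroupListener -Name $AAG.Name -StaticIP $AAG.IP -Port 1433 -Path "SQLSERVER:\SQL\$($node1)\Default\AvailabilityGroups\$($AAG.Name)"
     }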

Finally, we need to fix the SQL listeners so that each one registers only one IP address per site, and then validate the cluster so it's supported by Microsoft.  Again, you only need to run this from one node.


#######################################################################################################

#Fix cluster setting and test cluster post AAG (only one node)



#Fix cluster multi-subnet SQL listener setting

$ClusterResource = Get-ClusterResource | Where-Object {$_.ResourceType -eq "network name" -and $_.ownergroup -ne "Cluster Group"}

$ClusterResource | Set-ClusterParameter -Create RegisterAllProvidersIP 0
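
#Optional sanity check: confirm RegisterAllProvidersIP is now 0 before restarting anything
$ClusterResource | Get-ClusterParameter -Name RegisterAllProvidersIP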

#Bounce the network name resources so the change takes effect
$ClusterResource | ForEach-Object {Stop-ClusterResource -Name $_.Name}

#Starting the AG resources brings their listener network names (and the rest of their dependencies) back online
Get-ClusterResource | Where-Object {$_.ResourceType -eq "SQL Server Availability Group"} | ForEach-Object {Start-ClusterResource -Name $_.Name}



#Finally test the cluster

Test-Cluster



#END Fix cluster setting and test cluster post AAG (only one node)

#######################################################################################################

Conclusion:

That’s all I have.  I know it’s a big post, I know it’s probably a little hard to follow along, but hopefully you’ve gleaned something from this.  I’ve taken a SQL server setup from a day, down to only a few hours with this script.  Most of that time BTW is waiting for SQL to install and completing the prep work.  I know it’s a lot to take in, so if you have any questions, leave them in the comments.

Also, I hope to have that local GPO function published this week or, at the latest, in January.

One last thing: I know the formatting is a bit hard to read on a blog, so I have the txt version of the PS script for you here: SQL Powershell Setup.

Review: 5 years virtualizing Microsoft SQL Server

Introduction:

I know what you’re thinking, it’s 2017, why are you writing about virtualizing Microsoft SQL?  Most are doing it after all.  And even if they’re not, there’s this whole SQLaaS thing that’s starting to take off, so why would anyone care?  Well I’m writing this as more of a reflection on virtualizing SQL.  What works well, what doesn’t, what lessons I’ve learned, what I’m still learning, etc.

Like most things on the internet, I find that folks tend to share all the good, without sharing any of the bad (or vice versa).  There’s also just a lot of folks out there saying they’ve done it, without quantifying how well it’s working.  Sure, I’ve seen the cranky DBA say it’s the worst thing to happen, and I’ve seen the sysadmins say it’s the best thing that they ever did.  I find both types of feedback to be mostly useless, as they’re all missing context and depth.  This post is going to follow my typical review style, so I’ll outline things like the specs, the pros and cons, and share some general thoughts.

Background:

When I first started at ASI, I was told we’d never virtualize SQL.  It was the un-virtualizeable workload.  That was roughly six and a half years ago.  Fast forward to today, and we’ve been running a primarily virtualized SQL environment for close to five years.  It took a bit of convincing on my side, but this is basically how I convinced ASI to virtualize SQL.

  • Virtualizing SQL (and other big iron) was gaining a lot of popularity back in 2012
  • I had just completed my first successful POC of virtualizing a lot of other workloads at ASI.
  • We were running SQL on older physical systems and they were running adequately. The virtual hosts I was proposing were at the time two generations newer processor wise.  Meaning, if it was running ok on this dinosaur HW, it should run even better on this newer processor, regardless of whether it was virtual or not.
  • I did a ton of research, and a lot of political marketing / sales. Basically, compiling a list of things virtualization was going to fix in our current SQL environment.  Best of all, I was able to point at my POC as proof of these things.  For example, we had virtualized Exchange, and Exchange was a pretty big iron system that was running well. Many of the things I laid out as pros, I could point to Exchange as proof.

Basically, it was proposed as a win / win solution.  It wasn’t that I didn’t share the cons of virtualizing SQL, it was that I wasn’t as familiar with the cons until after virtualizing SQL.  This is going back to that whole lack of real-world feedback issue.  I brought up things like there would be some performance overhead, troubleshooting would be more difficult, and some of the more well-known issues.  But there was never a detailed list of gotcha’s.  No one that I was aware of had virtualized BIG SQL servers in the real world and then shared their experience in great detail.  Sure, I saw DBA’s complain a lot, but most of it was FUD (and still is).

Anyway, the point is, we did a 180, and went from not virtualizing any SQL, to virtualizing any and all SQL with the exception of one platform (more on that later).

The numbers and specs:

Bear in mind, this was five years ago; these were big numbers back then.

  • VMware cluster comprised of seven Dell r820’s
    • 32 total cores (quad socket, 8 cores per socket)
    • 768GB of RAM
    • Quad 10gb networking
      • Two for the storage network
      • Two for all other traffic
    • Fusion-io ioDrive2 card
    • Fusion-io ioTurbine cache acceleration software
    • VMware ESXi 5.x – 6.x (over time)
  • Five Nimble cs460 SANs
  • Dual Nexus 5596 10Gb switches
  • Approximately 80 SQL servers (peak)
    • 20 – 30 of which were two node clusters
    • Started with Windows 2012 R1 + SQL 2012
    • Currently running Windows 2012 R2 + SQL 2014 and moving on to Windows 2016 + SQL 2017

To summarize, we have a dedicated VMware cluster for production SQL systems and another cluster (not detailed) for non-production workloads.  It didn’t start out that way, more on that later.

Pros:

No surprise, but there are a lot of advantages to virtualizing SQL that, even after five years, I still think hold true.  Let’s dig into it.

  • The ability to expand resources with minimal disruption. I’m not talking about anything hot-add here, simply the fact that you can add resources.  In essence, it gives you the ability to right-size each SQL server.
  • Through virtualization, you can run any number of OS + SQL version combinations that you need. Previously there were all kinds of instance stacking and OS + SQL version lag.  With virtualization, if we want a specific OS + SQL combo, we spool up a new VM and away we go.
  • Virtualization made it easy for us to have a proper dev, stage, UAT and finally production environment for all systems. Before these would have been instances on existing SQL servers.
  • Physical hardware maintenance is mostly non-disruptive. By being able to easily move workloads (scheduled) to different physical hosts, we’re able to perform maintenance without risking data loss.  There’s also the added benefit that there are basically no firmware or driver updates (other than VMware tools / version) to apply in the OS itself.  This makes maintenance a lot easier for the SQL server itself.
  • Related to the above, hardware upgrades are as easy as a shutdown / power on. There’s no need to re-install and re-configure SQL on a new system.
  • We were able to build SQL VM’s for specific purposes rather than trying to co-mingle a bunch of databases on the same SQL server. Some might say six of one, half a dozen of the other, but I disagree.  By making a specific SQL server virtual, it enabled us to migrate that workload to any number of virtual hosts.
  • With enterprise licensing, we could build as many SQL systems as we wanted within the confines of resources.
  • Migrating SQL data from one storage location to another was easy, but I won’t go so far as saying non-disruptive. Doing that on a physical SQL server requires moving the data files manually.  With VMware, we just moved the virtual disk.
  • Better physical host utilization. This is a double-edged sword, but we’ve been able to more fully utilize our physical HW than before.  When you consider how much SQL licensing costs, that’s a pretty big deal.
  • Redundancy for older OS versions. Before Windows 2012, there was no official support for NIC teaming.  You could do it, but Microsoft wouldn’t support it.  With VMware, we had both NIC redundancy and host redundancy.  In a non-clustered SQL server, VMware’s HA could kick in as a backup for host failures.

Pretty much, all the standard pros you’d expect from a virtual environment, and a few SQL specific ones.

Cons:

This is a tough one to admit, but there are a TON of cons to virtualizing SQL if a sysadmin has to deal with it at scale.

  • Troubleshooting just got tougher with virtual SQL. VMware will now always be suspect for any and all issues.  Some of it is justified, a lot of it not.  Still, trying to prove it’s not a VMware issue is tough.  You’re no longer simply looking at the OS stats; now you have to review the VM host stats, check for things like co-stop, wait, busy, etc.  Were there any noisy neighbors, anything in the VMware logs, etc.
  • Things behave differently in a virtual world. In a physical world, “stuns” or “waits” don’t happen.  This is related to the above, but basically, for every simplicity that virtualization adds, it at least matches it with an equal or greater complexity.
  • The politics, OH the politics of a virtual SQL environment. If you don’t have a great relationship with your SQL team, I would say, don’t virtualize SQL.  It’s just not worth the pain and agony you’re going to go through.  It will only increase finger pointing.
  • DBA’s in charge of sizing VM’s on a virtual host you’re in charge of supporting. This is related to politics, but basically now that DBA’s know they can expand resources, you can bet your hind end your VM’s will get bigger and almost never shrink (we’ve gotten some resources back, so kudos to our DBA’s).  It doesn’t matter if you explain NUMA concerns, co-stop, etc.  It’s nothing more than “I want more CPU” or “I want more memory”.  Then a week later, when you have VM’s stepping on each other’s toes, it will be finger pointing back at you again.  I think what’s mostly happening here is the DBA’s are focused on individual server performance, whereas it’s difficult to convey the multi-server impact.
  • vMotion (host or storage) will cause interruptions. In a SQL cluster, you will have failovers.  At least that’s my experience.  Despite what VMware puts on their matrix, DON’T plan on using DRS.  Even if you can get the VM’s to migrate without a failover, the applications accessing SQL will slow down.  At least if your SQL VM’s are a decent size.  This was probably the number one disappointment with our SQL environment.
    • Once you can’t rely on DRS, managing VM’s across different hosts becomes a nightmare. You’ll either end up in CPU overload or memory ballooning. I had never seen memory ballooning before virtualizing SQL, and that’s the last application you want to see ballooning and swapping.
    • Since you can’t vMotion VM’s to rebalance the cluster without causing disruptions (save for maybe non-clustered VM’s), the struggles just keep piling up.
  • SQL VMware hosts are EXPENSIVE, at least when you’re running a good number of big VM’s like we are. We actually maxed out our quad socket servers from a power perspective.  Even if we wanted to do something like add memory, it’s not an option.  And when you want to talk about swapping in new hosts, it’s not some cheap $30k host; no, it’s a host that probably costs close to $110k if not more.  Adding to that, you’re now tasked with trying to determine whether you should stay with the same number of CPU cores, or try to make a case for more CPU cores, which now adds SQL licensing costs.

I could probably keep going on, but the point is virtualizing SQL isn’t all sunshine and roses like it is for other workloads.

Lessons learned:

I’m thankful to have had this opportunity, because it’s enabled me to experience first-hand what it’s like virtualizing SQL in a shop where SQL is respectably large and critical.  In this time, I’ve learned a number of things.

  • DRS + SQL clusters = no go. Maybe it works for you and your puny 4 vCPU / 16GB VM, but for one of our VM’s with 24 vCPU and 228GB of RAM, you will cause failovers.  And no DBA wants a failover.
    • Actually DRS + any Windows cluster = no go, but that’s for another post.
  • If I had to do it over again, I would have gotten Dell r920’s instead of r820’s. While both were quad socket, I didn’t realize or appreciate the scalability difference between the E5-4600 and E7-8800 series Xeons.  If I were building this today, I would go after hosts that are super dense.  Rather than relying on a scale-out technique, I’d shoot for a scale-up approach.  Most ideal would be something like the HPe SuperDome, but even getting new M series Xeons with 128GB DIMMs would be a wise choice.  In essence, build a virtual platform just like you would a physical one.  So if you normally would have had three really big hosts, do the same in VMware.
  • Accept the fact that SQL VM’s are going to be larger than you think they should be. Some of this, to be fair, is that we SysAdmins think we understand SQL, and we don’t.  There’s a lot more to SQL than CPU utilization.  For example, I’ve seen SQL queries that only used 25% of every CPU core they were running on, but the more vCPUs we allocated to the VM, the faster that query ran.  It was the oddest thing I had ever seen, but it also wasn’t the only application I’ve seen like this.  Likely a disk bottleneck issue, or at least that’s my guess.
  • Just give SQL memory and be done with it. When we virtualized our first SQL cluster, the one thing we noticed was that the disk IO on our SAN (and Fusion-io card) was pretty impressive.  At first, it’s pretty cool to see 80k IOPS from a real workload, but then you hear the DBA’s saying “it’s slow”, and you realize that if every SQL server you add needs this kind of disk IO, you’re going to run out of IOPS in no time.  We added something like 64GB more memory to those nodes, the disk IO went from 80k to 3k, and the performance from the DBA’s perspective was back to what they expected.  There’s no replacement for memory.
  • Virtualizing SQL is complex. While it CAN be as simple as what you’re used to doing, once you start adding clustering, and managing a lot of monster VM’s on the same cluster, it’s a different kind of experience than you’re used to.  To me, it’s worth investing in VMware Log Insight for your SQL environment to make it easier to troubleshoot things.  I would also add Ops Manager as another potential value add.  At least these are things I’m thinking of pushing for.
  • Keep your environment as simple as possible. We started out with Fusion IO cards + Fusion IO caching software.  All that did was create a lot of headaches, and once we increased the RAM in SQL, the disk bottleneck went away (mostly).  I could totally see using an Intel NVMe (or 3D XPoint) card for something like TempDB.  However, I would put the virtual disk on the drive directly, not use any sort of caching solution.
  • I would have broken our seven node cluster up into two or three two node clusters. This goes back to treating them like they’re physical servers.  Again, scaling up is much better, but if you’re going to use more, smaller hosts, treat them like they’re physical.
    • We kind of do this now. Node 1’s on odd hosts, node 2’s on even hosts
  • We found that we ultimately didn’t need VMware’s Enterprise Plus. We couldn’t vMotion or use DRS, and the distributed switch was of little value, so we converted everything to Standard edition.  Now, I have no clue what would happen if we wanted Ops Manager.  It used to be a la carte, but I’m not so sure anymore.
  • We originally had non-prod and prod on the same cluster. We eventually moved all of non-prod off.  This provided a little more breathing room, and now we have two out of seven hosts free to use for maintenance.  Before, they were partially consumed with non-prod SQL VM’s.
  • We made the mistake of starting with virtualizing big SQL servers and learning about Microsoft clustering + AlwaysOn Availability Groups at the same time. Not recommended. :)  That said, I’m not sure we could have learned the lessons we did any other way, even if it was a hard way to learn them.
  • Just because VMware says something will work, doesn’t mean it will. I quadruple checked their clustering matrix and recommended practices guides.  We were doing everything they recommended and our clusters still failed over.
  • Big VM’s don’t behave the same way as little VM’s. I know it sounds like a no duh, but it’s really not something you think about.  This is especially true when it comes to vMotion or even trying to balance resources (manually) on different hosts.  You never realize how much you really appreciate DRS.
  • I’ve learned to absolutely despise Microsoft clustering when it’s virtualized. It just doesn’t behave well.  I think MS clustering is built for a physical world, where there are certain assumptions about how the host will react.  For the record, our physical SQL cluster is rock solid.  All our issues typically circle back to virtualization.
    • BTW, yes, we’ve tried tuning the subnet failover thresholds, no it doesn’t work, and no I can’t tell you why.
  • We’ve learned that VMware support just isn’t up to par, and that you’re really playing with fire if you’re virtualizing complex workloads like SQL. We can’t afford mission critical support, so maybe that’s what we need, but production support is basically useless if you need their help.
  • Having access to Microsoft’s premier support would be very beneficial in this environment. It’s probably something we should have insisted on.

Conclusion:

Do I recommend virtualizing SQL?  I would say it depends, but mostly yes.  There are certainly days where I want to go back to physical, but then I think about all the things I would miss with our virtual environment.  And I’m sure if you asked our DBA’s, they too would admit to missing some of the pros as well.  Here are my final thoughts.

I would say if you’re a shop that has a lot of smaller SQL servers, and they’re non-clustered, virtualization is a no-brainer.  When SQL is small, and non-clustered, it mostly behaves about the same as other VM’s.  We never have issues with our dev or stage systems, and they’re all on the smaller side and they’re all non-clustered.  Even with our UAT environment, we almost never have issues, even though they are clustered.

For us, it seems to be the combination of a clustered and large SQL server where things start getting sketchy.  I don’t want to make it sound like we’re dealing with failovers all the time.  We’ve worked through most of our issues, and for the most part, things are stable.  We occasionally have random failovers, which is incredibly frustrating for all parties, but they’re rare nowadays.

My suggestion is, if you do want to virtualize large clustered SQL systems, treat them like they’re physical.  Here are a few rough recommendations:

  • Avoid heavy CPU oversubscription. Shoot for something like less than 3:1, with less than 2:1 being more ideal.
  • Size your VM’s so they fit in a NUMA node. That would have been impossible back in the day, but nowadays we could probably do this.  For some of you, though, this will still be an issue.  Our largest VM’s (so far) are only 24 vCPU, so we can fit in a single NUMA node on newer HW.
  • Don’t cluster in VMware period. No HA, no DRS.  Keep your hosts standalone and manage your SQL VM’s just like you would if they were physical.  Meaning, plan the VMware host to accommodate the SQL VM(s).
  • Don’t intermix non-SQL VM’s with these systems. We didn’t do this, but I wanted to point it out.
  • Plan on a physical host that can scale up its memory if needed.
  • When doing VMware host maintenance, fail over your SQL listeners / clusters before migrating the VMs (a rough sketch of this follows the list).
    • BTW, at the sizes we’re dealing with, it’s typically faster to shut down a VM and cold migrate it than to vMotion it while powered on.
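
For the curious, here’s a minimal sketch of what that pre-maintenance failover looks like using the same SqlServer module cmdlets from my setup script above.  The node and AAG names are placeholders you’d swap for your own; you run the failover against the secondary replica that’s about to become the new primary:

#Fail the AAG over to node2 before doing VMware host maintenance on node1
#(run against the secondary replica that will become the new primary)
Switch-SqlAvailabilityGroup -Path "SQLSERVER:\SQL\node2\Default\AvailabilityGroups\YourAAGName"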

Finally, I wanted to close by pointing out, that performance was never an issue in our environment.  In fact, things got faster when we moved to the newer HW + SAN.  One of the biggest concerns I used to see with virtualizing SQL was performance, and yet it was everything else that no one mentioned that ended up being the issues.

Hope this helps someone else who hasn’t taken the plunge yet or is struggling themselves.

Thanksgiving 2017: Truck Drivers

I know I’m totally guilty of saying expletives with regards to truckers.  Being honest, most of it is due to my own impatience / selfishness.  Still, once I realize that I’m getting upset about a person who’s driving a 40-ton vehicle cautiously, things tend to fall back into perspective.  In fact, the more I’ve watched shows like Ice Road Truckers, and various documentaries about truckin’, the more I find that there’s just not enough gratitude expressed for these folks.

The funny thing is, you’d think a guy like me would actually appreciate some of the work hazards that a trucker deals with for 8 – 12 hours a day.  I commute about 1 – 1.5 hours each way, which in the grand scheme of things, is nothing compared to these folks.  However, it’s enough to have a rough idea of what they might deal with.

  • People who cut you off.
  • People who are always brake tapping, or worse, waiting till the last minute and slamming on their brakes.
  • Getting stuck in traffic
  • Dealing with inclement weather
  • Dealing with road rage drivers (I’m throwing my hand up as an occasional offender)
  • Being stuck in a vehicle, by yourself, with nothing but music or the radio to keep you company.
  • Dealing with drivers who drive stupid aggressive
  • Dealing with drivers who drive so timidly they cause all kinds of traffic issues


And really, that’s not even scratching the surface of what a longish commute is like.  Seriously, I used to start my day at work pissed off on an almost daily basis dealing with what I view as a bunch of morons on the road.  Then doing it again on the way home.  It’s a wonder that every trucker isn’t out there just plowing people off the road.  I can’t imagine dealing with that for 12 hours a day.  Heck, I hate driving 9 hours for a vacation destination, and that’s supposed to be the start of a fun day.

Most of these issues that I’m writing about are orders of magnitude worse for truckers.  I didn’t even touch on the more unique challenges a trucker deals with, like…

  • Driving a really large vehicle on relatively narrow roads. Think of something like a small town or worse an old city.
  • Trying to find a loading dock for some new delivery, and compound that challenge by dealing with the above.
  • Having to keep a constant eye out for bridges that might be too short, or roads that aren’t truck approved.
  • Keeping a constant eye out for signs that most of us ignore.
  • Dealing with weigh stations and random vehicle inspections
  • 12 plus hours, day after day of being stuck in a cramped space by yourself. At best you have a CB with some colleagues to keep you company.  Or maybe they’ve got a pet / or family member riding with them at times.
    • A lot of us can get up and walk around, or even stand up.
    • Most of us can bring a healthy meal to work. I’m not saying it’s impossible for them, but it’s probably nowhere near as easy.
  • Their bathroom breaks require finding a rest area that’s tractor friendly, or using an old cup.
  • You think changing a flat sucks on your car? Imagine what it’s like on a tractor.
  • Dealing with towing all kinds of different loads and needing to make sure that your cargo arrives intact. I mean, just think about driving a tanker.  There is a liquid that is sloshing back and forth while you drive.  You hit the brakes, and then there’s this delayed surge that starts pushing your vehicle forward.  Now take that delayed response, and apply it to every direction.  You accelerate, and then all of a sudden something starts pulling you back; you turn left, and something wants to go right.  The skill it takes to haul that safely is just crazy impressive.
  • How about driving extra wide / long loads. Yeah, they do get an escort a lot of times, but that doesn’t diminish the challenge of it.
  • You and I get a ticket, at most it’s a fine and a few points. A trucker gets a ticket, it could be the end of their career.
  • They break down, they’re not making money, and to compound that issue, it’s likely there’s something coming out of their pocket.

I’m sure there are a ton more unique challenges, but I think you get the point.  These folks have a hard job that’s totally underappreciated, and worse, most of us effectively tell them to go pound sand based on the way we drive.

How can you be thankful?

I’m just taking a stab here at a few things.  Any truckers, please feel free to let me know if anything should be added.

  • Before you merge in front of a tractor, put your blinker on for a good ten seconds to give them time to slow up and build up a new buffer space. You might think that space is huge between them and the vehicle in front of them.  That’s because they need a lot more stopping distance than you and I.
  • Those white lines at traffic stops aren’t there to look pretty. Stop creeping over that line or braking past it.  That is engineered so trucks can make a turn without you needing to back up.
    • If you see a truck getting ready to make a tight turn onto your road, and you’re approaching that intersection, just stop early and give them plenty of turning space.
  • If they were driving in the left lane and are trying to move back into the right, don’t pass them on the right. Instead, flash your lights and let them over (presuming you’re in the right). And if you’re in the left, don’t try to whip around them on the right.
  • Don’t sit next to them on a highway unless you have to. I’m just guessing here, but I imagine it’s really hard for them to see you.  You and I have some pretty bad blind spots, theirs are a lot worse.  If I were them, I’d be pretty darn scared to change lanes.
  • Get out of their way on a downhill. They need the momentum for the next hill.
  • When they’re broken down on the side of the road (or anyone for that matter), do everything in your power to slow down at the least, and better, move to the left if you can. In some states, this is becoming a law, so failure to do this, could result in a ticket.
  • If you see them attempting to pull into a loading dock, or a narrow road, or whatever, give them plenty of space and be patient. They’re just doing their job, they didn’t make the loading dock or road, but they’ve been forced to fit a big thing in a small space.

I’m sure there are other things we can do, but I suspect this would help a bit.

Closing:

To every trucker out there, thank you!  I know you folks are responsible for getting all the things we need (and want) from its source to the destination.  America would be in a world of hurt without you.

Thanksgiving 2017: Sanitation and cleaning crew

Series Introduction:

Back in October I had a grand plan to have 23 days of thanks.  Unfortunately, life got in the way, and I never had the time to pre-write all the posts I wanted to.  Rather than giving up, I’m going to punch out as many as I can before the 24th.  Since I want to focus on the month of thanks, by giving thanks, I’m not going to be writing any technical posts.

Some of these posts will be discussing jobs that are dirty and with dirty jobs, naturally comes some dirty details.  There’s someone dealing with this stuff, so if the closest you get to anything I write about, is the words in this post, consider yourself lucky (and be thankful).

Sanitation and cleaning crew:

I was at the KOP mall with the family sometime over the summer, and I distinctly remember waiting for the wife and kids to complete their bathroom stop.  The act of the bathroom breaks itself wasn’t exactly a memorable one, it happens all the time.  What made me remember this specific event was the sanitation worker.  It was an older guy, and he was taking care of anything from changing the trash in the food court to cleaning the restrooms.  I had looked up from my phone and he was smiling while he worked.  I kept my phone down and admired him for a minute.  It’s rare to see most people smile at what they do, especially when it’s cleaning up after someone else.

At one point, he walked into the men’s bathroom to empty the trash, and when he came back, he looked at me and said something to the effect of “I swear I can never win this battle”, and then we both laughed and moved on.

Aside:  If you’re wondering what happened to the wife and kids, you’re probably a male.  Let’s just not think about that trivial detail, but if you must, the blame is 100% going on the kids.

Now most of us walk past these amazing folks all the time, and probably don’t give them a second thought.  I know I’m guilty of this.  However, I’m never more keenly aware and thankful for them than when I walk in a Men’s bathroom.  Here’s the thing, “men” are pigs in the bathroom.  I know some of you aren’t, but most of you are.  Not lifting the lid up in the stalls (you can guess what’s all over the seat), leaving your toilet paper shreds all over the floor, letting paper towels that fell out of the trash lie on the ground.    You know how else I know that most men are pigs, I used to have to clean up after them myself.  While going to college, I used to be a butcher’s assistant.  I’d come in, and basically clean up after the guys, and the place was always a total shit hole.  So, while I’ve never cleaned up someone else’s urine (other than my kids), I can at the very least empathize with cleaning up someone else’s mess as a job.  It’s a tough, unrelenting, unappreciated and ultimately an undervalued job in our society, and we owe these folks better.

How can you be thankful?

Here’s the thing, saying thanks is probably the most disingenuous thing you can do, if that’s all you ever do.  While I’m not a sanitation worker, I’ll take a stab at a few ways you can say “thanks” through your actions.  These are some things I personally do.

  • Lift the lid up when you go pee (male specific of course). Besides the fact that no one wants to clean up your urine, I suspect YOU don’t want to sit in anyone’s urine either.  I used to think the biggest offenders were kids, until I saw more than a fair share of men (I mean little boys) doing this.  I got news for you, don’t ever sign up for a sharpshooting contest, your aim sucks.
    • For the record, moms, I get that you can’t supervise your kids (or husbands), but you can instill the behavior at home.
  • If you pull the TP and a little shred breaks off, pick it up and throw it in the toilet, don’t leave it lie on the ground.
    • If you don’t want to touch the floor, I assure you where you’re getting ready to put your hands is equal to or dirtier than that floor.
  • If your trash won’t fit in the trash can, go find another trash can, and let management know. Don’t keep stacking the trash.
    • This goes for any trash can for any need.
  • If you spill something on the table or ground, clean up after yourself. No one is expecting you to carry a container of Greenworks around, but you can take a napkin and at least make sure you get the substances removed as best you can.  If it’s bad enough, let someone know.
  • Take all your trash with you when you leave.

I’m sure there are other things we can all do, and if any sanitation worker wants to make a recommendation, I’ll be glad to add it.

Closing:

As genuine as I possibly can, I want to thank everyone that’s responsible for making our spaces clean.  Like most jobs, no one appreciates you when you do your job well, but everyone will be sure to let you know when you’re not.  I want you to know, I notice when the bathrooms, or tables, or whatever it is that you clean, is clean.

Powershell Scripting: Get-ECSWSUSComputerUpdatesStatusReport

Introduction:

I hate the WSUS reports built into the console.  They’re slow, and when it comes to doing something useful with the data, it’s basically impossible.  That’s why I wrote this function.

I wanted an ability to gather data on a given WSUS computer(s), and work with it in Powershell.  This function gives me the ability to write scripts for bulk reports, automate my patching process (checking that all updates are done), and in general, gives me the same data the standard WSUS report does, but at a MUCH faster rate.

You can find the function here.

Dependencies:

You’ll need my Invoke-ECSSQLQuery function located here.  This is going to mean a few things before you get going.

  • You need to make sure the account you’re running these functions under has access to the WSUS database.
  • You need to make sure the database server is setup so that you can make remote connections to it.
  • If you’re in need of SQL auth instead of windows auth, you’ll need to adjust the Get-ECSWSUSComputer and Get-ECSWSUSComputersInTargetGroup so that the embedded calls to my invoke-ecssqlquery use SQL auth instead of windows.

Secondly, this function doesn’t work without the “object” result of Get-ECSWSUSComputer or Get-ECSWSUSComputersInTargetGroup.  That means you need to run one of these functions first to get a list of computer(s) that you want to run a report against.  Store the results in an array.  Like $AllWSUSComputers = …..

Syntax examples:

If you’re reading this in Feedly or some other RSS reader, the code below isn’t going to look right; if it looks like a bunch of garble, you’ll need to hit my site.


$AllWSUSComputers =  Get-ECSWSUSComputer -WSUSDataBaseServerName "Database Server Name" -WSUSDataBaseName "SUSDB or whatever you called it" -WSUSComputerName "ComputerName or Computer Name pattern" -SQLQueryTimeoutSeconds "Optional, enter time in seconds"

Foreach ($WSUSComputer in $AllWSUSComputers)
     {
     Get-ECSWSUSComputerUpdatesStatusReport -WSUSDataBaseServerName "Database Server Name" -WSUSDataBaseName "SUSDB or whatever you called it" -WSUSComputerObject $WSUSComputer -SQLQueryTimeoutSeconds "Optional, enter time in seconds"
     }

Let me restate, you’re pointing at a SQL server.  Sometimes that’s the same server as the WSUS server, or sometimes that’s an external DB.  If you’re using an instanced SQL server, then for the database server name, you’d put “DatabaseServername\InstanceName”

If you actually want to capture the results of the report command, my suggestion is to create an ArrayList and add the results of the command into that array, or dump it to a JSON / XML file.  If you’re only running it against one computer, there’s probably no need for a foreach loop.
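
For example, here’s a rough sketch of what I mean (the server names and output path are placeholders you’d swap for your own):

#Grab the computers, then capture each computer's report in an ArrayList
$AllWSUSComputers = Get-ECSWSUSComputer -WSUSDataBaseServerName "DBServer" -WSUSDataBaseName "SUSDB" -WSUSComputerName "*"

$Report = New-Object System.Collections.ArrayList
Foreach ($WSUSComputer in $AllWSUSComputers)
     {
     $null = $Report.Add((Get-ECSWSUSComputerUpdatesStatusReport -WSUSDataBaseServerName "DBServer" -WSUSDataBaseName "SUSDB" -WSUSComputerObject $WSUSComputer))
     }

#Dump the full report to JSON, or swap in Export-Csv if you just want the summary
$Report | ConvertTo-Json -Depth 4 | Out-File "C:\Reports\WSUSReport.json"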

Output:

The output is the same no matter which function you run, with the one small exception being that I capture the computer target group name in the computer target group function.

Name : pc-2158.asinetwork.local
AllPossiableUpdatesInstalled : True
AllApprovedUpdatesInstalled : True
AllPossiableUpdatesNotInstalledCount : 0
AllApprovedUpdatesNotInstalledCount : 0
LastSyncResult : Succeeded
LastSyncTime : 09/30/2017 16:11:33
LastReportedStatusTime : 09/30/2017 16:20:16
LastReportedInventoryTime :

Again, this output is really designed to feed my next function, but you might find it useful to do things like confirm that all WSUS computers that should be registered are, or to simply check the last time they synced.


$WSUSComputer | Select-Object -ExpandProperty UpdateStatusDetailed | Where-Object {$_.Action -eq "Install" -and $_.FriendlyState -ne "Installed"} | Select-Object DefaultTitle

That little snippet will show you all approved updates that are not installed.  The FriendlyState property is whether the update is installed or not; the Action property is whether the update is approved for install.

If we slightly modify the above command, we can show all updates that are not installed, but applicable by doing the following.


$WSUSComputer | Select-Object -ExpandProperty UpdateStatusDetailed | Where-Object {$_.FriendlyState -ne "Installed"} | Select-Object DefaultTitle

***NOTE1: This report is only as good as the updates that you allow via WSUS. Meaning, if you don’t download SQL updates, SQL updates are not going to show up in this report.

***NOTE2: This report only shows non-declined updates. If you declined an update, it won’t show up here.

Closing:

I hope you find this useful. I always found the default WSUS reporting to be underwhelming and slow. It’s not that it doesn’t work, but it’s really only good for singular computers. These functions can easily be used to get the status of a large swath of systems. Best of all, with it being a Powershell object, you can now also export it in any number of formats, my preference being JSON if I want a full report, or CSV if I just want the summary.

You can also find out how I did all my SQL calls by reviewing the embedded SQL Query in my function if you prefer the raw SQL code.

Powershell Scripting: Get-ECSWSUSComputer and Get-ECSWSUSComputersInTargetGroup

Introduction:

These two functions by themselves aren’t exactly sexy.  Their main goal is to be used to feed my function Get-ECSWSUSComputerUpdatesStatusReport.  Still, I can see some limited value to them outside of that use case.

One thing you’ll notice is I’m hitting SQL directly instead of querying the WSUS APIs.  I’m doing this, despite it not being recommended, because it’s infinitely faster and far more flexible than the APIs.

Dependencies:

First and foremost, you need my Invoke-ECSSQLQuery function located here.  This is going to mean a few things before you get going.

  • You need to make sure the account you’re running these functions under has access to the WSUS database.
  • You need to make sure the database server is setup so that you can make remote connections to it.
  • If you’re in need of SQL auth instead of windows auth, you’ll need to adjust the Get-ECSWSUSComputer and Get-ECSWSUSComputersInTargetGroup so that the embedded calls to my invoke-ecssqlquery use SQL auth instead of windows.

Syntax examples:

First, if you’re reading this in Feedly or some other RSS reader, the code below isn’t going to look right; if it looks like a bunch of garble, you’ll need to hit my site.


Get-ECSWSUSComputer -WSUSDataBaseServerName "Database Server Name" -WSUSDataBaseName "SUSDB or whatever you called it" -WSUSComputerName "ComputerName or Computer Name pattern" -SQLQueryTimeoutSeconds "Optional, enter time in seconds"

Get-ECSWSUSComputersInTargetGroup -WSUSDataBaseServerName "Database Server Name" -WSUSDataBaseName "SUSDB or whatever you called it" -WSUSComputerTargetGroupName "Computer target group name (wildcards supported)" -SQLQueryTimeoutSeconds "Optional, enter time in seconds"

Let me restate, you’re pointing at a SQL server.  Sometimes that’s the same server as the WSUS server, or sometimes that’s an external DB.  If you’re using an instanced SQL server, then for the database server name, you’d put “DatabaseServername\InstanceName”

The ComputerName param (and TargetGroupName) supports wildcards.  You can use “*” (and my function will convert it to a proper SQL wildcard) or you can use “%”.  It doesn’t matter how many you use, or where you put them.
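
To illustrate the conversion, here’s a simplified sketch of the idea (not the actual function code):

#The "*" wildcard gets swapped for SQL's "%" before being used in a LIKE clause
$WSUSComputerName = "pc-21*"
$SQLPattern = $WSUSComputerName -replace "\*","%"
#$SQLPattern is now "pc-21%", which would feed something like WHERE Name LIKE 'pc-21%'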

Output:

The output is the same no matter which function you run, with the one small exception being that I capture the computer target group name in the computer target group function.


ComputerTargetId : 07504bcf-e736-4222-b13c-989c425b7c11
ParentServerId :
Name : The name of the computer
IPAddress : The Computers IP
LastSyncResult : Succeeded
LastSyncTime : 9/30/2017 4:11:33 PM
LastReportedStatusTime : 9/30/2017 4:20:16 PM
LastReportedInventoryTime :
ClientVersion : 10.0.14393.1670
OSArchitecture : AMD64
Make : Dell Inc.
Model : OptiPlex 990
BiosName : Default System BIOS
BiosVersion : A19
BiosReleaseDate : 8/26/2015 12:00:00 AM
OSMajorVersion : 10
OSMinorVersion : 0
OSBuildNumber : 14393
OSServicePackMajorNumber : 0
OSDefaultUILanguage : en-US

Again, this output is really designed to feed my next function, but you might find it useful to do things like confirm that all WSUS computers that should be registered are, or to simply check the last time they synced.

Closing:

Any questions or recommendations, feel free to fire away.  Again, not a super sexy function, but I think you’ll like the next one coming up.

Review: 877stockcar.com exotic experiences

Introduction:

This post is 100% off topic; it’s about my “exotic car” experience through 877stockcar.com.  In general, my blog is for tech stuff, but I figured it might be fun to write about something non-tech for once.  This is about the exotic experience (https://877stockcar.com/experiences/exotic-experiences/) located at Pocono Raceway.

I wanted to write this for anyone that might be thinking of dropping up to $700 on their package, so you know what you’re in for.  My wife got me the mid-tier package for Christmas (best gift ever) because she knows I’m a pretty big car nut.

In case someone reads this that’s not familiar with my review style, besides going over the pros and cons, you’ll find that my assessment will be blunt.  While I may have a degree of diplomacy in my views, the point of my review style is to be brutally honest.

The weather:

In my case, I couldn’t have asked for a more perfect day.  70ish and sunny, with no rain for days, which meant the track and waiting area was dry.

Pros:

As usual, I like to start with the good before digging into the bad.

  • For the most part, the cars they had are what I would consider pretty respectable. I personally drove an Audi R8 (v10) and a Maserati MC.  If you’re thinking to yourself “those are six figure cars” I get it, but they’re low six figure cars, as in less than 200k.
  • The cars were clean inside and out. I’m only bringing it up because you’re paying for an experience, and no one wants a dusty dash and a dirty car. No, it doesn’t affect how they drive, but I know it can skeeve some folks out.
  • They provide something that, best I can describe it, is a head glove, so you keep your germs to yourself. Similarly, inside the cars the seats are covered, although I suspect that’s more to protect the interior of the car than the driver.
  • The instructors I had were all friendly, and knew the track like the back of their hand.
  • They had the apexes all coned off for you. Short of painting a driving line (more on that later), you knew exactly where to go if you were trying to maximize your speed.
  • Similar to above, they had the braking point marked off for their one straightaway.
  • The helmet they offered fit my large head, which was good. It was honestly a concern I had going in.
  • For the little amount of time you do get with the cars, it is a fun experience.

Cons:

This was my “exotic car” experience. I’m not trying to imply the whole experience was negative, it wasn’t. However, as you’ll see it was far from perfect.

  • Where am I supposed to go? So, I plugged in the address as marked on the site, and arrived at a locked gate.  A few thoughts on this:
    • We tried calling them to see what’s up. We were greeted by the “we’re closed today, but you can leave a message”.  Here’s the thing: if I’m dropping (or my wife in this case) anywhere from $250 – $700 for a course that lasts maybe 20 minutes, your ass can staff someone to answer a damn phone during the hours of the event.
    • When I called to make my reservation, there was zero mention of where to go specifically, or that the main gate would be locked. The only thing I was told was make sure I wear socks and sneakers, that’s it.  In my not so humble opinion, I think pointing out something that I imagine is pretty common would make sense to do.
      • Related to this, I did find directions on their website (https://877stockcar.com/wp-content/uploads/2017/04/Directions.pdf) on where to actually go. Now, I can see this being partially on me for not looking (like I’m sure most people don’t), but I’m totally calling bull shit on their inability to provide a set of GPS coordinates (let alone bring attention to the main gate not being the right place to go).  So, you’re telling me, NO ONE in the whole facility, with all the revenue this place probably brings in, can afford or has access to GPS?  Right…
    • Why not place a sign right in front of the main gate, saying something like “go here, wrong entrance”?  Again, just to brow beat the concept, I can’t imagine I’m the only one to do this.
      • Once we started driving down the road (knowing there were a few more entrances) we saw that they had small post signs that eventually led us to the right entrance.
    • Where do we park? I’m not trying to nitpick here, but knowing where to park wasn’t made abundantly clear.  We guessed where we parked was fine, but there were no signs saying park here.  Actually, adding to that, there were no signs even letting us know we were at the right spot.  I mean, it was kind of obvious with a bunch of Lambos running around and a large tent, but there was no official indication that we were at the right spot.  For all we knew, it was some crew area.
    • The check-in: To be honest, the guy at the check-in acted like I was bothering him, and was clearly preoccupied with something else.  Here’s the thing: it was my first time (and probably my last with them) and I had zero clue what the process was. He didn’t ask if it was my first time, he didn’t ask how I was; it was “sign here”.  So after checking in, I basically had to keep asking questions in order to figure out where I’m supposed to go, how the process works, etc.
    • The introduction: After standing there for a few minutes, some random employee walked up to the area and asked if anyone just arrived, and the few of us flocked to him.  He proceeded to rapid fire off a rough set of instructions on how the process works, didn’t ask if anyone had questions, and walked off. He was a nice guy, but you could tell he did this all the time and was probably on autopilot. Meaning, I think he just assumed everyone understood what to do.
    • So when do I drive? After standing around for a bit longer and frankly pretty frustrated, I started observing what others were doing and basically figured out that helmets get dropped off at a table, and we’re supposed to just go fight over them.  Once you figure that out, the next part is just standing in a random area near the drop off / pickup.  And again, it’s more or less a diplomatic fight for going after whatever car you want.
    • Driving:
      • Instructor: Alas, I’m finally sitting in an R8. The instructor is a super nice guy, and goes over adjusting the seat, and basic instructions to get the car into go mode.  We take off for my “warm up” lap, and he takes me through the course showing me the apexes (while holding the steering wheel, really weird).  And then mostly lets me have at it.  He continues coaching me on trying to hit the apexes, but other than that, he’s pretty much along for the ride.
        • Cool down lap conversation: I figured since we were basically driving as fast as I do in a school zone for the cool down lap, I’d break the awkward silence and try to have a conversation.  I tried asking him about the cars, about which he didn’t have much knowledge (or maybe he didn’t want to chat).  I get that you don’t need to know the cars to be a good driver, but this is kind of a driver enthusiast experience; I’d think the instructors could talk all about the cars. I don’t know, maybe they’re just busy keeping an eye out for other drivers too.
      • Track: IMO, the track sucks.  Here’s the thing: it’s not that the track was badly maintained or anything like that, it’s just that the thing is so damn small.  Their straightaway, I’m fairly confident, isn’t even a quarter mile.  You spend more time trying to whip through corners (which IS fun), and you never really get a chance to get the car over 100.  Now, being fair, I suspect a good deal of that has to do with my skill, but ALSO the skill of the drivers in front of you, more on that in a sec.  So, when they tell you 4 laps, it’s like ten minutes tops, and that’s if you’re poking around.
      • Other people: The fact is, they have way too many people on the track at a given time.  During both of my groups of laps, the busyness varied, but it was very rare that you’d have even close to a wide-open track in front of you.  By the time I was in the Maserati, I got to a point where I was mostly getting stuck behind other drivers.  The instructor kept telling me if I could catch them, we could pass them. I was like a car length and a half behind, and that was only because I didn’t want to rear end anyone.  So I’m not sure what’s defined as catching, but if you think you’re going to be passing slower drivers, I’ll say you’ll typically burn 25% of your laps before you get the opportunity.  That said, I know they said it’s not racing, so just make sure you have your expectations in line.
      • Picking the car: I kind of knew what I wanted to drive, but they didn’t ask what I wanted to drive.  Hell, they didn’t even tell you what all the cars were, specs or anything like that.  Being fair, they mentioned which cars were RWD vs. AWD.  It was also really disappointing that a few cars were only available for the folks who had the $700 package, more specifically the McLaren.  Although, it sounded like there were reliability issues with it, so maybe not a big deal.
    • The cars: To be blunt, I wasn’t impressed with the car selection. It’s not that they had bad cars; they lacked variety.  I think the fastest car they had was the R8, or the McLaren when it was working.  A lot of their cars were convertibles (lame), and really the variety was lacking.  I would have much rather seen one each of a few different types of cars, than having a pick of four or five cars that are basically all the same.  I mean, take the Lambo and the R8: it’s basically the same car with a different skin.

Conclusion:

All in all, the experience is plagued with terrible customer service, practically zero training / overviews, a complete lack of organization, overcrowding and ultimately, it’s a ton of money to dump on what is essentially 20 minutes at most of driving.    It was certainly fun to drive the cars, but I’d never give them another dollar of my money.  Instead, I’d probably just spend a little extra and go to a Porsche, BMW or the like driving school.  I suppose if you just want to know what its like to drive the car, it’s an ok experience, but for the money you spend, you could probably rent the car for a whole day.  At least then, you’d get some real seat time with the car.

What would I do differently?

  • The registration process should include detailed directions emailed to you (and discussed over the phone). I would also send a reminder email, along with a restatement of where to go, and where to park.
  • If the main gate is locked, I’d put a sign right in front telling folks to turn around and go this way.
  • I’d staff a person or two on the phones (how about the registration people?) to answer calls during the event times.
  • I would run the event as batches of people, rather than make it a free for all.
    • Everyone would have a helmet
    • The cars would be lined up, with specs and performance numbers outlined.
      • I would let folks look at the cars for a few minutes at the very least so you can see what you might actually want to drive.
    • I would document which cars folks wanted to drive, and have a program that organizes an order when driver x gets car y and how long the wait is estimated to be.
    • I would have the instructors take each person out for a lap to show them the course before having the driver do it.
    • I would then have the instructor take a driver out in something like a Miata for a few laps so they can get familiar with the track in a fuel efficient, affordable sports car.
    • I would limit the track to no more than three cars at a time.  At least if we’re talking the track layout they had.  MAYBE, if the track was longer and they had an actual straight away, they could get away with more cars, without spoiling the experience.
    • Rather than doing “laps” it would simply be a timed event. You get 15 minutes for every $300 or whatever would make business sense.  This way faster drivers don’t lose seat time.
      • And that would be 15 minutes, with the car you want, and with no more need to “warm up”.
      • Cool downs? Just let the car sit for five minutes or so when they’re done.  If people are really seeing brake fade, equip the cars with some better pads.  And if the car can’t handle having the piss beat out of it for at least 15 minutes in a row, it’s not exactly a great exotic car.
    • I would have GoPros on the helmets and the cars themselves. I would record videos that could be purchased, provide lap times, top speed, most g’s pulled, etc.  They had none of that stuff.
    • I would have a larger variety of cars, and 100% of them would be coupes. If you want to drive a freaking convertible, go get a Solara.  To name a few…
      • Corvette ZR1
      • Audi R8 (was a good pick)
      • A real Ferrari, like a 488 GTB
      • BMW M5
      • Ford GT 40
      • Lotus
      • Porsche (maybe GT3?)
      • Ariel Atom (ok not a coupe, but it’s allowed to be excluded).
    • For Pocono Raceway specifically, I would open up the track so that maybe you could end up on the actual race track for a bit, so there’s enough room to actually open the car up. What in the world is the point of a car that can go 180+ if you can’t even get it to 100?  Maybe offer two options: an open track for speed demons, and a closed track for folks that like to feel the G force.
    • I would paint a driving line rather than relying on cones. Or some similar material.
    • How about something for the family to do while they wait?  I don’t have a particular idea of what that might be, but I suspect standing around isn’t their idea of fun.

I realize it’s a business and ultimately, it’s about making money.  The cars aren’t cheap, and I’m sure they’re getting the piss beat out of them, but I think those are some relatively cheap things they could do, that would make a dramatic improvement in the driving experience.

Thinking out loud: VMware, this is what I want from you

Warning:

This post is clicking in at 6k words.  If you are looking for a quick read, this isn’t for you.

Disclaimer:

Typical stuff, these are my personal views, not views of my employers.  These are not facts, merely opinions and random thoughts I’m writing down.

Introduction:

I don’t know about all of you, but for me, VMware has been an uninspiring company over the last couple of years.  VMworld was a time when I used to get excited.  It used to mean big new features were coming, and the platform would evolve in nice big steps.  However, over the last 5 – 7 years, VMware has gotten progressively more disappointing.  My disappointment, however, is not limited to the products alone, but extends to the company culture as well.

This post will not follow a review format like many of you are used to seeing, but instead, will be more of a pointed list of the areas I feel need improvement.

With that in mind, let it go on the record that, in my not so humble opinion, VMware is still the best damn virtualization solution.  I bring these points up not to say that the product / company sucks, but rather to outline that in many ways, VMware has lost its mojo, and IMO some of these areas would be good steps in recovering that.

The products:

The death of ESXi:

You know, there are a lot of folks out there that want to say the hypervisor is a commodity.  Typically, those folks are either pitching or have switched to a non-VMware hypervisor.  To me, they’re suffering from Stockholm syndrome.  Here’s the deal: ESXi kicks so much ass as a hypervisor.  If you try to compare Hyper-V, KVM, Xen or anything else to VMware’s full featured ESXi, there is no competition.  I don’t give a crap about anything you will try to point out; you’re wrong, plain and simple.  Any argument you make will get shot down in a pile of flames.  Even if you come at me with the “product x is free” angle, I’m still going to shoot you down.

With that out of the way, it’s no wonder that everyone is chanting the hypervisor commodity myth.  I mean, let’s be real here, what BIG innovation has been released to the general ESXi platform without some upcharge?  You can’t count vSAN because that’s a separate “product” (more on the quotes later).  vVOLs you say?  Yeah, that’s a nice feature, it only took how long?

So, what else?  How about the lack of trickle down and the elimination of Enterprise edition?  There was a time in VMware’s history when features trickled down from Enterprise Plus > Enterprise > Standard.  Usually it occurred each year, so by the time year three rolled around, that one feature in Enterprise Plus you were waiting for finally got gifted to Standard edition.  The last feature I recall this happening to was the MPIO provider support, and that was ONLY so they could support vVOLs on Standard edition (TMK).

Here is my view on this subject: VMware is making the myth of a commoditized hypervisor a self-fulfilling prophecy.  Not only is there a complete lack of innovation, but there’s no trickle down occurring.

If you, as a customer, have gone from receiving regular (significant) improvements as part of your maintenance agreement, to basically nothing year over year, why would you want to continue to invest in that product?  Believe me, the thought has crossed my mind more than once.

From what I understand, VMware’s new business plan is to make “products” like vSAN that depend on ESXi, but that aren’t included with the ESXi purchase.  Thus, a new revenue stream for VMware and renewed dependence on ESXi.  First glance says it’s working, at least sort of, but is it really doing as well as it could?  While it sounds like a great business model if you’re just comparing whether you’re in the black / red, what about the softer side of things?  What is the customer perception of moving innovations to an à la carte model?  For me, I wonder if they took the approach below, would it have had the same revenue impact they were looking for, while at the same time enabling a more positive customer perception?  I think so…

  1. First and foremost, VMware needs to make money. I know I just went through that whole diatribe above, but hear me out.  This whole “per socket” model is dead.  It’s just not a sustainable licensing model for anyone.  Microsoft started with SQL and has finally moved Windows to a per core model.  In my opinion, VMware needs to evolve its licensing model in two directions.
    1. Per VM: There are cases where you’re running monster VMs, and while you’re certainly taking advantage of VMware’s features, you’re not getting anywhere near the same value add as someone who’s running 20, 30, 50, 100 VM’s per host.  Allowing customers to allocate per VM licenses to a single host or an entire cluster would be a fair model for those that aren’t using virtualization for the overcommit, but for the flexibility.
    2. Per Core: I know this is probably the one I’m going to get the most grief for, but let’s be real, YOU KNOW it’s fair.  Let’s just pretend VMware wasn’t the evil company that Microsoft is, and actually let you license as few as 2 cores at a time.  For all of you VARs that have to support small businesses, or for all of you smaller businesses out there, how much more likely would you have been to do a full blown ESXi implementation for your clients?  Let’s just say VMware charged roughly $165 per core for ESXi Standard edition and your client had a quad core server.  Would you think $659 would be a reasonable price?  I get that number simply by taking VMware’s list price and dividing by 8 cores, which is exactly how Microsoft arrived at their trade-ins for SQL and Windows.  NOW, let’s also say you’re a larger company like mine and you’re running Enterprise Plus.  The new 48 core server I’m looking at would normally cost $11,238 at list for Enterprise Plus.  However, if we take my new per core model, that server would now cost (roughly $703 per core) $33,714.  That’s approximately $22k that VMware is losing out on for just ONE server (there’s a quick napkin math sketch after this list if you want it spelled out).  I know what you’re thinking: Eric, why in the world would you want to pay more?  I don’t, but I also don’t want a company that makes a kick ass product to stagnate, or worse, crumble.  I’ve invested in a platform, and I want that platform to evolve.  In order for VMware to evolve, it needs capital.
  2. Ok, now that we have the above out of the way, I want a hell of a lot more out of VMware for that kind of cash, so let’s dig into that.
    1. vSAN should have never been a separate product. Including vSAN in that per core or per VM cost, just like they do with Horizon, would add value to the platform.  Let’s be real, not everyone is going to use every feature of VMware.  I’m personally not a fan of vSAN, but that doesn’t mean I shouldn’t be entitled to it.  This could easily be something that is split among Standard and Enterprise Plus editions.
      1. Yes, that also means the distributed switch would trickle down into Standard edition, which should have happened by now.
    2. Similar to vSAN, NSX should really be the new distributed switch. I’m not sure exactly how to split it across the editions, but I think some form of NSX should be included with Standard, and the whole darn thing for Enterprise Plus.
    3. At this stage, I think it’s about time for Standard edition to really become the edition of the 80%. Meaning, 80% of companies would have their needs met by Standard edition, and Enterprise Plus is truly reserved for those that need the big bells and whistles.  A few notable things I would like to see trickle down to Standard edition are as follows.
      1. DRS (Storage and Host)
      2. Distributed Switch (as pointed out in 2ai)
      3. SIOC and NIOC
      4. NVIDIA Grid
  3. As for Enterprise Plus, and Enterprise Plus with Ops Manager, those two should merge and be sold at the same price as Enterprise Plus. I would also like to see some more of the automation aspects from the cloud suite brought into the Enterprise Plus edition as well.  I kind of view Enterprise Plus as the edition that focuses on all the automation goodies that smaller companies don’t need.
  4. IMO, selling vCenter as a separate SKU is just silly. So as part of all of this, I would like to see vCenter simply included with your per core or per VM licenses.  At the end of the day, a host can only be connected to one vCenter at a time anyway.
  5. Include a Log Insight license for every ESXi host sold, strictly used for collecting and managing a host’s VMware logs, including those of the VM’s running on top of it. I don’t mean logs inside the guest OS, rather things like the vmware.log as an example.
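Since I just threw a pile of numbers at you in point 1b, here’s the napkin math as a quick PowerShell scratch pad.  To be clear, this assumes my 48 core box is a 2-socket server, and the dollar figures are the list numbers I quoted above, not an official VMware quote.

    # Napkin math from point 1b.  Assumes a 2-socket, 48 core server; prices
    # are the list numbers quoted above, not an official VMware quote.
    $todayServerList = 11238                     # what the box costs at list today
    $perCore   = ($todayServerList / 2) / 8      # per-socket list divided by 8 cores = ~$703
    $perServer = $perCore * 48                   # the same box under a per core model = $33,714
    $delta     = $perServer - $todayServerList   # roughly $22.5k more per server for VMware
    "{0:C0} per core, {1:C0} per server, {2:C0} extra" -f $perCore, $perServer, $delta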

Evolving the features:

vCenter changes:

I know I was a little tough on VMware in the intro, and while I still stand behind my assertion about their lack of innovation, what they’ve done with the VCSA is pretty kick ass.  I would say it’s long overdue, but at least it’s finally here.  That said, there’s still a ton of things VMware could be doing better with vCenter.

  1. If you have ever tried to set up a simplistic, but secure profile for some self-service VM management, you know that it’s a nightmare. 99% of that problem is attributed to VMware’s very shitty ACL scheme.  The way permission entitlements work is confusing, conflicting, and ultimately leads to more access being granted than needed, just so you can get things to work.  It shouldn’t be this difficult to set up a small resource pool, a dedicated datastore and a dedicated network, and yet it is.  I would love to see VMware duplicate the way Microsoft handles ACLs, because to be 100% honest, they’ve nailed it.
  2. In general, the above point wouldn’t even be an issue if VMware would just create a multi-tenancy ability. I’m not talking about wanting a “private cloud”.  This isn’t a desire for more automation or the like, simply a built-in way to securely carve up logical resources and allocate them to others.  I would LOVE to have an easy way for my Dev, QA and DBA teams to all have access to discrete buckets of resources.
  3. So, I generally hate web clients, and nothing reinforced that more than VMware. Don’t get me wrong, web clients can be great, but the vSphere web client is not.  Here is what I would like to see, if you’re going to cram a web client down my throat.
    1. Finish the HTML5 client before ripping the C# client away from us. The flash client is terrible.
    2. Whoever did the UI design for the C# client mostly got it right the first time. The web client should have duplicated the aspects of the C# client that worked well, things like the right click menu, the color schemes and icons.  I have no problem with seeing a UI evolve over time, but us old heads like things where they were.  The web clients feel like developers just moved shit around for no reason.  The manage vs. monitor tab gets a big thumbs up from me, but it’s after that where it starts to fall apart.  Simple things like the storage paths, which used to be a simple right click on the datastore, have moved to who knows where.  Take a lesson from Windows 8 and 10, because those UI’s are a disaster.  Moving shit around for the sake of moving it around is just wrong.  Apple’s OS X UI is the right way to progress change.
  4. The whole PSC + vCenter integration feels half assed if you ask me. I think a lot of admins have no clue why these roles should be separate or how to properly admin the PSC’s, and if shit breaks, good luck.  It was like one day you only had vCenter, and the next thing you know, there’s this SSO thing that nobody knows anything about, and then the PSC pops out of nowhere.  It wasn’t a gradual migration, rather a huge burst of changes to authentication, permissions and certificate management.  I would say there’s a better understanding of the PSC’s at this point, but it wasn’t executed in a good way.  Ultimately though, I still think the PSC’s need some TLC.  Here are a few things I’d like to see.
    1. You guys need to make vCenter and the like smart enough to not need a load balancer in front of the PSC’s. When vCenter joins a PSC domain, it should become aware of all PSC’s that exist, and have automated failover.
    2. There should be PowerCLI for managing the PSC’s, and I mean EVERYTHING about them. Even the stuff you might only run for troubleshooting.
    3. There should be a really friendly UI that walks you through a few scenarios.
      1. Removing a PSC cleanly.
      2. Removing orphaned PSC’s or other components (like vCenter).
      3. Putting a PSC into maintenance mode. (which means a maintenance mode should exist)
      4. Troubleshooting replication.
        1. Show the status
        2. Let us force a replication
      5. Rolling back / restoring items, like users or certs.
      6. Re-linking a vCenter that’s orphaned, or even transferring a vCenter persona to a new vCenter environment.
      7. How about some really good health monitors? As in like single API / PowerCLI command type of stuff.
      8. Generating an overall status report.
  5. Update Manager, while an awesome feature, hasn’t seen much love over the years. What I’d really like to see is as follows.
    1. Let me remove an individual update, and provide an option to delete the patch on disk, or simply remove the update from the DB.
    2. Scan the local repo for orphaned patches (think in the above scenario where someone deletes a patch from update manager, without removing it from the file system).
    3. Add dynamic baseline abilities to all classifications of updates, not just patches. Right now, we can’t create a dynamic extensions baseline.
    4. Give me PowerCLI admin abilities. I’d love to be able to use PowerCLI to do all the things I can do in the GUI, anything from uploading a patch to creating baselines.
    5. Open the product up, so that vendors could integrate firmware remediation abilities.
    6. Have an ability to check the VMware HCL for updated VIBs that are certified to work with the current firmware we’re running. This would make managing drivers in ESXi so much easier.
    7. Offer a query derived baseline. Meaning let us use things like a SQL query to determine what a baseline should be.
    8. Check if a VIB is applicable before installing it, or have an option for it. Things like, “hey, you don’t have this NIC, so you don’t need this driver”.  I’ve seen drivers that had nothing to do with the HW I had get installed and actually cause outages.
  6. There are still so many things that can’t be administered using PowerCLI, at least not without digging into extension data or using methods. Keep building the portfolio of cmdlets.  I want to be able to do everything in PowerCLI that I can in the GUI.  Starting with the admin stuff, but on top of that, also doing vCenter type tasks like repointing or other troubleshooting tasks.
  7. How about overhauled host profiles?
    1. Provide a Microsoft GPO like function. Basically, present me a template that shows “not configured” for everything and explain what the default setting is.  Then let me choose whatever values are supported then apply that vCenter wide, datacenter wide, folder / cluster wide or host specific.
      1. Similar feature for VM settings.
      2. Support the concept of inheritance, blocking and overrides.
    2. Let me create a host independent profile, and perhaps support the concept of sub-profiles for cases where we have different hosts. Basically, let me start with a blank canvas and enable what I want to control through the profile.
  8. Let us manage ESXi local users / groups and permissions from vCenter itself. In fact, having the ability to automatically create local users / groups via a GPO like policy would be great.
  9. I had an issue where a 3rd party plugin kept crashing my entire vSphere web client. Why in the world can a single plugin crash my soon to be only admin interface?  That’s a very bad design.  Protect the admin interface; if you have to kill something, kill the plugins, and honestly, I’d much rather see you simply kill the troublesome plugin.  Adding to that, actually have some meaningful troubleshooting abilities for plugins, like “hey, I needed more memory, and there wasn’t enough”.
  10. vCenter should serve as a proxy for all ESXi access. Meaning if I want to upload an ISO, or connect to a VM’s console, proxy those connections through vCenter.  This allows me to keep ESXi more secure, while still allowing developers and other folks to have basic access to our VMware environment.
  11. Despite its maturity, I think vMotion and DRS need some love too.
    1. Resource pools basically get ripped apart during maintenance mode evacuations or when moving VM’s (if you’re not careful). VMware should develop a wizard similar to what’s done when you move storage.  That is, default to leaving a VM in its resource pool when we switch hosts, but ask if we’d like to switch it to a different resource pool.
    2. I would love to see a setting or settings where we can influence DRS decisions a bit more in a heavily loaded cluster. For example, I’ve personally had DRS move VM’s to hosts that didn’t have enough physical memory to back the allocated memory, and guess what happened?  Ballooning like a kid’s birthday party.  Allow us to have a tick box or something that prevents VM’s from moving to hosts that don’t have enough physical memory to back the allocated + overhead memory of the VM’s (I’ve sketched what that check might look like right after this list).
    3. Would love to see fault zones added to compute. For example, maybe I want my anti-affinity rules to not only be host aware, but fault zone aware as well.
      1. Have a concept of dynamic fault zones based on host values / parameters. For example, the rack that a host happens to run in.
    4. Show me WHY you moved my VM’s around in the vMotion history.
  12. How about a mobile app for basic administration and troubleshooting? I shouldn’t need a third party to make that happen.  And for the record, I know you have one; I want it to be good though.  I shouldn’t need to add servers manually, just let me point it at my vCenter(s) and bring everything in.
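To make that DRS guardrail (11b) a little more concrete, here’s a minimal PowerCLI sketch of the check I’d want done before a VM lands on a host.  The VM and host names are placeholders, and the 10% pad is my rough stand-in for VMware’s per-VM overhead math, so treat this as an illustration, not gospel.

    # Minimal sketch of the 11b guardrail (placeholder names; the 10% pad is a
    # rough stand-in for VMware's actual per-VM overhead calculation).
    $vm       = Get-VM -Name 'SQL01'                    # hypothetical VM
    $destHost = Get-VMHost -Name 'esx02.contoso.local'  # hypothetical host

    $hostFreeGB = $destHost.MemoryTotalGB - $destHost.MemoryUsageGB
    $vmNeedGB   = $vm.MemoryGB * 1.10   # allocated memory plus an overhead fudge factor

    if ($hostFreeGB -gt $vmNeedGB) {
        Move-VM -VM $vm -Destination $destHost
    }
    else {
        Write-Warning "$($destHost.Name) can't back $($vm.Name)'s allocated memory; not moving it."
    }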

SDRS, vVOLS, vSAN and storage in general:

If I had to pick a weak spot for VMware, it would be storage.  It’s not that it’s bad, it’s just that it seems slow to evolve.  I get it, it’s super critical to your environment, but in the same tone, it’s super critical to my environment, and that means I need them to keep up with demand.  Here are some examples.

  1. Add support for tape drives, and I mean GOOD support / GOOD performance. This way my tape server can finally be virtualized too, without the need to do things like remote iSCSI or SR-IOV.  I know what some of you might be thinking: tape is dead.  Wish it were true, but it’s not.  What I really want to see VMware do is have some sort of library certification process, and then enable the ability to present a physical library as a virtual one to my VM.  Either that or, related to that, let me do things like raw device mappings of tape drives.  Give me something like a virtual SAS or fibre channel card that can do a raw mapping of a tape library.  Even cooler would be enabling those libraries to be part of a switch, and enabling vMotion too.
  2. I still continue to sweat bullets about the amount of open storage I have on a given host, or at least when purchasing new hosts. It’s 2017, a period of time where data has been growing at incredible rates, and a default ESXi install is still tuned for 32TB of open storage?  I know that sounds like a lot, but it really isn’t.  To make matters worse, the tuning parameters to enable more open storage (VMDK’s on VMFS) are buried in an advanced setting and not documented very well (the first sketch after this list shows where that knob lives today).  If the memory requirements are negligible, ESXi should be tuned for the max open storage it can support.  Beyond that, VMware should throw a warning if the amount of open storage exceeds the configured storage pointer cache.  Why bury something so critical and make an admin dig through log messages to know what’s going on (after the fact, mind you)?
    1. Related to the above, why is ESXi even limited to 128TB (pointer cache)? Don’t get me wrong, it’s a lot of storage, but it’s hardly a wow factor.  A PB of open storage would be a more reasonable maximum IMO.  If it’s a matter of consuming more memory (and not performance), make that an admin choice.
  3. RDM’s via local RAID should be a generally supported ability. I know it CAN work in some cases, but it’s not a generally supported configuration.  There are times where an RDM makes sense, and local RAID could very much be one of those cases.  I should be able to carve up vDisks and present them to a VM directly.
  4. How about better USB disk support? It’s more of a small business need, but a need nonetheless.  In fact, I would say, being even more generic, removable disks in general.
  5. Why in the world is removing a disk/LUN still such an involved task? There should literally be a right click, delete disk, and then the whole workflow kicks off in the background.  Needing to launch PowerCLI and do an unmount, then detach process is just a PITA (the second sketch after this list shows today’s ceremony).  There shouldn’t even need to be an order of operations.  I mean, in Windows I can just rip the disk out and no issues occur (presuming nothing’s on the disk of course).  I don’t mind VMware making some noise about a disk being removed, but then make it an easy process to say “yeah, that disk is dead, whack it from your memory”.
  6. Pretty much everything on my vSAN / what’s missing in HCI posts has gone unimplemented in vSAN. You can check that out here and here.  That said, they have added a few things like parity and compression / dedupe, but that’s nothing in the grand scheme of things.
    1. What I really wished vSAN was / is, is a non-hyperconverged storage solution. As in, I wish I could install vSAN as a standalone solution on storage, and use it as a generic SAN for anything, without needing to share it with compute.  Hedvig storage has the right idea.  Don’t know what I’m talking about?  Go check them out here.  Just imagine what vSAN could do with all that potential CPU power if it didn’t have to hold itself back for the sake of the VM’s.  And yes, THIS would be worthy of a separate product SKU.
  7. SDRS:
    1. I wish VMware would let you create fault zones with SDRS. This way, when I create VM anti-affinity rules and specify different fault zones, I’d sleep better at night knowing my two domain controllers weren’t running on the same SAN, BUT that they could move wherever they needed to.
    2. It would be really great to see SDRS have the ability to balance VM’s across ANY storage type, and have it extend to local storage as well. For example, I would love to see vVOLs have SDRS in front of them, so my VM’s could still float from SAN to SAN, even if they’re vVOLs.  For the local storage bit, what if I have a few generic local non-SAN LUNs?  I could still see there being value in pooling that storage from an automation standpoint.
    3. I would love to see a DRS integration for non-shared storage DRS. I know it would be REALLY expensive to move VM’s around.  But in the case of things like web servers, where shared storage isn’t needed, and vSAN just adds complexity, I could see this being a huge win.  If nothing else, it would make putting a host into maintenance mode a lot easier.
    4. Let me have affinity rules in standard edition of VMware. This way I can at least be warned that I have two VM’s comingling on the same host that shouldn’t be.
  8. vFlash (or whatever it’s called)
    1. It would be nice to see VMware actually continue to innovate this. For example:
      1. Support for multiple flash drives per host and LARGE flash drives per host.
      2. Cache a data store instead of a single VM. This way the cache is used more efficiently.  Or make it part of a storage policy / profile.
      3. Do away with static capacity amounts per VMDK. In essence offer a dynamic cache ability based on the frequency of the data access patterns.
      4. I would also suggest write caching, but let’s get decent read caching first.
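Two quick PowerCLI sketches related to the storage gripes above.  First, for item 2, here’s roughly where the open storage knob hides today.  On the builds I’ve touched it’s the VMFS3.MaxAddressableSpaceTB advanced setting (default 32, max 128), but verify the name and limits against your own version before changing anything.

    # Sketch for item 2: report the pointer block cache tuning across hosts.
    # The setting name / limits are from builds I've used; confirm on yours.
    Get-VMHost | ForEach-Object {
        Get-AdvancedSetting -Entity $_ -Name 'VMFS3.MaxAddressableSpaceTB'
    } | Select-Object Entity, Value

    # To raise it to the 128TB ceiling (at the cost of some host memory):
    # Get-VMHost | ForEach-Object {
    #     Get-AdvancedSetting -Entity $_ -Name 'VMFS3.MaxAddressableSpaceTB' |
    #         Set-AdvancedSetting -Value 128 -Confirm:$false
    # }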
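Second, for item 5, this is the unmount / detach ceremony I’m complaining about, sketched against the host storage system API.  The datastore name is a placeholder; treat it as an illustration of the order of operations, not a production script.

    # The manual order of operations from item 5 (placeholder datastore name).
    $ds        = Get-Datastore -Name 'OldLUN01'
    $vmfsUuid  = $ds.ExtensionData.Info.Vmfs.Uuid
    $canonical = $ds.ExtensionData.Info.Vmfs.Extent[0].DiskName   # naa.* name

    foreach ($vmhost in (Get-VMHost -Datastore $ds)) {
        $storSys = Get-View $vmhost.ExtensionData.ConfigManager.StorageSystem

        # Step 1: unmount the VMFS volume from this host
        $storSys.UnmountVmfsVolume($vmfsUuid)

        # Step 2: detach the underlying SCSI LUN from this host
        $lun = Get-ScsiLun -VmHost $vmhost -CanonicalName $canonical
        $storSys.DetachScsiLun($lun.ExtensionData.Uuid)
    }
    # Only after every host detaches cleanly do you unpresent the LUN on the array.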

ESXi itself:

The largest stagnation in the platform has been ESXi itself.  You can’t count vSAN or NSX if you’re going to sell them as separate products.  Here are some areas I would like to see improved.

  • I would love to see the installation wizard ask more questions early on, so that when they’re all answered, my host is closer to being provisioned. I understand that’s what Auto Deploy is for, but that’s likely overkill for a lot of customers.
    • ASK me for my network settings and verify they work.
    • ASK me if I want to join vCenter and if so, where I want the host located
    • ASK me if I want to provision this host straight to a distributed switch so I don’t need to go through the hassle of migrating to one later.
  • Let the free edition be joined to vCenter. This way we can at least move a VM (shut down) from one host to another, and also keep the hosts updated.  I could see a great use case for this if developers want / need dedicated hosts, but we need to keep them patched.  I’m not asking for you to do anything other than let us patch them, move VM’s, and monitor the basic health of the host.  Keep all the other limits in place.
  • Give us an option to NEVER overcommit memory. I’d rather see a VM fail to power on, not migrate or anything if it’s going to risk memory swapping / ballooning.
  • Make reservations an actual “reservation”. If I say I want the whole VM’s memory reserved, pre-reserve the whole memory space for that VM, regardless of whether the VM is using it.  (There’s a sketch of today’s closest workaround after this list.)
  • Support for virtualizing other types of HW, like SSL offload cards and presenting them to VMs. I suspect this would also involve support from the card vendors of course, but it would still be a useful thing to see.  For example, SSL offloading in our virtual F5’s.
  • I want to see EVERYTHING that can be done in ESXCLI and other troubleshooting / config tools also be available in PowerCLI.
  • Have a pre-canned command I can run to report on all hardware, its drivers, firmware and modules.
  • I think it would be kind of slick to run ESXi as a container. Perhaps I want to carve up a single physical ESXi host, into a couple of smaller ESXi hosts and use the same license.  Again, developers would be a potentially great use case for this.
  • I would like to see an ability to export and import an ESXi image to another physical server. A simple use case would be migrating an install from one physical server to another.  Maybe even have a wizard for remapping resources such as the NICs and the log location.  I’m not talking about a host backup, more like a host migration wizard.
  • Actually, get ESXi joining to an Active Directory working reliably.
  • How about showing us active NFC connections, how much memory they’re consuming and the last time they were used. While we’re at it, how about supporting MORE NFC connections.
  • Create a new VMkernel traffic type for NFC and cold migration traffic, with a friendly name to match.
  • Help us detect performance issues more easily with ESXTOP. Meaning, if particular metrics have crossed well known thresholds, maybe raise an event or something in the logs.  Related to that, perhaps offer a GUI (or PowerCLI) option for creating / scheduling an ESXTOP trace and storing the results in a CSV.
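On the reservation bullet above, the closest thing available today is the per-VM “Reserve all guest memory (All locked)” flag, which at least pins the reservation to the configured memory size.  Here’s a hedged sketch via the raw API, since I don’t recall a first-class cmdlet parameter for it; the VM name is a placeholder.

    # Sketch: set "Reserve all guest memory (All locked)" on a VM through the
    # vSphere API (placeholder VM name).  This pins the reservation to the
    # configured memory size; it does not pre-allocate the physical pages.
    $vm   = Get-VM -Name 'SQL01'
    $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
    $spec.MemoryReservationLockedToMax = $true
    $vm.ExtensionData.ReconfigVM($spec)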

Evolving the company:

Documentation:

Look, almost everyone hates being stuck with documenting things, or at least I do.  However, it’s something that everyone relies on, and when done well, it’s very useful.   I get that VMware is large and complex, so I have to imagine documentation is a tough job.  Still, I think they need to do better at it.  Here is what I see that’s not working well.

  • KB articles aren’t kept up to date as new ESXi versions are released. Is that limitation still applicable?  I don’t know, the documentation doesn’t tell me.
  • There is a lack of examples on changing a particular setting. For example, they may show a native ESXCLI method, while completely leaving out PowerCLI and the GUI.
  • There is a profound lack of good documentation on designing and tuning ESXi for more extreme situations. Things like dealing with very large VM’s, designing for high IOPS or high throughput, and large memory and vCPU counts.  I don’t know, maybe the thought is you should engage professional services (or buy a book), but that seems like overkill to me.
  • Tuning and optimizing for specific application workloads. For example, Microsoft Clustering on top of VMware.  Yeah, they have a doc, but no, it’s not good.  Most of their testing is under best case scenarios: small VM’s, minimal load, empty ESXi servers, etc.  It’s time for VMware to start building documentation based on reality.  Using a lazy excuse like “everyone’s environment is different” doesn’t absolve them from even attempting more realistic simulations.  For example, I would love to see them test a 24 vCPU, 384GB of vRAM VM with other similarly sized VM’s on the same host, under some decent load.  I think they’d find vMotion causes a lot of headaches at that scale.
  • Related to the above, I find their documentation a little untrustworthy when they say “x” is supported. Supported in what way?  Is vMotion not supposed to cause a failover, or do you simply mean the vMotion operation will complete?  Even still, there are SO many conflicting sub-notes, it’s just confusing to know which restrictions exist and which don’t.  It’s almost like the writer doesn’t understand the application they’re documenting.

Support:

If there is one thing that has taken a complete downward spiral, it’s support.  Like, the VMware execs basically decided customers don’t need good support and decided to outsource it to the cheapest entity out there.  Let me be perfectly clear, VMware support sucks, big time, and I’m talking about production support just to be clear.  Sure, I occasionally get in touch with someone that knows the product well, communicates clearly, and actually corresponds within a reasonable time, but that’s a rarity.  Here are just a few examples of areas that they drop the ball in.

  • Many times, they don’t contact you within your working hours. Meaning, if I work 9 – 5 EST, I might get a call at 5 or 6, or an email at 4am.
  • Instead of coordinating a time with you, they just randomly call and hope you’re there, otherwise it’s “hey, get back to me when you’re ready”, which is followed by another 24-hour delay (typically). Sometimes attempting to coordinate a time with them works, other times it doesn’t.
  • I have seen plenty of times where they say they’ll get back to you the next day, and a week or more goes by.
  • Re-opening cases has led to me needing to work with a completely different tech. A tech that didn’t bother reading the former case notes, or contacting the original owner to get the back story.  In essence, I might as well have opened a completely new case.
  • Communication is hit or miss. Sometimes, they communicate well, other times, there’s a huge breakdown.  It’s not so much understanding words, but an inability to understand tone, the severity of the situation, or other related factors.
  • Not being trained in products that have been out for months. I remember when I called about some issues with a PSC appliance 6 MONTHS after vSphere 6 was released, and the tech didn’t have a clue how the PSC’s worked.  I had to explain the basics to him; it was a miserable experience.
  • Lacking the desire to actually figure out an issue, or really solve a problem. It’s like they read from a book, and if the answer isn’t there, they don’t know how to think beyond that.

While we’re still on the support topic, this whole notion of business critical and mission critical support is a little messed up.  I guess VMware basically wants us to fund the salary of an entire TAM or something like that, which is, bluntly, stupid.  It doesn’t matter if I’m a company with one socket of Enterprise Plus, or a company with 100 sockets; we apparently all pay the same price.  I don’t entirely have a problem with paying a little extra to get access to better support, but it should be something that’s an upgrade to my production support per socket, not a flat fee.  Again, it should be based around fair consumption.

Sales:

You know when I hear from my sales team?  When they want to sell me something.  They don’t call to check in and see if I’m happy.  They’re not calling to go over the latest features included with products I own to make sure I’m maximizing value; none of that happens.  All that kind of stuff is reactive at best.  It’s ME reaching out to learn about something new, or ME reaching out to let them know support is really dropping the ball.  I spend a TON of money on VMware, and I’d like to see some better customer service out of my reps.  I have vendors that reach out to me all the time, just to make sure things are going ok.  A little effort like that goes a long way in keeping a relationship healthy.

Website:

I want to pull my hair out with your website.  Finding things is so tough, because your marketing team is so obsessed with big stupid graphics, and trying to shove everything and anything down my throat.  You’re a company that sells lean and mean software, and your website should follow the same tone.  Everything is all over the place with your site.  Also, it’s 2017, having a proper mobile optimized site would be nice too.

Finally, you guys run blogs, but one thing I’ve noticed is you stop allowing new comments after “x” time.  Why do you do this?  I might need further clarification on a topic that was written, even if it’s years ago.

Cloud and innovation:

This one is a tough area, and I’m not sure what to say, other than I hope you’re not the next Novell.  You guys had a pretty spectacular fail at cloud, and I could probably go into a lot of reasons why, and most of them wouldn’t be that Microsoft or AWS are too big to beat.  I suspect part of it was you guys got fat, lazy and way too cocksure.  It’s ok, it happens to a lot of companies and professionals alike.  While it’s hard for me to foresee someone wanting to consume a serverless platform from you guys, I wouldn’t find it hard to believe that someone might want to consume a better IaaS platform than what’s offered by Microsoft or AWS.  While they have great automation, their fundamental platform still leaves a lot to be desired.  That, to me, is an area that you guys could still capture.  I could foresee a great use case for a virtual colocation + all the IaaS scalability and automation abilities.  I still have to shut down an Azure VM for what feels like every operation; need I say more?

Closing:

Look, I could probably keep going, and one may wonder why stop now; I’m already at 6,000 plus words.  I will say kudos to you if you’ve actually read this far and didn’t simply skip down.  However, the point of this post wasn’t to tear down VMware, nor was it to write my longest post ever.  I needed to vent a little bit, and I wanted VMware to know that I’m frustrated with them, and what they could do to fix that.  I suspect a lot of my viewpoints aren’t shared by all, but in turn, I’m sure some are.  VMware was the first tech company that I was truly inspired by.  To me, they exemplified what a tech company should strive to be, and somewhere along the way, they lost it.  Here’s to hoping VMware will be with us for the long haul, and that what’s going on now is simply a bump in the road.

 

Powershell Scripting: Microsoft Exchange, Configure client-specific message size limits

Introduction:

If you don’t know by now, I’m a huge PowerShell fan. It’s my go-to scripting language for anything related to Microsoft (and non-Microsoft) automation and administration. So when it came time to automate post-cumulative-update settings for Exchange, I was a bit surprised to see that some of the code examples from Microsoft didn’t contain a single PowerShell example. Surprised is probably the wrong word; how about annoyed? I mean, after all, this is not only the company that shoved this awesome scripting language down our throats, but also the very team that was the first to have a comprehensive set of admin abilities via PowerShell. So if that’s the case, why in the world don’t they have a single PS example for configuring client-specific message size limits?

Not to be discouraged, I said screw appcmd, I’m PS’ing this stuff, because it’s 2017 and PS / DSC is what we should be using. Here’s how I did it.

The settings:

If you’re looking for where the settings I’m speaking of live, check out this link here. That’s how you do it the “old school” way.

The new school way:

My example below is for EWS; you’ll need to adjust it if you want to also include EAS.


     Write-Host "Attempting to set EWS settings"
    Write-Host "Starting with the backend ews custom bindings"
    $AllBackendEWSCustomBindingsWebConfigProperties = Get-WebConfigurationProperty -Filter "system.serviceModel/bindings/custombinding/*/httpsTransport" -PSPath "MACHINE/WEBROOT/APPHOST/Exchange Back End/ews" -Name maxReceivedMessageSize -ErrorAction Stop | Where-Object {$_.ItemXPath -like "*EWS*https*/httpstransport"} 
    Foreach ($BackendEWSCustomBinding in $AllBackendEWSCustomBindingsWebConfigProperties)
        {
        Set-WebConfigurationProperty -Filter $BackendEWSCustomBinding.ItemXPath -PSPath "MACHINE/WEBROOT/APPHOST/Exchange Back End/ews" -Name maxReceivedMessageSize -value 209715200 -ErrorAction Stop
        }
    Write-Host "Finished the backend ews custom bindings"
    
    Write-Host "Starting with the backend ews web http bindings"
    $AllBackendEWwebwebHttpBindingWebConfigProperties = Get-WebConfigurationProperty -Filter "system.serviceModel/bindings/webHttpBinding/*" -PSPath "MACHINE/WEBROOT/APPHOST/Exchange Back End/ews" -Name maxReceivedMessageSize -ErrorAction Stop | Where-Object {$_.ItemXPath -like "*EWS*"} 
    Foreach ($BackendEWSHTTPmBinding in $AllBackendEWwebwebHttpBindingWebConfigProperties)
        {
        Set-WebConfigurationProperty -Filter $BackendEWSHTTPmBinding.ItemXPath -PSPath "MACHINE/WEBROOT/APPHOST/Exchange Back End/ews" -Name maxReceivedMessageSize -value 209715200 -ErrorAction Stop
        }
    Write-Host "Finished the backend ews web http bindings"

    Write-Host "Starting with the back end ews request filtering"
    Set-WebConfigurationProperty -Filter "/system.webServer/security/requestFiltering/requestLimits" -PSPath "MACHINE/WEBROOT/APPHOST/Exchange Back End/ews" -Name maxAllowedContentLength -value 209715200 -ErrorAction Stop
    Write-Host "Finished the back end ews request filtering"

    Write-Host "Starting with the front end ews request filtering"
    Set-WebConfigurationProperty -Filter "/system.webServer/security/requestFiltering/requestLimits" -PSPath "MACHINE/WEBROOT/APPHOST/Default Web Site/EWS" -Name maxAllowedContentLength -value 209715200 -ErrorAction Stop
    Write-Host "Finished the front end ews request filtering" 

Is it technically better than appcmd?  Yes, of course, what did you think I was going to say?  It’s PS, of course it’s better than CMD.

As for how it works, I mean it’s pretty obvious; I don’t think there’s any good reason to go into a breakdown.  I took what MS did with AppCMD and just changed it to PS, with a foreach loop in the beginning to have even a little less code 🙂

You should be able to take this and easily adapt it to other IIS based web.config settings.  The Get-WebConfigurationProperty call in the very beginning is a great way to explore any web.config via the IIS cmdlets; there’s a quick exploration example below.
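If you want to poke around yourself, here’s a hedged starting point that dumps the ItemXPath values (and current sizes) for the binding nodes, so you can see exactly what your Where-Object filter needs to match.  Adjust the PSPath for whichever site or vdir you’re exploring.

    # Exploration helper: list binding nodes under the back end EWS vdir so
    # you can craft an accurate Where-Object filter before setting anything.
    Import-Module WebAdministration
    Get-WebConfigurationProperty -Filter "system.serviceModel/bindings/*/*" `
        -PSPath "MACHINE/WEBROOT/APPHOST/Exchange Back End/ews" `
        -Name maxReceivedMessageSize -ErrorAction SilentlyContinue |
        Select-Object ItemXPath, Value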

Anyway, hope this helps someone.

***Update 07/29/2017:

So we did our Exchange 2013 CU15 upgrade, and everything went well with the script, except for one snag.  My former script had an incorrect filter that added an “https” binding to an “http” path.  EWS didn’t like that very much (as we found out the hard way).  Anyway, it should be fixed now; I updated the script.  Just so you know which line was affected, you can see the before and after below.  Basically, my original filter grabbed both the http and https transports.  I guess technically each web property has the potential for both.  My new filter goes after only https EWS configs + https transports.


#I changed this:

$AllBackendEWSCustomBindingsWebConfigProperties = Get-WebConfigurationProperty -Filter "system.serviceModel/bindings/custombinding/*/httpsTransport" -PSPath "MACHINE/WEBROOT/APPHOST/Exchange Back End/ews" -Name maxReceivedMessageSize -ErrorAction Stop | Where-Object {$_.ItemXPath -like "*EWS*"}

#To this

$AllBackendEWSCustomBindingsWebConfigProperties = Get-WebConfigurationProperty -Filter "system.serviceModel/bindings/custombinding/*/httpsTransport" -PSPath "MACHINE/WEBROOT/APPHOST/Exchange Back End/ews" -Name maxReceivedMessageSize -ErrorAction Stop | Where-Object {$_.ItemXPath -like "*EWS*https*/httpstransport"}