Thursday, September 15, 2011

Capturing CPU Trends with PowerCLI

Inspired by the creeping CPU that we see in Linux guests, and helped greatly by @BoerLowie on his blog, I’ve come up with a little PowerCLI script to capture CPU trends for the top consumers in each cluster.
This is my first cut, and like any script it will likely change over time. HTML output and emailed results are the most likely additions.
The script should be fairly self-explanatory. For each cluster, traverse all powered-on VMs and get their OverallCpuUsage (the number you see in the vSphere Client when you select a cluster and open the Virtual Machines tab). Take the top X consumers by that number, get their average CPU usage statistic (cpu.usage.average) for one full day N days back, and compare it to their average over the most recent day.
The output looks something like this:
[Image: CPU-Trend]


So here you go:
#
#  Produce guest CPU trending from a time period back versus a shorter 
#  more immediate time frame.  e.g. 30 days ago versus past 2 days.
#
param(
    [string] $vCenter
)
 
$DaysOld = -30        # compare to full day stats this many days back
$DaysRecent = -1    # get stats for this many recent days.
$GetTop = 10        # look at top x CPU consumers
 
Add-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue
 
#if ($vCenter -eq "") {
#    $vCenter = Read-Host "VI Server: "
#}
 
#if ($DefaultVIServers.Count) {
#    Disconnect-VIServer -Server * -Force -Confirm:$false
#}
#Connect-VIServer $vCenter
 
$AllClusters = Get-Cluster
 
Foreach ($Cluster in $AllClusters) {
    Write-Host "`n$($Cluster.Name)"
    
    # Wrap in @() so .Count works even when only one VM matches
    $VMs = @(Get-Cluster $Cluster | Get-VM | `
        Where-Object { $_.PowerState -eq "PoweredOn" })
    $NumVMs = $VMs.Count
    
    # Get the Overall CPU Usage for each VM in the cluster.  Then cap that 
    # list at the top $GetTop highest for Overall CPU Usage
    $vm_list = @()
    $Count = 0
    Foreach ($vm in $VMs)
    {
        $Count += 1
        Write-Progress -Activity "Getting VM views" -Status "Progress:" `
            -PercentComplete ($Count / $NumVMs * 100)
            
        # the vSphere .Net view object has the OverallCpuUsage 
        # (VirtualMachineQuickStats)
        # http://www.vmware.com/support/developer/vc-sdk/visdk400pubs/ReferenceGuide/vim.vm.Summary.QuickStats.html
        $view = Get-View $vm
        
        $objOutput = "" | Select-Object VMName, CpuMhz
        $objOutput.VMName = $view.Name
        $objOutput.CpuMhz = $view.Summary.QuickStats.OverallCpuUsage
        $vm_list += $objOutput
    }
    # Reduce to our Top X
    # @() keeps .Count valid below even if only one VM remains
    $vm_list = @($vm_list | Sort-Object CpuMhz -Descending | Select-Object -First $GetTop)
        
    #
    # For each of those VMs, get the statistics for past and current CPU usage
    $NumVMs = $vm_list.Count
    $Out_List = @()
    $Count = 0
    Foreach ($vm in $vm_list)
    {
        $Count += 1
        Write-Progress -Activity "Compiling CPU stats" -Status "Progress:" `
            -PercentComplete ($Count / $NumVMs * 100)
            
        # Average CPU usage (%) over one full day, $DaysOld days back
        [Double] $ldblPerfAged = (Get-Stat -Entity $vm.VMName -Stat cpu.usage.average `
            -Start $((Get-Date).AddDays($DaysOld)) `
            -Finish $((Get-Date).AddDays($DaysOld + 1)) -ErrorAction Continue | `
            Measure-Object -Average Value).Average
        
        # Skip VMs with no aged stats (also avoids dividing by zero below)
        If ($ldblPerfAged -gt 0) {
            # Average CPU usage (%) over the most recent $DaysRecent day(s)
            [Double] $ldblPerfNow = (Get-Stat -Entity $vm.VMName -Stat cpu.usage.average `
                -Start $((Get-Date).AddDays($DaysRecent)) `
                -ErrorAction Continue | Measure-Object -Average Value).Average
            # Percent change of the recent average relative to the aged average
            [Int] $lintTrend = (($ldblPerfNow - $ldblPerfAged) / $ldblPerfAged) * 100
        
            $objOutput = "" | Select-Object VMName, CpuMhz, PerfAged, PerfNow, Trend
            $objOutput.VMName = $vm.VMName
            $objOutput.CpuMhz = $vm.CpuMhz
            $objOutput.PerfAged = "{0:f2}%" -f $ldblPerfAged
            $objOutput.PerfNow = "{0:f2}%" -f $ldblPerfNow
            $objOutput.Trend = "{0}%" -f $lintTrend
        
            $out_list += $objOutput
        }
    }
 
    # Spit 'er out
    Write-Host "Top CPU consumers trending: $(-$DaysOld) days ago vs. the past $(-$DaysRecent) day(s)`n"
    $out_list | Format-Table -Property VMName, `
        @{Expression={$_.CpuMhz};Name='CPU MHz';align='right'}, `
        @{Expression={$_.PerfAged};Name='CPU Aged';align='right'}, `
        @{Expression={$_.PerfNow};Name='CPU Now';align='right'}, `
        @{Expression={$_.Trend};Name='Trend';align='right'}
}
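
For reference, the Trend column is the percent change of the recent average relative to the aged one. The arithmetic can be pulled out into a small standalone function for sanity checks without a vCenter connection. This is a minimal sketch: Get-CpuTrend is a hypothetical name, not part of the script above. Like the script, it declines to compute a trend when there is no aged baseline.

```powershell
# Hypothetical helper: percent change from an aged CPU average to a recent one.
# Same formula as the script: ((now - aged) / aged) * 100, cast to [Int].
function Get-CpuTrend {
    param(
        [Double] $PerfAged,   # average cpu.usage.average (%) N days back
        [Double] $PerfNow     # average cpu.usage.average (%) over the recent window
    )
    # No baseline means no meaningful trend (the script skips these VMs)
    if ($PerfAged -le 0) { return $null }
    [Int] ((($PerfNow - $PerfAged) / $PerfAged) * 100)
}

# A VM that averaged 20% a month ago and 30% recently has trended up 50%
Get-CpuTrend -PerfAged 20 -PerfNow 30
```

Note that casting to [Int] in PowerShell rounds to the nearest integer (ties go to the even number) rather than truncating.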

Wednesday, September 7, 2011

Linux Guest CPU Creep

We run a lot of tiny VMs on vSphere 4 in a rather unique environment. Densities are high, and the guest OS is the officially unsupported Fedora Core 8 (2.6.26 kernel). As a result, we are more tolerant of aberrations than we otherwise might be.

The biggest aberration of note has been CPU creep. The tiny guests run along just fine using 30-40 MHz of CPU and then begin a slow upward trend that creeps along over the course of a week. Traditional means of investigating from within the guest yield no useful perspective. More interesting, a guest-initiated reboot crawls slowly all the way through the BIOS at boot, and CPU never dips below the new, elevated baseline. The guests are stuck; a reset from the vSphere Client resolves the issue.

This has been acceptable so far. The guests are stateless, only a few are impacted at any one time, and no one guest is critical by itself. We automated the remediation, became accustomed, and moved on. The issue has been confined to one functional cluster and has persisted across minor vSphere 4 upgrades.

Becoming accustomed caused us to miss another occurrence.

The software architects have been busy troubleshooting the core application, which runs in a separate vSphere cluster on Ubuntu Server 8.04 LTS (2.6.24 kernel). CPU usage has been creeping up for the past couple of months, with a marked recent acceleration. We’ve been attributing it to increased load as we grow. The software was optimized, yet CPU usage continued steadily along its upward path.

[Image: MQ-Creeping]

The solution:

Stop all running processes, verify that CPU load is still higher than expected, and reset the VM. CPU usage is down substantially.

[Image: MQ-Creeping2]

In a small shop with few resources and too many projects, it’s time to implement trending alerts.
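
A first step toward trending alerts might be a simple threshold on the trend that the PowerCLI CPU trending script already calculates. The sketch below assumes input objects with a VMName and a numeric Trend; the script as written stores Trend as a formatted string such as "50%", so it would need to keep the raw [Int] value as well. Select-CpuTrendAlert and the 25% threshold are hypothetical choices for illustration, and the sample values are made up.

```powershell
# Hypothetical alert filter: return VMs whose CPU trend exceeds a threshold.
# Assumes each input object has a VMName and a numeric Trend (percent change).
function Select-CpuTrendAlert {
    param(
        [Parameter(Mandatory = $true)] [Object[]] $TrendData,
        [Int] $ThresholdPercent = 25   # hypothetical threshold; tune per environment
    )
    $TrendData |
        Where-Object { $_.Trend -gt $ThresholdPercent } |
        Sort-Object Trend -Descending
}

# Made-up sample values, for illustration only (PowerShell 2.0-compatible objects)
$sample = @(
    (New-Object PSObject -Property @{ VMName = 'vm-a'; Trend = 60 }),
    (New-Object PSObject -Property @{ VMName = 'vm-b'; Trend = 5 }),
    (New-Object PSObject -Property @{ VMName = 'vm-c'; Trend = 30 })
)

# Lists vm-a (60) then vm-c (30); vm-b is under the threshold
Select-CpuTrendAlert -TrendData $sample -ThresholdPercent 25 |
    Format-Table VMName, Trend
```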

Have you experienced this behavior before?

Friday, August 26, 2011

What a Rush

I’m not even there yet and I’m excited.  VMworld 2011 is upon us. 

I’ve had the pleasure of attending VMworld since 2007, when I was just getting into virtualization and wanted to learn more.  VMware Workstation was it for me, but I wanted to learn and prepare for server virtualization.  We had no suitable servers and no shared storage.

Fast forward to this year where my team manages a heavily virtualized environment with very high densities including a unique business case. VMworld was a major destination on the path.

Sure, it was all achievable without it. But I’d venture to say that a proper course includes VMworld.

The ability to learn at VMworld hit new highs last year with course content, Solutions Center, and the excellent Hands on Labs.  This year can only be better.

That said, one of the biggest opportunities is to network with your peers.  I missed out on much of that the first two VMworlds that I attended, and that was a mistake.  Get out there and meet your peers.  You’ll find a vast pool of great folks that can relate to your virtualization and cloud efforts and others that will blow you away with endless possibilities in real world scenarios.

Will you be there? Please come up, say hello, and forgive me if I cannot quickly connect your name to your Twitter handle or to the discussion that we had a couple of years back.

I look forward to seeing you there!

Friday, June 10, 2011

Tech Field Day 6, Day 1 - Outsider In

Technically it is now day 2 of Gestalt IT’s fantastic Tech Field Day—but let’s not split hairs, shall we?

If you are unfamiliar with Tech Field Day, please follow the links and learn. The short summary: a small group of independent peers is given the opportunity to talk with vendors in their space and is exposed to internal details and the future direction of their products.

Tech Field Day is conceived and orchestrated by The Man Who Does Not Sleep, Stephen Foskett, and the Uber¹-Fantastic Claire Chaplais. Delegates are elected by their peers. It is an honor to be selected, and TFD6 has been my honor.

Today brought us VKernel and their vOperations Suite; VMware with Mobile Virtualization Platform, Site Recovery Manager, and Student Cloud; and SolarWinds, hot off their acquisition of Hyper9. We closed out the day with many of our virtualization peers at Beantown Party as a Service (BPaaS), held at the EMC Club in Fenway Park. What a ride!

I’m the verbally quiet, camera-shy guy who tries to tweet a lot while consuming all that he can. Please follow along with all of the delegates; there’s plenty more to come on day 2.

¹ “Uber” attribution copyright @lynxbat

Saturday, May 21, 2011

Three Lines, One Conduit

This one is a brief food-for-thought post.  I work for an SMB, with the focus on “S” during a large part of my tenure. Things come fast in this environment.  Requirements can be loosely defined and based on confidence in the key individuals, not their experience.  Timelines are short with most implementations starting same-day.

You know that you need connectivity, and you know that it needs to be diverse. If you’re truly SMB, you started with one Internet connection and it suited you well, until it failed or you needed more bandwidth. Add a second, and you’re sure to choose an alternate provider and an alternate path into your site. Repeat with a third, throw some wonderful hardware at it, and enjoy your resiliency.

Or so you think. 

Do you know where your feeds go after they leave your building?  And if you have this knowledge, how far does it carry?  Around the corner, down the block, or a mile out? 

We found this information hard to obtain, and it cost us. The city consolidated some of the routes, so they became common. Three feeds with three entry points across two buildings ultimately shared the same conduit just four blocks down the street.

A construction mishap brought this to light. We had wireless at our disposal, but our hardware needed a software upgrade and complete reprogramming under a new paradigm before it could be used. That would take hours, and in the end there was little we could do. Bad as it was, we fared better than many who waited a week for all of the fiber and copper to be spliced back together.

Know your paths, beyond your site.

Wednesday, May 18, 2011

The Great Enabler - Intro

"Choose a job you love, and you will never have to work a day in your life." –Confucius

I am fortunate in my career. I work in a field that started and continued as a hobby, something that I love to do. Driven by interest and a knack, skill gets to follow along for the ride. What makes me most fortunate, however, is the continual opportunity to make an impact: to be rewarded with the ability to see thoughts and ideas materialize and change the tools, the capabilities, or even the manner in which we do business.

My fondest example is virtualization. I’ve had the good fortune to see an interest sparked by articles I read in 2007 grow from a casual conversation with my manager into the relocation of our headquarters and main datacenter. The road was paved with VMworld 2007 through 2010, virtual IT labs, developers’ stations, virtualized tier 3 workloads that soon became tier 1 workloads, first exposure to SANs, and a whole lot of peers, friends, and fun.

We’ve moved from 0% to 98% virtualized.  99% is around the corner with Private Cloud (there, I said it) in sight. 

We’re small.  Resources are tight.  Time has always been tighter.  We learned, we stumbled, we learned again.  I wanted to capture much of the most recent leg of the journey, so here we are.

I hope that you may find some of this interesting.  There’s more to come.