Azure Alert Aggregator


Well it’s been a while since I last wrote up what I’ve been up to in the world of Scripting.

Note: there are two full scripts in this post

Disclaimer: This script works for me in its entirety and I’m getting ready to productionalize it.  My plan is to have it in a module eventually.  For now the scripts and their logic are what I’m presenting in this post.

I was brought a problem: alerts in Azure were firing too frequently. I was asked whether I could create some logic that detects when an alert fires (via webhook) and then evaluates that alert by aggregating the alert data. The animation below illustrates the flow of the PowerShell scripts.

[Animation: AlertAggregator]

I started this venture by exploring how a webhook works.  This video gave a lot of insight into how a webhook works in Azure. Thanks, @pcGeek86.  With this information I was able to take one of my alerts and work out how to get it to fire to my Azure Automation account:

[Screenshot: Edit Rule - Microsoft Azure]

In the webhook field I entered the address of the webhook that I created for the runbook in my Automation account.

[Screenshot: Settings - Microsoft Azure]

One thing to note about webhooks: I tried several times to get the GUI to create the webhook, only to find that it was never saved, so I had to use PowerShell to create the webhook for my runbook.  Your results may vary.

I took the example from –> help New-AzureRmAutomationWebhook -Examples

Modified it slightly to expire in 2 years ((get-Date).addyears(2)) .

$Webhook = New-AzureRmAutomationWebhook -Name "AlertHook" -IsEnabled $True `
 -ExpiryTime ((get-Date).addyears(2)) -RunbookName "ContosoRunbook" `
 -ResourceGroup "ResourceGroup01" -AutomationAccountName "AutomationAccount01" -Force

I didn’t need to define any parameters on the webhook, since the only input the runbook really needs is the WebhookData object, and the alert takes care of supplying that for you.

Now that I have my webhook created I can now write some code to consume the webhook. So I began with this snippet of code:

param ( 
        [object]$WebhookData
    )

    # If runbook was called from Webhook, WebhookData will not be null.
    if ($WebhookData -ne $null) {   

        # Collect properties of WebhookData
        $WebhookName    =   $WebhookData.WebhookName
        $WebhookHeaders =   $WebhookData.RequestHeader
        $WebhookBody    =   $WebhookData.RequestBody
        Write-Output $WebhookData
    }

The first item I look at in the webhook data is the value of Status. If the status is Resolved I don’t need to take any further action; for any other status I drop into the next set of logic:

 If($WebhookBody.Status -ne 'Resolved')    
{}
 else
 {
 write-output 'State has been resolved'
 }

The webhook’s RequestBody arrives as a JSON string, so before I can read Status or any of the other values I need to convert the JSON data into a PowerShell object (in the full script further down, this conversion runs before the Status check):

$WebhookBody = (ConvertFrom-Json -InputObject $WebhookBody)

Now that I know what is in $WebhookBody, I can write out the value of each item in the object. I’ll refer to the context data from this JSON parse as AlertContext or AlertPayload (interchangeably):

 write-output "`nALERT CONTEXT DATA"
 write-output '==================='
 write-output "Name:`t`t $($AlertContext.name)"
 write-output "Subscriptionid:`t`t $($AlertContext.subscriptionId)"
 write-output "MetricName:`t`t $($AlertContext.condition.MetricName)"
 write-output "MetricValue:`t`t $($AlertContext.condition.metricvalue)"
 write-output "Threshold:`t`t $($AlertContext.condition.Threshold)"
 write-output "ResourceGroupName:`t`t $($AlertContext.resourceGroupName)"
 write-output "ResourceName:`t`t $($AlertContext.resourceName)"
 write-output "ResourceType:`t`t $($AlertContext.resourceType)"
 write-output "ResourceID:`t`t $($AlertContext.resourceId)"
 write-output "Timestamp:`t`t $($AlertContext.timestamp)"
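To make the shape of that data concrete, here is a minimal sketch that parses a made-up body and reads a few of the fields used in this post. The JSON below is an assumption: it contains only the properties this post reads, with invented values, and it is not a capture of a real alert.

```powershell
# Made-up body holding only the fields this post reads; every value is invented.
$sampleBody = @'
{
  "status": "Activated",
  "context": {
    "name": "CPUHigh ApiDev",
    "subscriptionId": "00000000-0000-0000-0000-000000000000",
    "resourceGroupName": "AzureTesting",
    "resourceName": "ApiDev",
    "resourceType": "microsoft.web/serverfarms",
    "resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/AzureTesting/providers/microsoft.web/serverfarms/ApiDev",
    "portalLink": "https://portal.azure.com/",
    "condition": {
      "metricName": "CPU Percentage",
      "metricValue": "87",
      "threshold": "80",
      "windowSize": "10"
    }
  }
}
'@

# RequestBody is a string, so convert it before reading any properties.
$body    = ConvertFrom-Json -InputObject $sampleBody
$context = $body.context

if ($body.status -ne 'Resolved') {
    # Property access is case-insensitive, so .MetricName finds "metricName".
    $counterType = $context.condition.MetricName -replace ' ', ''
    # windowSize is a string here; the division converts it to a number.
    $halfWindow  = $context.condition.windowSize / 2 + 1
    "Alert '$($context.name)' on $counterType; recheck in $halfWindow minutes"
}
```

Running this locally is a quick way to test the parsing logic without waiting for a real alert to fire.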

Now that I have that detail, the harder part begins. This is where I spent 5 days out of this whole process.   The thing I learned is that if you receive the error:

Run Login-AzureRmAccount to login

The first thing you should do is check the version of your azure modules in your automation account and ensure they are up to snuff.

Now that our Azure modules are up to date we can begin the process of logging in as an AD account.  You could use a service principal, but that takes quite a bit more setup and is outside the scope of this blog article.

To do this you’ll need to create a Credential Asset in your Automation account, which is demonstrated here. Then we’ll retrieve the Credential Asset in our runbook with Get-AutomationPSCredential.  Once we have the credential in our runbook we use Login-AzureRmAccount, which is an alias for Add-AzureRmAccount.

$cred = Get-AutomationPSCredential -Name 'ThomSchumacher'
 write-output 'login to azure with automation account'
 Add-AzureRmAccount -Credential $cred
Set-AzureRmContext -subscriptionName 'Azure Testing' 

If you have more than one subscription, make certain you switch to the subscription that your Azure Automation account is running in.  In the alert payload we get the following items: SubscriptionId, PortalLink, AlertName, ResourceType, ResourceId, Alert Threshold, Alert WindowSize and Alert Metric Name.

Since I need each one of these for scheduling the Azure automation, I’m going to create variables that contain each of these items:

 $AlertContext = [object]$WebhookBody.context
 $credential ='ThomSchumacher'
 $SubscriptionId = $AlertContext.subscriptionId
 $portalLink = $AlertContext.portalLink
 $AlertName = $AlertContext.name
 $ResourceType = $AlertContext.resourceType
 $ResourceId = $AlertContext.resourceId
 $AlertMetric= $AlertContext.condition.Threshold
 $WindowSize= ($AlertContext.condition.windowsize) /2 +1
 $AlertDateTime = get-date
 $AlertMetricName = $AlertContext.condition.MetricName
 $counterType = ($AlertContext.condition.MetricName) -replace (' ','')
 $subscriptionName = 'Azure Testing' 
 $resourceId = ($AlertContext.resourceId) 
 $minutes = '1'
 $jobName = "CaptureAlerts-$counterType"
 $resoureGroupName = 'AzureTesting'
 $AutomationAccountName = 'AutomationAccountTest'
 $runbookName = 'check4Alerts'
 $description = "$counterType`: checkforAlerts job"

 $ht = @{}; Get-Variable -Name ('Subscriptionid','Portallink','alertname',`
'resourcetype','resourceid','alertmetric','alertdatetime','countertype') `
| foreach { $ht.Add($_.Name,$_.Value)}

The last statement builds $ht, a hashtable of these variables that will be passed on to the runbook that gets scheduled.
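Here is a small sketch of how the Get-Variable trick behaves. In the post the hashtable goes to the -Parameters argument of the Azure cmdlets; in this sketch a local function stands in for the child runbook, and splatting stands in for that hand-off. Test-ChildRunbook and the sample values are hypothetical.

```powershell
# Stand-in for the child runbook: parameter names must match the hashtable keys.
function Test-ChildRunbook {
    param(
        [string]$AlertName,
        [string]$CounterType,
        [datetime]$AlertDateTime
    )
    "$AlertName / $CounterType / $($AlertDateTime.ToString('u'))"
}

$AlertName     = 'CPUHigh ApiDev'
$CounterType   = 'CPUPercentage'
$AlertDateTime = Get-Date

# Collect the named variables into a hashtable keyed by variable name.
$ht = @{}
Get-Variable -Name 'AlertName', 'CounterType', 'AlertDateTime' |
    ForEach-Object { $ht.Add($_.Name, $_.Value) }

# Splatting binds each key to the parameter of the same name.
Test-ChildRunbook @ht
```

The point to take away is that the keys come straight from the variable names, so a variable must be named after the parameter it is meant to fill in the child runbook.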

 

The remaining value I need from the payload is the alert’s WindowSize.   The window size is the period of time over which you tell Azure to evaluate the value being watched; see below:

[Screenshot: windowSize]

The available values are 5, 10, 15, 30 and 45 minutes, and 1, 2, 3, 4, 5 and 6 hours.

I’ve decided for my purposes I’m going to stick to 45 minutes or less. So I’ll use that value to make certain I’m not over it. If I’m over the 45 minute mark I’m not going to fire my automation.  One other thing I’ve found is that if you try to create a schedule for your runbook that starts less than 5 minutes from now, Azure will not allow you to schedule it.

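So I have two situations I must account for:

  1. For a windowsize of 5 minutes the check should run after about 3 minutes (half the sample size plus one minute). That is under the 5 minute minimum, so I cannot set up a schedule for the runbook and have to start it directly instead.
  2. If the windowsize is greater than 45 minutes, write a message that the windowsize is greater than the accepted values.

The test I use to decide whether a schedule can be created is:

  if(($WindowSize -ge 6) -and ($WindowSize -le 45))

Keep in mind that $WindowSize at this point already holds the halved value (windowsize / 2 + 1), so 6 corresponds to a 10 minute window. The upper bound of 45 is compared against that halved value as well, so as written a 1 hour window (31) still passes the test.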
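The window-size test can be sketched as a small function that reports, for each window, the delay and whether the check would be scheduled or started directly. Get-AlertCheckPlan and its output properties are hypothetical names of mine; only the arithmetic and the comparison mirror the script.

```powershell
# Hypothetical helper; the name and output shape are not from the original script.
function Get-AlertCheckPlan {
    param([Parameter(Mandatory)][double]$WindowSizeMinutes)

    # Same arithmetic as the post: half the window plus one minute.
    $delay = $WindowSizeMinutes / 2 + 1

    if ($delay -ge 6 -and $delay -le 45) {
        $action = 'Schedule'    # far enough out for a one-time schedule
    }
    else {
        $action = 'StartNow'    # e.g. a 5 minute window gives 3.5 minutes
    }

    [pscustomobject]@{
        WindowSize   = $WindowSizeMinutes
        DelayMinutes = $delay
        Action       = $action
    }
}

# Try a few of the window sizes Azure offers (in minutes).
$plans = 5, 10, 45, 60, 120 | ForEach-Object { Get-AlertCheckPlan -WindowSizeMinutes $_ }
$plans
```

Listing the plans side by side makes it easy to see which windows land in which branch before wiring the logic to real schedules.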

Now that I have the situations accounted for, if the value is over 5 minutes I can create a schedule for my child runbook, which will actually check the alert values.  The first thing I check is whether a schedule named $jobName is already defined.

Get-AzureRmAutomationSchedule -name $jobname -ResourceGroupName `
$resoureGroupName -AutomationAccountName $AutomationAccountName `
-Verbose -ErrorAction SilentlyContinue

If there is, I must delete it, as I have no way of rescheduling the job.  There is no cmdlet for it at this time.

Remove-AzureRmAutomationSchedule -Name $jobName -ResourceGroupName `
 $resoureGroupName -AutomationAccountName $AutomationAccountName -Force

Now I can create a schedule and then associate (register) the schedule with my runbook. When I create the schedule I add $WindowSize minutes to the current time to set when the schedule should run. As the variable assignment earlier shows, that value is windowsize / 2 + 1.

 write-output "create a new ad hoc schedule $jobName"
 write-output "new-azurermAutomationschedule -Name $jobName -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -StartTime ((get-date).AddMinutes($windowSize)) -OneTime -Description $description "
 new-azurermAutomationschedule -Name $jobName -ResourceGroupName $resoureGroupName `
 -AutomationAccountName $AutomationAccountName `
-StartTime ((get-date).AddMinutes($windowSize)) -OneTime -Description $description 
 write-output "Adding the job schedule to the runbook $runBookName"
 Write-Output "register-AzureRmAutomationScheduledRunbook -RunbookName $runbookName -ScheduleName $jobName -Parameters $ht -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName"
 $newSchedule = register-AzureRmAutomationScheduledRunbook `
 -RunbookName $runbookName -ScheduleName $jobName `
-Parameters $ht -ResourceGroupName $resoureGroupName `
 -AutomationAccountName $AutomationAccountName -Verbose

I’ve chosen to call this script CaptureAlerts.ps1. Here is the entire script:

#requires -Version 3 -Modules Azure, AzureAutomationAuthoringToolkit, AzureRM.Insights, AzureRM.Profile,AzureRm.Automation
param( [object]$WebhookData)

if ($WebhookData -ne $null)
{
    # Collect properties of WebhookData.
    $WebhookName = $WebhookData.WebhookName
    $WebhookBody = $WebhookData.RequestBody
    $WebhookHeaders = $WebhookData.RequestHeader
    write-output $WebhookData
    # Information on the webhook name that called this runbook
    write-output "This runbook was started from webhook $WebhookName."
    # Obtain the WebhookBody containing the AlertContext
    $WebhookBody = (ConvertFrom-Json -InputObject $WebhookBody)
    write-output "`nWEBHOOK BODY"
    write-output '============='
    write-output "Status:`t`t $($WebhookBody.status)"
    If($WebhookBody.Status -ne 'Resolved')
    {
        $AlertContext = [object]$WebhookBody.context
        $credential ='ThomSchumacher'
        $SubscriptionId = $AlertContext.subscriptionId
        $portalLink = $AlertContext.portalLink
        $AlertName = $AlertContext.name
        $ResourceType = $AlertContext.resourceType
        $ResourceId = $AlertContext.resourceId
        $AlertMetric= $AlertContext.condition.Threshold
        # Half the alert window plus one minute
        $WindowSize= ($AlertContext.condition.windowsize) /2 +1
        $AlertDateTime = get-date
        $AlertMetricName = $AlertContext.condition.MetricName
        $counterType = ($AlertContext.condition.MetricName) -replace (' ','')
        $subscriptionName = 'Azure Testing'
        $resourceId = ($AlertContext.resourceId)
        $minutes = '1'
        $jobName = "CaptureAlerts-$counterType"
        $resoureGroupName = 'AzureTesting'
        $AutomationAccountName = 'AutomationAccountTest'
        $runbookName = 'check4Alerts'
        $description = "$counterType`: checkforAlerts job"

        # Parameters for the child runbook, keyed by variable name
        $ht = @{}; Get-Variable -Name ('Subscriptionid','Portallink','alertname','resourcetype','resourceid','alertmetric','alertdatetime','countertype') | foreach { $ht.Add($_.Name,$_.Value)}
        write-output "Check for the existence of the adhoc schedule for CaptureAlerts Runbook -- $jobname"

        #$Conn = Get-AutomationConnection -Name AzureRunAsConnection
        #Add-AzureRMAccount -ServicePrincipal -Tenant $Conn.TenantID -ApplicationId $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint
        $cred = Get-AutomationPSCredential -Name 'ThomSchumacher'
        write-output 'login to azure with automation account'
        Add-AzureRmAccount -Credential $cred
        Set-AzureRmContext -subscriptionName 'Azure Testing'
        Write-output "Window Size must be between 10 and 45 minutes for this automation to work"
        if(($WindowSize -ge 6) -and ($WindowSize -le 45))
        {
            # Remove any existing schedule of the same name, since it cannot be rescheduled
            Write-Output "Get-AzureRmAutomationSchedule -name $jobname -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -Verbose -ErrorAction SilentlyContinue"
            if(Get-AzureRmAutomationSchedule -name $jobname -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -Verbose -ErrorAction SilentlyContinue)
            {
                Write-Output "Remove-AzureRmAutomationSchedule -Name $jobName -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -Force"
                Remove-AzureRmAutomationSchedule -Name $jobName -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -Force
            }
            write-output "create a new ad hoc schedule $jobName"
            write-output "new-azurermAutomationschedule -Name $jobName -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -StartTime ((get-date).AddMinutes($windowSize)) -OneTime -Description $description "
            new-azurermAutomationschedule -Name $jobName -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -StartTime ((get-date).AddMinutes($windowSize)) -OneTime -Description $description
            write-output "Adding the job schedule to the runbook $runBookName"
            Write-Output "register-AzureRmAutomationScheduledRunbook -RunbookName $runbookName -ScheduleName $jobName -Parameters $ht -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName"
            $newSchedule = register-AzureRmAutomationScheduledRunbook -RunbookName $runbookName -ScheduleName $jobName -Parameters $ht -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -Verbose
        }
        else
        {
            # No schedule possible: start the child runbook directly
            Set-AzureRmContext -subscriptionName 'Azure Testing'
            write-output "subtracting $windowsize Minutes"
            $AlertDateTime = ((get-date).AddMinutes(-($windowsize)))
            $ht = @{}; Get-Variable -Name ('Credential','Subscriptionid','Portallink','alertname','resourcetype','resourceid','alertmetric','alertdatetime','countertype') | foreach { $ht.Add($_.Name,$_.Value)}
            Write-Output "Starting $runbookName parameters $ht"
            Write-Output "Start-AzureRmAutomationRunbook -Name $runbookName -Parameters $ht -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -Verbose"
            Start-AzureRmAutomationRunbook -Name $runbookName -Parameters $ht -ResourceGroupName $resoureGroupName -AutomationAccountName $AutomationAccountName -Verbose
        }
    }
    else
    {
        write-output 'State has been resolved'
    }
}
else
{
    Write-Error 'This runbook is meant to only be started from a webhook.'
}

Now on to the second script: 

This script takes the values passed to it by the webhook runbook above, or by any other script.

The parameters it requires are the same as the keys defined in the hashtable from the first script:

$ht = @{}; Get-Variable -Name ('Subscriptionid','Portallink','alertname',`
'resourcetype','resourceid','alertmetric','alertdatetime','countertype') `
| foreach { $ht.Add($_.Name,$_.Value)}

Again we’ll have to get a credential and log in, using the same Credential Asset and Get-AutomationPSCredential approach as in the first script:

$cred = Get-AutomationPSCredential -Name 'ThomSchumacher'
 write-output 'login to azure with automation account'
 Add-AzureRmAccount -Credential $cred
Set-AzureRmContext -subscriptionName 'Azure Testing' 

If you have more than one subscription you’ll need to make certain you change to the same subscription that your azure automation account is running in.

Now we’ll do some calculations based on the data passed to this runbook.  The first thing we need is the current time. It serves as the end time of the measurement window, both in the metric query and in the message that tells the user when the measurement ended.

$nowTime = get-date

Now that we have the time, we need to get the metrics for the resource passed to this script by using the Get-AzureRmMetric cmdlet.

 Write-output "(((Get-AzureRmMetric -ResourceId $resourceId -StartTime $AlertDateTime -EndTime (get-date) -TimeGrain (new-timespan -Minutes 1)).where{$_.name -eq $counterType}).metricvalues).total"
 $numberOfCounters = (((Get-AzureRmMetric -ResourceId $resourceId `
 -StartTime $AlertDateTime -EndTime $nowTime `
-TimeGrain (new-timespan -Minutes 1)).where{$_.name -eq $counterType}).metricvalues).total

$numberOfCounters holds the list of one-minute totals returned for the given metric, so its count is the number of samples collected.

The Get-AzureRmMetric values are updated every minute, which is why the script is designed around minutes.  Now we can calculate the number of samples that are greater than our MetricValue, also known as Threshold in the Azure blades. Then we need to find out what half the measurement is, using the variable $halfMetric.  This is half the count of $numberOfCounters, rounded down.

 # The threshold is passed in as $alertMetric (a string); convert it for the comparison
 $metricValue = [double]$alertMetric
 $overMetric = ($numberOfCounters.where{$PSItem -gt $metricValue}).count
 $halfMetric = [math]::Floor( $numberOfCounters.count /2) 
 Write-output "calculating the number of metrics collected to see if we were over for half the metric time for MetricValue: $metricValue"
 Write-output "half the metrics rounded down: $halfmetric"
 Write-output "Number of metrics that were over the metric value: $overmetric"

If the value of $overmetric is greater than $halfmetric then we are going to send an email.
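A worked example of that rule, using made-up one-minute samples in place of the values returned by Get-AzureRmMetric. The sample numbers and the $threshold and $sendEmail names are assumptions for illustration only.

```powershell
# Made-up one-minute samples standing in for the totals from Get-AzureRmMetric.
$samples   = 72.0, 85.5, 91.2, 64.3, 88.8, 93.1, 79.9
$threshold = [double]'80'    # the threshold arrives as a string in the alert payload

# Count the samples over the threshold and compare with half the sample count.
$overMetric = @($samples | Where-Object { $_ -gt $threshold }).Count
$halfMetric = [math]::Floor($samples.Count / 2)

$sendEmail = $overMetric -gt $halfMetric
"over=$overMetric half=$halfMetric sendEmail=$sendEmail"
```

With these numbers four of the seven samples are over 80, half of seven rounded down is three, and so the email would be sent.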

For the purposes of my design I used SendGrid:

 if($overMetric -gt $halfmetric)
 {
     Write-output 'Alert would be thrown'
     Write-output 'If alertstatus variable is defined set the email flag to true.'
     Write-output "MetricValue: $metricValue Number of Alerts over the metric ------: $overmetric "
     $message = "The following alert $($AlertName) has exceeded the AlertThreshold $($Alertmetric). It Exceeded it $($overMetric) times. Alert First triggered $(($alertDateTime).DateTime) -- Ending measurement Time $(($nowTime).DateTime). Portal Address for this alert $($Portallink)" 
     Write-output $message
     Write-Output $AlertName 
     $sendGridApiUser = 'your api user'
     $sendGridApiPwd = 'yourPassword'
     $sendGridApiKey = 'yourkey'
     $from = 'someemailIpicked@ok.com'
     $to = Get-AutomationVariable -Name 'EmailAddress'
     $subject ="Alert From $Alertname - Exceeded threshold $overmetric Times - ResourceType: $resourceType"
     [uri]$sendGridApi = 'https://api.sendgrid.com/api/mail.send.json'
     write-output "building To list $to"
     # Strip quotes and split a comma-separated address list into an array
     if($to.Contains(',') -or ($to.Contains("'")) -or ($to.Contains('"')))
     {
         write-output 'To: removing unneeded chars and putting in Array'
         $to= $to -replace ("'",'')
         $to= $to -replace ('"','')
         $to = $to.split(',')
     } 
     foreach($t in $to)
     {
         $mailto +="&to=$t"
     }
     $SendGridPost = "api_user=$sendGridApiUser&api_key=$sendGridApiPwd&to=$mailto&subject=$subject&text=$message&from=$from"
     Invoke-RestMethod -Method post -Uri $sendGridApi -Body $SendGridPost 
 }
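The recipient clean-up can be tried on its own, and the same sketch shows one way to percent-encode the subject and message so that spaces and ampersands cannot break the form body. ConvertTo-RecipientList, the addresses and the sample text are hypothetical, and the encoding step is my addition rather than part of the script above. How the pieces are joined into the final request depends on the mail API and is left as in the script.

```powershell
# Hypothetical helper: turn "'a@x.com', "b@x.com"" into a clean array of addresses.
function ConvertTo-RecipientList {
    param([Parameter(Mandatory)][string]$To)

    # Drop single and double quotes, split on commas, trim, skip empty entries.
    ($To -replace "['`"]", '') -split ',' |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ }
}

$recipients = @(ConvertTo-RecipientList -To "'ops@example.com', `"dev@example.com`"")

# Percent-encode each value before it goes into the form body.
$fields = [ordered]@{
    subject = 'Alert From CPUHigh ApiDev - Exceeded threshold 4 Times'
    text    = 'Measured 4 of 7 samples over threshold & still rising'
}
$encoded = foreach ($key in $fields.Keys) {
    '{0}={1}' -f $key, [uri]::EscapeDataString($fields[$key])
}
$formBody = $encoded -join '&'

$recipients
$formBody
```

Wrapping the call in @() keeps $recipients an array even when the variable holds a single address.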

Here is that script in its entirety

Param
(
    [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
    [string]$SubscriptionId,
    [Parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
    [string]$portalLink,
    [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
    [string]$alertName, #CPUHigh ApiDev
    [Parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
    [string]$resourceType, #microsoft.web/serverfarms
    [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
    [string]$resourceId,
    [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
    [string]$alertMetric, #2 aka threshold
    [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
    [datetime]$AlertDateTime, #
    [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
    [string]$counterType #CPUPercentage
)
$cred = Get-AutomationPSCredential -Name 'ThomSchumacher'
write-output 'login to azure with automation account'
Add-AzureRmAccount -Credential $cred
# The threshold is passed in as $alertMetric (a string); the comparisons below read it as $metricValue
$metricValue = [double]$alertMetric
if($cred)
{
    $nowTime = get-date
    Set-AzureRmContext -subscriptionName 'Azure Testing'
    Write-output "(((Get-AzureRmMetric -ResourceId $resourceId -StartTime $AlertDateTime -EndTime (get-date) -TimeGrain (new-timespan -Minutes 1)).where{$_.name -eq $counterType}).metricvalues).total"
    # One total per minute between the alert time and now
    $numberOfCounters = (((Get-AzureRmMetric -ResourceId $resourceId -StartTime $AlertDateTime -EndTime $nowTime -TimeGrain (new-timespan -Minutes 1)).where{$_.name -eq $counterType}).metricvalues).total
    $overMetric = ($numberOfCounters.where{$PSItem -gt $metricValue}).count
    $halfMetric = [math]::Floor( $numberOfCounters.count /2)
    Write-output "calculating the number of metrics collected to see if we were over for half the metric time for MetricValue: $metricValue"
    Write-output "half the metrics rounded down: $halfmetric"
    Write-output "Number of metrics that were over the metric value: $overmetric"
    Write-output "Metric values collected: $numberOfCounters"
    if($overMetric -gt $halfmetric) #need to throw email if the number of overmetrics is greater than the under metric
    {
        Write-output 'Alert would be thrown'
        Write-output 'If alertstatus variable is defined set the email flag to true.'
        Write-output "MetricValue: $metricValue Number of Alerts over the metric ------: $overmetric "
        $message = "The following alert $($AlertName) has exceeded the AlertThreshold $($Alertmetric). It Exceeded it $($overMetric) times. Alert First triggered $(($alertDateTime).DateTime) -- Ending measurement Time $(($nowTime).DateTime). Portal Address for this alert $($Portallink)"
        Write-output $message
        Write-Output $AlertName
        $sendGridApiUser = 'your api user'
        $sendGridApiPwd = 'yourPassword'
        $sendGridApiKey = 'yourkey'
        $from = 'someemailIpicked@ok.com'
        $to = Get-AutomationVariable -Name 'EmailAddress'
        $subject ="Alert From $Alertname - Exceeded threshold $overmetric Times - ResourceType: $resourceType"
        [uri]$sendGridApi = 'https://api.sendgrid.com/api/mail.send.json'
        write-output "building To list $to"
        # Strip quotes and split a comma-separated address list into an array
        if($to.Contains(',') -or ($to.Contains("'")) -or ($to.Contains('"')))
        {
            write-output 'To: removing unneeded chars and putting in Array'
            $to= $to -replace ("'",'')
            $to= $to -replace ('"','')
            $to = $to.split(',')
        }
        foreach($t in $to)
        {
            $mailto +="&to=$t"
        }
        $SendGridPost = "api_user=$sendGridApiUser&api_key=$sendGridApiPwd&to=$mailto&subject=$subject&text=$message&from=$from"
        Invoke-RestMethod -Method post -Uri $sendGridApi -Body $SendGridPost
    }
    else
    {
        Write-output 'no alert needed'
        Write-output "MetricValue: $metricValue Number of Alerts with the metric: $overmetric"
        $stop = $true
    }
}

 

It was a long haul but worth it. I hope this blog post helps someone else out in their quest to keep the alerting from Azure down.

 

Until then

 

Keep Scripting

 

Thom
