Archive

Archive for the ‘Availability Groups’ Category

PowerShell Function to Automate Availability Group Failover

November 18, 2017 1 comment

I’ve been working with Availability Groups (AG) since their release in SQL Server 2012 and have always wanted to leverage PowerShell to administer them.  Recently I received a request to develop an automated process for failing Availability Groups over gracefully prior to server patching.  Believe it or not, but a hard shutdown of the primary replica is NOT the best way to force AG failover.

The function takes a replica name as input and queries system tables for Availability Groups running as secondary that are online, healthy, and synchronous.  For each AG found, the function generates an ALTER AVAILABILITY GROUP statement.  If the -noexec parm is set to 0, the command will be executed.  If -noexec is set to 1, the command will be written out to a file.

When writing the function, I started out trying to use the native PowerShell Availability Group cmdlets.  After several false starts, I found it easier to develop the T-SQL code in Management Studio and use Invoke-Sqlcmd to execute the code.  The code is available below.  I hope you can put it to use.


<#
Author: Frank Gill
Date: 2017-11-17
#>
function Invoke-AgFailover {
<#
.SYNOPSIS
Checks specified instance for healthy, synchronous Availability Groups running as secondary and
fails them over
.DESCRIPTION
Checks the instance passed in for healthy, synchronous Availability Groups running as secondary and
fails them over.  If the instance is not hosting secondary replicas, a message will be output.
If there are AGs running as secondary, a message will output for each, including AG name, destination,
and failover duration.
.EXAMPLE
Invoke-AgFailover -Instance YourSecondaryInstance -NoExec 0;
Any Availability Groups running as secondary on YourSecondaryInstance will be failed over.
.EXAMPLE
Invoke-AgFailover -Instance YourSecondaryInstance -NoExec 1;
If Availability Groups are running as secondary on YourSecondaryInstance, T-SQL commands for each AG failover
will be generated and written to C:\AGFailover\failover_YourAgName_YYYYMMDD_HHMMSS.sql.
.PARAMETER Instance
The instance to check for secondary replicas.
.PARAMETER NoExec
Set to 1 to generate T-SQL script for failover.
#>
[CmdletBinding()]
param
(
[Parameter(Mandatory=$True,
Position = 1,
ValueFromPipeline=$True,
ValueFromPipelineByPropertyName=$True,
HelpMessage='Which instance do you want to check for secondary replicas?')]
[Alias('secondaryinstance')]
[string[]]$instance,

[Parameter(Mandatory=$True,
Position = 2,
ValueFromPipeline=$True,
ValueFromPipelineByPropertyName=$True,
HelpMessage='Set to 1 if you want to execute the database restore.  Otherwise ')]
[Alias('dontrun')]
[string]$noexec
)
process
{
<# If the $noexec is set to 1, create file path to hold out files #>
if($noexec -eq 1)
{
$rundate = Get-Date -Format yyyyMMdd_HHmmss;
$outpath = "C:\AgFailover";
if((Test-Path -Path $outpath) -eq $true)
{
Remove-Item -Path "$outpath\*" -Recurse;
}
else
{
New-Item -Path $outpath -ItemType Directory;
}
}

<# Create query to check for failover-eligible AGs #>
$query = "SELECT g.[name], ar.replica_server_name
FROM sys.dm_hadr_availability_replica_states r
INNER JOIN sys.availability_replicas ar
ON ar.group_id = r.group_id
AND ar.replica_id = r.replica_id
INNER JOIN sys.availability_groups g
ON g.group_id = ar.group_id
WHERE r.role_desc = N'SECONDARY'
AND r.recovery_health_desc = N'ONLINE'
AND r.synchronization_health_desc = N'HEALTHY'
AND ar.availability_mode_desc = N'SYNCHRONOUS_COMMIT';"

<# Execute failover-eligible query #>
$secondaries = Invoke-Sqlcmd -ServerInstance "$instance" -Database master -Query $query;

<# Output message if there are no failover-eligible AGs #>
if($secondaries.Count -eq 0)
{
Write-Output "There are no Availability Group replicas available to fail over to $instance."
}

<# If eligible AGs exist, loop through them #>
foreach($secondary in $secondaries)
{
$secreplica = $secondary.replica_server_name;
$ag = $secondary.name;

$query = "ALTER AVAILABILITY GROUP $ag FAILOVER;"

<# If $noexec is set to 0, execute the AG failover
and output a message when complete #>
if($noexec -eq 0)
{
$starttime = Get-Date;
Invoke-Sqlcmd -ServerInstance "$instance" -Database master -Query $query;
$endtime = Get-Date;
$duration = (New-TimeSpan -Start $starttime -End $endtime).Seconds;
Write-Output "Failed Availability Group $ag to replica $instance in $duration seconds";
}
<# If $noexec is not set to 0, write a file out to the path built above for each AG #>
else
{
$comment = "/* Run against instance $instance */" ;
$comment | Out-File -FilePath "$outpath\failover_$ag`_$rundate.sql" -Append;
$use = "USE master;";
$use | Out-File -FilePath "$outpath\failover_$ag`_$rundate.sql" -Append;
$query | Out-File -FilePath "$outpath\failover_$ag`_$rundate.sql" -Append;
}
}
}
}

Monitoring and Alerting for Availability Groups and Why I Love PASS Summit

October 31, 2016 Leave a comment

I have returned from my eighth PASS Summit and, as in years past, they just keep getting better.  I am amazed and humbled that I am a part of the PASS community.  PASS has given me the opportunity to meet and learn from many of the best and brightest in the SQL world.  When I meet first-timers, I tell them to talk to people, no matter who they are.  If you share an interest, they will share with you.  If you have never been to PASS Summit, I recommend it.  If you cannot make Summit, get involved with your local PASS Chapter or SQL Saturday.  Virtual Chapters are another great way to find out what PASS has to offer.

One of the highlights of this year’s Summit was Shawn Meyers’ (t) presentation on Monitoring and Alerting of Availability Groups. (You will need to be logged in to the PASS website to access the link.)  I have worked with Availability Group since its release in SQL Server 2012 and Shawn provided the best solution for monitoring and alerting.

Shawn suggested implementing three alerts:

  • 1480 – AG Role Change
  • 35264 – AG Data Movement Suspended
  • 35265 – AG Data Movement Resumed

Additionally, Shawn provided an MSDN link with recommendations for using Policy-Based Management (PBM) to monitor AGs.  The custom policies monitor the recovery time objective (RTO) and recovery point objective (RPO) for your availability groups.  Microsoft recommends 600 seconds, or 10 minutes, for RTO, and 3600 seconds, or 60 minutes, for RPO. You can set your own values when defining the policy conditions.

I have scripted out the PBM conditions, policies, and SQL Server Agent alerts.  The conditions and policies can be used as-is, because they use system schedules.  The alerts need to updated to replace the TestOperator with an SQL Agent operator of your own.

I hope you can make use of these scripts.  And I encourage you to get involved with PASS. You will be amazed at what it can do for you.

agmonitoringandalertingscripts

Fun With READ_ONLY_ROUTING_LIST Syntax

July 13, 2016 Leave a comment

I finally am getting to work with SQL Server 2016 and spent the past two days building out an Availability Group to test the new read-only replica functionality.  This allows connections with an ApplicationIntent of ReadOnly to round-robin between all instances in the READ_ONLY_ROUTING_LIST, load balancing the read-only activity.

For my test, I built out three replicas and set all three to be readable secondaries.  Using Microsoft’s documentation on configuring read-only routing, I coded the following:

ALTER AVAILABILITY GROUP SKREEBYHA
MODIFY REPLICA ON 'AGNODE1\SKREEBYAG'
WITH
(
PRIMARY_ROLE
(
READ_ONLY_ROUTING_LIST=('AGNODE2\SKREEBYAG','AGNODE3\SKREEBYAG')
)
);

To test the new setup, I used the SQLCMD examples from Eric Russo’s blog post on Load Balancing 101.  Running the following command:

SQLCMD -S SKREEBYLIST -K ReadOnly -d testdb -Q "SELECT @@SERVERNAME"

returned AGNODE2\SKREEBYAG in each case. I could not figure out what I was doing wrong. After much frustration, I noticed the routing check query at the bottom of Eric’s post:

SELECT ar_Primary.replica_server_name 'Primary',
ar_secondary.replica_server_name 'Route_to',
ror.routing_priority 'Priority'
FROM sys.availability_read_only_routing_lists ror
INNER JOIN sys.availability_replicas ar_Primary
ON ror.replica_id = ar_Primary.replica_id
INNER JOIN sys.availability_replicas ar_secondary
ON ror.read_only_replica_id = ar_secondary.replica_id
INNER JOIN sys.availability_groups ag
ON ag.group_id = ar_Primary.group_id
WHERE ag.name = 'SKREEBYHA'
ORDER BY 1,3

Running it returned the following results:

RoutingListResults

So, in technical terms, something was screwy.  Looking at Eric’s post again, I noticed his third example for setting a routing list:

ALTER AVAILABILITY GROUP SQL2016AG
MODIFY REPLICA ON 'MyNode3'
WITH
(
PRIMARY_ROLE
(
READ_ONLY_ROUTING_LIST= (('MyNode4','MyNode1','MyNode2'))
)
)

Notice the second set of parentheses around the READ_ONLY_ROUTING_LIST value. Based on this, I changed my code to:

ALTER AVAILABILITY GROUP SKREEBYHA
MODIFY REPLICA ON 'AGNODE1\SKREEBYAG'
WITH
(
PRIMARY_ROLE
(
READ_ONLY_ROUTING_LIST=(('AGNODE2\SKREEBYAG','AGNODE3\SKREEBYAG'))
)
);

and, voila, the routing list check query returned the following:

RoutingListResultsFixed

Rerunning the SQLCMD test, the read-only connections round-robined between AGNODE2 and AGNODE3.

Many thanks to Eric Russo (b|t) for a great blog post.  I hope this saves you some frustration when working with read-only routing.