Archive
Monitoring and Alerting for Availability Groups and Why I Love PASS Summit
I have returned from my eighth PASS Summit and, as in years past, they just keep getting better. I am amazed and humbled that I am a part of the PASS community. PASS has given me the opportunity to meet and learn from many of the best and brightest in the SQL world. When I meet first-timers, I tell them to talk to people, no matter who they are. If you share an interest, they will share with you. If you have never been to PASS Summit, I recommend it. If you cannot make Summit, get involved with your local PASS Chapter or SQL Saturday. Virtual Chapters are another great way to find out what PASS has to offer.
One of the highlights of this year's Summit was Shawn Meyers' (t) presentation on Monitoring and Alerting of Availability Groups. (You will need to be logged in to the PASS website to access the link.) I have worked with Availability Group since its release in SQL Server 2012 and Shawn provided the best solution for monitoring and alerting.
Shawn suggested implementing three alerts:
- 1480 - AG Role Change
- 35264 - AG Data Movement Suspended
- 35265 - AG Data Movement Resumed
Additionally, Shawn provided an MSDN link with recommendations for using Policy-Based Management (PBM) to monitor AGs. The custom policies monitor the recovery time objective (RTO) and recovery point objective (RPO) for your availability groups. Microsoft recommends 600 seconds, or 10 minutes, for RTO, and 3600 seconds, or 60 minutes, for RPO. You can set your own values when defining the policy conditions.
I have scripted out the PBM conditions, policies, and SQL Server Agent alerts. The conditions and policies can be used as-is, because they use system schedules. The alerts need to updated to replace the TestOperator with an SQL Agent operator of your own.
I hope you can make use of these scripts. And I encourage you to get involved with PASS. You will be amazed at what it can do for you.
Joining sys.dm_exec_requests to sp_whoisactive results
Previously, I've blogged about returning formatted estimated_completion_time values from sys.dm_exec_requests and getting sortable output from sp_whoisactive. In this post I combine the two, joining the two result sets together. In my environment, we use Ola Hallengren's Maintenance Solution to run our backups. Because the backup job defaults to run backups for all user databases, it can be difficult to tell which database is currently being backed up.
sp_join_whoisactive_execrequests
This procedure takes the output from sp_whoisactive and joins it to sys.dm_exec_requests on session_id, returning the estimated completion time and current backup statement in a single pass. Here is a sample result set from the procedure (Click to Enlarge):
The sql_command column contains the full command being run, in this example Ola Hallengren's DatabaseBackup procedure. The sql_text column contains the actual SQL statement being run, BACKUP DATABASE [AdventureWorks2012] TO DISK.... The DatabaseBackup procedure runs database backups in alphabetical order based on the database name, so this allows tracking a job progress.
Many thanks to Adam Machanic (b|t) for sp_whoisactive, Ola Hallengren (b|t) for his Maintenance Solution, and Aaron Bertrand (b|t) for his advice on formatting the estimated completion time more elegantly. I finally had an opportunity to use it.
I hope you can put this procedure to good use. Let me know if you have any questions.