How to monitor GAP in AlwaysOn?

Good morning everyone! Today we will see how to monitor GAP in AlwaysOn to be able to react quickly and minimize the impact of recovering to the hardening point. This point is of crucial importance since it is the threshold where the log empties, causing the stop of other nodes due to it and leading to unexpected problems. More information about AlwaysOn at https://www.gpsos.es/2020/09/alta-disponibilidad-en-versiones-standard/

How to monitor GAP in AlwaysOn?

To monitor we will use the tables:

sys.dm_hadr_availability_replica_states
sys.dm_hadr_database_replica_states 
sys.availability_replicas

Crossing these tables we obtain the number of seconds of difference of synchronization by DB as well as the fate they sync against:

 ; WITH UpTime AS
  (
 SELECT DATEDIFF (SECOND, create_date, GETDATE ()) [upTime_secs]
 FROM sys.databases
 WHERE name = 'tempdb'
  ),
 AG_Stats AS 
  (
 SELECT AR.replica_server_name,
 HARS.role_desc, 
 Db_name (DRS.database_id)[DBName] , 
 CAST (DRS.log_send_queue_size AS DECIMAL (19,2)) log_send_queue_size_KB, 
 (CAST (perf.cntr_value AS DECIMAL (19,2)) / CAST (UpTime.upTime_secs AS DECIMAL (19,2))) / CAST (1024 AS DECIMAL (19,2)) [log_KB_flushed_per_sec]
 FROM sys.dm_hadr_database_replica_states DRS 
 INNER JOIN sys.availability_replicas AR ON DRS.replica_id = AR.replica_id 
 INNER JOIN sys.dm_hadr_availability_replica_states HARS ON AR.group_id = HARS.group_id 
 AND AR.replica_id = HARS.replica_id 
 --I am calculating this as an average over the entire time that the instance has been online.
 --To capture a smaller, more recent window, you will need to:
  --1. Store the counter value.
  --2. Wait N seconds.
  --3. Recheck counter value.
  --4. Divide the difference between the two checks by N.
 INNER JOIN sys.dm_os_performance_counters perf ON perf.instance_name = Db_name (DRS.database_id)
 AND perf.counter_name like 'Log Bytes Flushed / sec%'
 CROSS APPLY UpTime
  ),
 Pri_CommitTime AS 
  (
 SELECT replica_server_name
 , DBName
  ,  [log_KB_flushed_per_sec]
 FROM AG_Stats
 WHERE role_desc = 'PRIMARY'
  ),
 Sec_CommitTime AS 
  (
 SELECT replica_server_name
 , DBName
 --Send queue will be NULL if secondary is not online and synchronizing
 , log_send_queue_size_KB
 FROM AG_Stats
 WHERE role_desc = 'SECONDARY'
  )
 SELECT p.replica_server_name [primary_replica]
 , p.[DBName] ACE [DatabaseName]
 , s.replica_server_name [secondary_replica]
 , CAST (s.log_send_queue_size_KB / p.[log_KB_flushed_per_sec] AS BIGINT) [Sync_Lag_Secs]
 FROM Pri_CommitTime p
 LEFT JOIN Sec_CommitTime s ON [s].[DBName] = [p].[DBName]

The output will be following:

Once we have the seconds per DB, it is easy for us to add a sum and schedule a job so that when said sum is greater than 5 min (300) it jumps. We must change the output of the WITH with the following code:

declare @TEMPPGAP int;
SELECT sum (CAST (s.log_send_queue_size_KB / p.[log_KB_flushed_per_sec] AS BIGINT))[Sync_Lag_Secs] into ## TEMPPGAP
 FROM Pri_CommitTime p
 LEFT JOIN Sec_CommitTime s ON [s].[DBName] = [p].[DBName]
    
 set @ TEMPPGAP = (select[Sync_Lag_Secs] from ## TEMPPGAP)
    
 If (@TEMPPGAP> 3600)
 begin
EXEC msdb..sp_send_dbmail @ profile_name = 'Alert Manager',
 @ recipients = 'alerts @ gpsos.es',
 @ subject = 'GAP with replicas',
 @ body = 'Check nodes',
 @ query = 'select[Sync_Lag_Secs] from ## TEMPPGAP ',
 @query_result_header = 1,
 @query_no_truncate = 0
 end

From now on we will receive an email when our replicas are out of sync.

If you want us to help you with your environment, be it management or consulting, do not hesitate to contact us !! See you next time!!

Feel free to subscribe to our monthly newsletter so you don’t miss any of our publications with just one email per month.

Still do not know Query Performance ? Find out how it can help you in your Oracle environment. More information on their page LinkedIn .

Follow GPS on LinkedIn

More information about AlwaysOn at https://docs.microsoft.com/es-es/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server?view=sql-server-ver15

How to monitor GAP in AlwaysOn?

Leave a Reply