Once, while doing a load testing with my client, I was shown below error while taking log backups on secondary replica.
Msg 35295, Level 16, State 1, Line 1
Log backup for database “MyAppDB” on a secondary replica failed because the last backup LSN (0x0016ff11:00032258:0001) from the primary database is greater than the current local redo LSN (0x0016ffec:0006ee28:0001). No log records need to be backed up at this time. Retry the log-backup operation later.
Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.
This error was interesting because as per documentation we can take backup on the secondary replica in AlwaysOn configuration. I quickly checked AlwaysOn Dashboard from Management studio, all were looking green and databases were synchronized. Then I checked transaction log file (LDF) size and it was very big which made me think in different directions. Went to dashboard again and added Redo queue size. We found that REDO thread was blocked by a user query. As per Sys.dm_exec_requests we found that SPID 28 (which was showing DB STARTUP) was blocked by a user SPID 255. Then we used conventional way and found that SPID 255 was running SELECT INTO query for many hours. This was from a job which had a complex join with huge tables.
If you find the same error while taking log backups on secondary replica, then check REDO THREAD and check if its blocked. Based on criticality you need to take a decision to KILL the SPID which is blocking system SPID.
As soon as we killed user query, the “Recovery Queue” for that database started coming down. Once it came down to zero, we could take a transaction log backup on the secondary.
Have you seen REDO blocked in production environment? Do you monitor this by monitoring tool?
Reference:Pinal Dave (http://blog.SQLAuthority.com)