Sunday, 9 October 2011

When is a SQL backup not a SQL backup?

Like many, I back up SQL databases to a disk area every evening, to be picked up by tape backup software (Netbackup) in the early hours of the next day.  It's a straightforward disk-to-disk-to-tape system.

Monitoring of the disk-to-disk part is simple: I use a simple SQL script to determine whether databases have been backed up in the last 24 hours. The query looks something like this:

SELECT name,
MAX (backup_finish_date) as last_backup_date FROM
(SELECT * from master.sys.databases
WHERE state_desc !='OFFLINE'
AND name NOT IN (select secondary_database from msdb.dbo.log_shipping_secondary_databases)
AND name != 'tempdb') SD
LEFT OUTER JOIN
(SELECT database_name, backup_finish_date from msdb..backupset where type = 'D') bs
ON bs.database_name = sd.name
WHERE BS.Is_Snapshot <> '1'
GROUP BY name
HAVING MAX(backup_finish_date) < GETDATE()

I encapsulate this into an stored procedure, which then runs through a SQL Agent Job and sends the results out through database mail. It's crude, so I'm moving to a solution that uses the Policy Based Management framework, but I'll come back to that later.

I created a fresh SQL 2008 R2 instance on a Windows 2008 R2 server, running as a VMWare virtual machine. This is the first time I'd run SQL in a virtual machine; all my other SQL Servers are physical.

The virtual machines are backed up using Netbackup for VMWare 7.1. This performs a 'whole-VM' backup every night. I hadn't yet configured the disk-to-disk SQL backups, but my backup script was reporting that databases were being backed up - and at around the same time as the 'whole-VM' backup was running. Could Netbackup for VMWare be doing something unexpected?


A closer look at the 'backupset' and 'backmediafamily' tables give some clues. Here is the query:

SELECT
msdb.dbo.backupset.database_name,
msdb.dbo.backupset.backup_start_date,
msdb.dbo.backupset.backup_finish_date,
msdb.dbo.backupmediafamily.device_type,
msdb.dbo.backupset.is_snapshot,
msdb.dbo.backupset.name AS backupset_name
FROM
msdb.dbo.backupmediafamily
INNER JOIN msdb.dbo.backupset
ON msdb.dbo.backupmediafamily.media_set_id = msdb.dbo.backupset.media_set_id
WHERE (CONVERT(datetime, msdb.dbo.backupset.backup_start_date, 102) >= GETDATE() - 7)
ORDER BY msdb.dbo.backupset.database_name, msdb.dbo.backupset.backup_finish_date 

On my new virtual servers, this returns rows for master, model and msdb with device_type = 7 and is_snapshot = 1. Disk-to-disk backups should generate rows with device_type = 2 (disk) and is_snapshot = 0.


So, it looked like my rogue backups were something to do with snapshots.



Microsoft KB 951288 reveals:

Consider the following scenario. You install Microsoft SQL Server on a volume. You use a Volume Shadow Copy Service (VSS) application to back up files on the volume. In this scenario, SQL Server records a backup operation in the backupset history table. This problem occurs even if you did not actually back up the database files of SQL Server. 

So, it's VSS. What's VSS?  Mr Backup provides a solid overview. Basically, SQL provides a VSS writer (known as SQL Writer) that will quiesce SQL before a snaphot is taken, allowing an open SQL instance and databases to be backed up in a consistent manner. You can see the SQL Writer by querying the installed VSS writers.

vssadmin list writers

Writer name: 'SqlServerWriter'
  Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}
  Writer Instance Id: {2ffe3778-7cdb-47bb-8d86-d01084b4316c}
  State: [1] Stable
  Last error: No error

One of the first things Netbackup for VMWare does prior to backing up the VM is to take a VMWare snapshot. The snapshot enables a 'point-in-time' image of the entire virtual machine to be backed up. If you are using vSphere 4.1 and Windows 2008, VMware calls the SQL Writer VSS provider (and any other application-level VSS writers) as part of the snapshot process (see VMWare note 1028881).  This ensures that SQL is in a consistent state on the backup image, so that you don't have to worry about crash recovery of SQL if you restore the VM.

As I saw, SQL Server regards this as a 'proper' backup and records it in the backupset table. But what I really want to know is whether my disk-to-disk backup worked, not whether the VSS writer has been called as part of a VMWare snapshot. It's just a simple modification to my script:

SELECT name,
MAX (backup_finish_date) as last_backup_date FROM
(SELECT * from master.sys.databases
WHERE state_desc !='OFFLINE'
AND name NOT IN (select secondary_database from msdb.dbo.log_shipping_secondary_databases)
AND name != 'tempdb') SD
LEFT OUTER JOIN
(SELECT database_name, backup_finish_date from msdb..backupset where type = 'D') bs
ON bs.database_name = sd.name
WHERE BS.Is_Snapshot <> '1'                /* Added a filter on Is_Snapshot */
GROUP BY name
HAVING MAX(backup_finish_date) < GETDATE()

That's all well and good if you 'roll your own' monitoring solution. But what if you use a monitoring tool, from either Microsoft or a third-party? My advice is to run SQL Profiler to work out exactly what the tool is reporting as a 'proper' backup.

The bad news - Policy Based Management for SQL does not take account of Snapshot backups in the provided facets. The good news - you can 'roll-your-own' condition to work around this. I'll describe how in a later post.

Resources:





1 comment:

  1. Thanks for blogpost, it´s still a problem on SQL2014 on with VM snapshot to Azure Recovery Service Vault and SQL Managed Backup.
    Restore from GUI doesn´t work because of that.

    ReplyDelete

Please leave a comment.