Backup & Disaster Recovery

Backup Verification and Restore Testing for MSPs

How to verify that backups actually work and prove it with documented restore tests. Covers automated verification, manual testing cadence, and what to document for auditors.

Workflow guide · Updated Feb 2026

By MSP Workflows Team

Contents

1. Verification Is Not Testing
2. What Automated Verification Catches
3. Define your restore testing cadence
4. Perform the restore in an isolated environment
5. Verify application functionality
6. Document actual RPO and RTO
7. Store results and report to the client
8. Screenshot verification is not enough
9. How long does a restore test take?
10. Should MSPs charge clients for restore tests?
11. What if a restore test fails?

Verification Is Not Testing

These two activities are related but distinct, and confusing them creates a false sense of security.

Backup verification confirms that the backup job completed, the data was written, and the backup image can boot. Most modern BDR tools do this automatically through screenshot verification or integrity checks that run after every backup, so obvious failures surface quickly.

Restore testing confirms that you can actually recover a system to a usable state within the client's RTO. It requires a person to initiate a restore, verify application functionality, and document the results. No automated tool can fully replicate this.

You need both. Verification catches day-to-day issues. Testing validates that the entire recovery process works end to end.

What Automated Verification Catches

Automated verification should run after every backup job. Most BDR appliances (Datto, Axcient) include this natively, and Veeam's SureBackup provides similar functionality. At minimum, automated verification should confirm that the backup completed without errors, the backup chain integrity is intact (especially for incremental backups), the backup image can boot to a login screen (screenshot verification), and offsite replication has completed.

What automated verification cannot catch: data corruption within a running application, database integrity issues, application-specific recovery requirements, and whether the restore meets the client's RTO target. These require manual testing.
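As a minimal sketch, the four minimum checks can be expressed as a pass/fail function over a job summary. The `BackupJobResult` fields are hypothetical; Datto, Axcient, and Veeam each report job status in their own formats, so you would map their reports or exports onto these fields yourself.

```python
from dataclasses import dataclass


@dataclass
class BackupJobResult:
    """Hypothetical summary of one backup job; field names are illustrative only."""
    job_status: str           # e.g. "success", "warning", "failed"
    chain_intact: bool        # result of the tool's chain/integrity check
    boot_screenshot_ok: bool  # screenshot verification reached a login screen
    offsite_replicated: bool  # offsite copy of this recovery point has completed


def verification_failures(result: BackupJobResult) -> list[str]:
    """Return the minimum automated checks this job failed (empty list = all passed).

    Passing these checks says nothing about application data or RTO;
    those still need a manual restore test.
    """
    failures = []
    if result.job_status != "success":
        failures.append("backup did not complete without errors")
    if not result.chain_intact:
        failures.append("backup chain integrity check failed")
    if not result.boot_screenshot_ok:
        failures.append("image did not boot to a login screen")
    if not result.offsite_replicated:
        failures.append("offsite replication not complete")
    return failures
```

A job that boots but has not replicated offsite would return `["offsite replication not complete"]`, which is the kind of result you would route to a ticket rather than ignore.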

Define your restore testing cadence

Test frequency should match system criticality. Critical servers (domain controllers, database servers, file servers with active data) should be tested quarterly. Non-critical servers and workstations can be tested semi-annually. SaaS data restore tests (recovering a mailbox or SharePoint site) should run quarterly. Put the schedule in your PSA as recurring tickets with assigned owners. If restore tests don't have a scheduled date and an owner, they won't happen.
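A minimal sketch of turning the cadence into due dates, assuming "quarterly" means three months after the last completed test and "semi-annually" means six. The tier labels and the `next_test_due` / `is_overdue` helpers are hypothetical; in practice the PSA's recurring tickets would hold this schedule, and a script like this would only cross-check it.

```python
import calendar
from datetime import date

# Test intervals in months, following the cadence described above.
# Tier labels are illustrative; use whatever labels your PSA uses.
INTERVAL_MONTHS = {
    "critical": 3,      # quarterly: DCs, database servers, active file servers
    "saas": 3,          # quarterly: mailbox / SharePoint restore tests
    "non_critical": 6,  # semi-annually: other servers and workstations
}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of shorter months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_test_due(last_tested: date, tier: str) -> date:
    """Date the next restore test for this system should happen by."""
    return add_months(last_tested, INTERVAL_MONTHS[tier])


def is_overdue(last_tested: date, tier: str, today: date) -> bool:
    """True if the system has passed its due date without a new test."""
    return today > next_test_due(last_tested, tier)
```

A critical server last tested on 15 January 2026 would be due by 15 April 2026; a non-critical one by 15 July 2026.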

Perform the restore in an isolated environment

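Never run a test restore into the production environment. Use a sandbox VM, a spare physical machine, or a cloud-based test environment. The goal is to validate recovery without any risk of overwriting production data. Keep the restored system off the production network as well: it carries the same hostname, IP configuration, and domain identity as the original, and bringing both online together can cause conflicts. For BDR appliances, use the local virtualization feature to spin up the backup as a VM on the appliance itself. For cloud-based backups, restore to a temporary cloud VM.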
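One inexpensive safeguard is to check, before starting a test restore, that the target's address falls outside the client's production subnets. This sketch uses Python's standard `ipaddress` module. The subnets are placeholder values, and passing this check does not by itself prove the sandbox is isolated; VLANs and firewall rules still need to be confirmed.

```python
import ipaddress

# Placeholder values: replace with the client's real production ranges.
PRODUCTION_SUBNETS = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def is_safe_restore_target(target_ip: str, production_subnets=PRODUCTION_SUBNETS) -> bool:
    """Return True only if the restore target's IP lies outside every production subnet.

    Membership tests between IPv4 and IPv6 objects simply return False,
    so mixed address families do not raise here.
    """
    addr = ipaddress.ip_address(target_ip)
    return not any(addr in net for net in production_subnets)
```

A pre-restore script might refuse to continue when `is_safe_restore_target("10.20.5.7")` returns `False`.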

Verify application functionality

Booting to a login screen is not a successful restore test. Log in. Open the critical applications. Verify that the database responds to queries. Confirm that file shares are accessible. Check that services are running. If the client has a specific LOB application, open it and confirm it loads data. Document what you tested and what worked. If something didn't work, document that too and include it in the remediation plan.
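Part of this check can be scripted from a machine inside the sandbox network: confirm that the expected services accept TCP connections before moving on to hands-on testing. This is a minimal sketch using Python's standard `socket` module. The function name is hypothetical, and an open port only shows a service is listening, not that the application or its data is healthy.

```python
import socket


def check_tcp_services(host: str, ports: dict[str, int], timeout: float = 3.0) -> dict[str, bool]:
    """Try a TCP connection to each named port on the restored system.

    Returns {service name: True if a connection was accepted}. A successful
    connect does not replace logging in and exercising the application.
    """
    results = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:  # refused, unreachable, or timed out
            results[name] = False
    return results
```

For example, `check_tcp_services("10.99.0.10", {"SMB": 445, "SQL Server": 1433, "RDP": 3389})` probes the default ports for file shares, SQL Server, and Remote Desktop on a hypothetical sandbox address. Record the results alongside the manual checks.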

Document actual RPO and RTO

Record two numbers: the age of the data at the time of restore (actual RPO) and the elapsed time from "restore initiated" to "system usable" (actual RTO). Compare these to the targets defined in the client's service agreement. If actual RTO exceeds the target, investigate: is the backup medium too slow? Is the restore process adding unexpected steps? Is the target unrealistic for the current infrastructure? Adjust either the process or the target.
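Both measurements and the comparison against targets reduce to timestamp arithmetic. This sketch assumes the "age of the data" is measured from the recovery point to the moment the restore is initiated. The function names and example timestamps are illustrative.

```python
from datetime import datetime, timedelta


def actual_rpo(recovery_point: datetime, restore_initiated: datetime) -> timedelta:
    """Age of the restored data: how old the recovery point was when the restore began."""
    return restore_initiated - recovery_point


def actual_rto(restore_initiated: datetime, system_usable: datetime) -> timedelta:
    """Elapsed time from "restore initiated" to "system usable"."""
    return system_usable - restore_initiated


def compare_to_targets(rpo: timedelta, rto: timedelta,
                       rpo_target: timedelta, rto_target: timedelta) -> dict[str, bool]:
    """Check the measured values against the targets in the service agreement."""
    return {"rpo_met": rpo <= rpo_target, "rto_met": rto <= rto_target}
```

With a recovery point at 02:00, a restore started at 09:30, and the system usable at 11:15, actual RPO is 7 h 30 min and actual RTO is 1 h 45 min. Both are within, say, a 24-hour RPO and 4-hour RTO target.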

Store results and report to the client

Save restore test results in the client's documentation alongside their backup configuration. Include the date, the system tested, what was verified, the actual RPO and RTO, and any issues found, and add the results to the next quarterly business review. These records are also the evidence that cyber insurance providers and auditors request, so having documented, dated restore tests on file significantly strengthens the client's compliance posture.
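A consistent, machine-readable record makes results easy to file next to the backup configuration and to pull into a QBR. This sketch serializes the listed fields to JSON with Python's standard library. The class name, field names, and example values are hypothetical, not a format that any auditor or insurer prescribes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RestoreTestRecord:
    """Hypothetical record of one restore test; adapt fields to your documentation tool."""
    test_date: str            # ISO date, e.g. "2026-02-10"
    system: str               # hostname or description of the system tested
    verified: list[str]       # what was checked and worked
    actual_rpo_minutes: int
    actual_rto_minutes: int
    issues: list[str] = field(default_factory=list)  # anything that failed or needs remediation

    def to_json(self) -> str:
        """Serialize the record for storage with the client's documentation."""
        return json.dumps(asdict(self), indent=2)
```

For example, `RestoreTestRecord("2026-02-10", "FS01", ["logged in", "file shares accessible"], 450, 105)` produces a JSON document that can be attached to the ticket and referenced in the QBR.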

Screenshot verification is not enough

A screenshot showing a Windows login screen proves the OS can boot. It does not prove the application data is intact, the database is consistent, or the system can be recovered within the client's RTO. Screenshot verification is a useful automated check, not a substitute for a real restore test.

How long does a restore test take?

Plan 30 minutes to 2 hours per system, depending on the backup size and recovery method. A local BDR appliance restore is faster (15 to 30 minutes to boot); a cloud restore depends on download speed. Documentation and reporting take another 15 to 30 minutes. Budget accordingly.
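The estimate is simple arithmetic. This sketch assumes the 15 to 30 minutes of documentation applies to each system tested, because the guidance does not say whether it is per system or per session. The function name and defaults are illustrative.

```python
def budget_minutes(num_systems: int,
                   per_system: tuple[int, int] = (30, 120),
                   documentation: tuple[int, int] = (15, 30)) -> tuple[int, int]:
    """Return (low, high) minutes for testing and documenting num_systems systems.

    Assumes documentation time is incurred once per system tested.
    """
    low = num_systems * (per_system[0] + documentation[0])
    high = num_systems * (per_system[1] + documentation[1])
    return low, high
```

For three systems this gives 135 to 450 minutes, or roughly 2 h 15 min to 7 h 30 min of technician time.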

Should MSPs charge clients for restore tests?

Include a defined number of restore tests per system per year in the service agreement (typically 4 for critical systems and 2 for non-critical systems, matching the testing cadence). Tests beyond the included count are billable. This ensures testing happens while preventing scope creep.
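Tracking the allowance is a per-system count. This sketch assumes the included tests (4 per year for critical systems, 2 for non-critical) apply to each system individually over a contract year. The tier labels and function name are hypothetical.

```python
# Included restore tests per system per contract year (assumed per-system allowance).
INCLUDED_TESTS_PER_YEAR = {"critical": 4, "non_critical": 2}


def billable_tests(systems: list[tuple[str, int]]) -> int:
    """Count tests beyond the included allowance.

    systems: (tier, tests performed this contract year) for each covered system.
    Unused allowance on one system does not offset overage on another.
    """
    return sum(max(0, performed - INCLUDED_TESTS_PER_YEAR[tier])
               for tier, performed in systems)
```

A client with one critical server tested five times, another tested four times, and a non-critical server tested three times would have two billable tests.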

What if a restore test fails?

A failed restore test is a finding, not a crisis (assuming you still have the backup data). Document the failure, diagnose the root cause, remediate, and retest. Common causes: corrupted backup chain, missing drivers for the restore target, expired credentials in the backup configuration, or application dependencies that weren't backed up.
