Restore Test Runbook for MSPs

Step-by-step runbook for planning, executing, and documenting restore tests. Designed to be handed to any technician and produce consistent, auditable results.

Runbook · Updated Feb 2026

Purpose of This Runbook

This runbook provides a repeatable procedure for restore tests that any technician on your team can follow. The goal is consistency: every restore test should produce the same documentation, measure the same metrics, and verify the same success criteria regardless of who performs it. Use this runbook for scheduled quarterly restore tests, for validating a new backup configuration, and for verifying recovery after a backup tool migration. Customize the success criteria per client based on their specific applications and SLA targets.
1

Prepare the test environment

Identify the system to be tested and confirm the most recent successful backup. Prepare an isolated restore target: a test VM on the BDR appliance, a sandbox in your hypervisor, or a temporary cloud VM. Confirm you have the credentials needed for the restore (backup encryption key, admin password, application service accounts). Notify the client if the test requires any coordination.
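As a rough illustration of the "confirm the most recent successful backup" check, the sketch below picks the newest successful job from a list of job records. The record shape (`id`, `status`, `finished_at`) is a hypothetical stand-in for whatever your backup tool exports, not any vendor's actual format.

```python
from datetime import datetime


def latest_successful_backup(jobs):
    """Return the most recent job whose status is "success", or None.

    `jobs` is a list of dicts with hypothetical keys "status" and
    "finished_at" (a datetime); adapt the keys to your tool's export.
    """
    successful = [job for job in jobs if job["status"] == "success"]
    if not successful:
        return None
    return max(successful, key=lambda job: job["finished_at"])


# Example data (made up for illustration only).
jobs = [
    {"id": "a", "status": "success", "finished_at": datetime(2026, 2, 1, 2, 0)},
    {"id": "b", "status": "failed", "finished_at": datetime(2026, 2, 3, 2, 0)},
    {"id": "c", "status": "success", "finished_at": datetime(2026, 2, 2, 2, 0)},
]
chosen = latest_successful_backup(jobs)
```

A `None` result means there is nothing valid to restore from, which is itself a finding worth a ticket before the test goes any further.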

2

Define success criteria

Before starting the restore, write down what "success" means for this specific system. At minimum: the OS boots to a login screen, the primary application opens and responds, critical data is present and recent, and network services are functional. For database servers, add: the database engine starts, queries return expected results, and transaction logs are intact.
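One way to keep the criteria consistent between technicians is to build the checklist from a shared baseline. The sketch below encodes the minimum criteria and the database additions listed above; the function name and the `extra` parameter are illustrative assumptions, and per-client criteria would be passed in through `extra`.

```python
from typing import Iterable, List

# Baseline criteria from this runbook.
BASE_CRITERIA = [
    "OS boots to a login screen",
    "Primary application opens and responds",
    "Critical data is present and recent",
    "Network services are functional",
]

# Additional criteria for database servers.
DATABASE_CRITERIA = [
    "Database engine starts",
    "Queries return expected results",
    "Transaction logs are intact",
]


def build_criteria(is_database_server: bool, extra: Iterable[str] = ()) -> List[str]:
    """Return the ordered success checklist for one system."""
    criteria = list(BASE_CRITERIA)
    if is_database_server:
        criteria.extend(DATABASE_CRITERIA)
    criteria.extend(extra)  # client-specific items, e.g. from the SLA
    return criteria
```

For example, `build_criteria(True, ["Line-of-business app connects to the database"])` yields the four baseline items, the three database items, and the client-specific item, in that order.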

3

Execute the restore

Start a timer when you initiate the restore. Monitor progress and note any errors or warnings. Record the elapsed time at each milestone: restore initiated, data transfer complete, OS boot complete, login successful, application verified. If the restore fails at any step, stop the timer and document the failure point and the error message. Record the elapsed time up to the failure as a partial RTO, then investigate the failure.
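The milestone timing described here can be kept with a stopwatch and a notepad, but a small helper makes the numbers harder to lose. The following is a minimal sketch, assuming a hypothetical `RestoreTimer` class; the injectable `clock` argument exists only so the timer can be exercised without waiting on a real restore.

```python
import time


class RestoreTimer:
    """Records elapsed seconds at each named milestone of a restore test."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # monotonic clock: unaffected by wall-clock changes
        self._start = None
        self.milestones = []     # list of (name, elapsed_seconds)
        self.failure = None      # filled in only if the restore fails

    def start(self):
        self._start = self._clock()
        self.milestones.append(("restore initiated", 0.0))

    def mark(self, name):
        if self._start is None:
            raise RuntimeError("call start() before mark()")
        elapsed = self._clock() - self._start
        self.milestones.append((name, elapsed))
        return elapsed

    def fail(self, step, error_message):
        """Stop timing at a failure and keep the partial elapsed time."""
        elapsed = self.mark(f"FAILED at {step}")
        self.failure = {
            "step": step,
            "error": error_message,
            "partial_seconds": elapsed,
        }
        return self.failure
```

In use, a technician would call `start()` when the restore is initiated, `mark("data transfer complete")` and so on at each milestone, and `fail(...)` with the failure point and error message if the restore stops early.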

4

Verify against success criteria

Walk through each success criterion defined in step 2. Log in to the restored system. Open the critical applications. Query the database. Check file share accessibility. Verify that the data is from the expected backup point (not older than the RPO target). Take screenshots of key verification points. These serve as evidence for the test documentation.
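The backup-point check is simple date arithmetic. The sketch below assumes the age of the data is measured from the backup point to the moment the restore was initiated, and that both timestamps are timezone-aware; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def data_age(backup_point: datetime, restore_started: datetime) -> timedelta:
    """Age of the restored data at the time the restore began (actual RPO)."""
    age = restore_started - backup_point  # raises TypeError if one value is naive
    if age < timedelta(0):
        raise ValueError("backup point is later than the restore start time")
    return age


def meets_rpo_target(backup_point: datetime, restore_started: datetime,
                     rpo_target: timedelta) -> bool:
    """True when the data is no older than the RPO target."""
    return data_age(backup_point, restore_started) <= rpo_target


# Example values (made up): backup at 02:00 UTC, restore begun at 14:00 UTC.
backup_point = datetime(2026, 2, 1, 2, 0, tzinfo=timezone.utc)
restore_started = datetime(2026, 2, 1, 14, 0, tzinfo=timezone.utc)
age = data_age(backup_point, restore_started)  # 12 hours
```

With these example values, a 24-hour RPO target passes and a 4-hour target fails.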

5

Document results

Record the following in the client's documentation platform:
- Test date and technician name
- System tested and backup source
- Actual RPO (age of the data at restore)
- Actual RTO (time from restore initiation to a usable system)
- Success criteria results (pass/fail for each item)
- Any issues encountered and their resolution
- Recommendations for improving recovery (if applicable)

Store this alongside the client's backup configuration documentation.
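If your documentation platform accepts pasted text or an import, the fields above can be captured as a structured record so nothing is omitted. This is a sketch only: the class name, field names, and the minutes unit are assumptions, not a format any particular platform requires.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RestoreTestRecord:
    test_date: str
    technician: str
    system: str
    backup_source: str
    actual_rpo_minutes: int               # age of the data at restore
    actual_rto_minutes: int               # restore initiation to usable system
    criteria_results: Dict[str, bool]     # criterion -> pass/fail
    issues: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)

    def passed(self) -> bool:
        """Overall pass only if at least one criterion exists and all passed."""
        return bool(self.criteria_results) and all(self.criteria_results.values())

    def to_text(self) -> str:
        lines = [
            f"Test date: {self.test_date}",
            f"Technician: {self.technician}",
            f"System tested: {self.system}",
            f"Backup source: {self.backup_source}",
            f"Actual RPO (minutes): {self.actual_rpo_minutes}",
            f"Actual RTO (minutes): {self.actual_rto_minutes}",
            "Success criteria:",
        ]
        for criterion, ok in self.criteria_results.items():
            lines.append(f"  [{'PASS' if ok else 'FAIL'}] {criterion}")
        lines.append("Issues: " + ("; ".join(self.issues) or "none"))
        lines.append("Recommendations: " + ("; ".join(self.recommendations) or "none"))
        lines.append(f"Overall: {'PASS' if self.passed() else 'FAIL'}")
        return "\n".join(lines)
```

Treating an empty criteria list as a failure is a deliberate choice in this sketch: a test with no recorded criteria has not verified anything.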

6

Clean up and schedule next test

Delete the test VM or sandbox. Confirm no test artifacts remain in the production environment. Update the PSA recurring ticket with the next scheduled test date. If the test revealed issues, create remediation tickets and schedule a retest after remediation.

Never restore to production

Restore tests must use an isolated environment. Restoring a test copy into the production network risks overwriting current data, creating IP conflicts, or disrupting Active Directory replication if a restored domain controller comes online next to the live one. Always use a sandboxed VM with no network connectivity to the production environment.
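A quick sanity check on the sandbox's address plan can catch one class of mistake before the VM is powered on. The sketch below uses Python's standard `ipaddress` module to flag production subnets that overlap the sandbox range. The CIDR values are made-up examples, all networks are assumed to be the same IP version, and a clean result says nothing about routing or firewall rules, so it does not replace verifying that the sandbox has no connectivity to production.

```python
import ipaddress


def overlapping_subnets(sandbox_cidr, production_cidrs):
    """Return the production CIDRs whose address range overlaps the sandbox."""
    sandbox = ipaddress.ip_network(sandbox_cidr)
    return [
        cidr for cidr in production_cidrs
        if sandbox.overlaps(ipaddress.ip_network(cidr))
    ]


# Example values (hypothetical client networks).
production = ["10.0.0.0/16", "192.168.1.0/24"]
conflicts = overlapping_subnets("10.0.5.0/24", production)
clean = overlapping_subnets("172.31.250.0/24", production)
```

Here `conflicts` contains `"10.0.0.0/16"` because the sandbox range sits inside it, while `clean` is empty.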

Use a standardized form

Create a restore test template in your documentation platform with pre-filled fields for date, system, success criteria, RPO, RTO, and results. A standard form ensures every test produces the same documentation regardless of which technician performs it.

How often should restore tests be performed?

Test critical systems (domain controllers, database servers, primary file servers) quarterly and non-critical systems semi-annually. Also run a test after any major change to the backup configuration (new tool, new storage target, new retention policy). Define the cadence in the client's service agreement.
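The next due date follows directly from this cadence. The sketch below treats quarterly as three calendar months and semi-annually as six, which is an assumption; the tier names are illustrative and a service agreement may define the interval differently.

```python
import calendar
from datetime import date

# Assumed mapping of system tier to test interval, in calendar months.
INTERVAL_MONTHS = {"critical": 3, "non-critical": 6}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_test_date(last_test: date, tier: str) -> date:
    """Due date of the next scheduled restore test for the given tier."""
    return add_months(last_test, INTERVAL_MONTHS[tier])
```

For example, a critical system last tested on 2026-02-15 would be due again on 2026-05-15. Tests triggered by a backup configuration change fall outside this schedule and would be ticketed separately.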

What if the client's actual RTO exceeds the target?

Investigate the bottleneck. Common causes: slow backup storage medium, large dataset requiring extended transfer time, or recovery steps that weren't accounted for in the target. Either optimize the recovery process (faster storage, pre-staged recovery environment) or adjust the RTO target to reflect reality and communicate the change to the client.
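If elapsed times were recorded at each milestone during the test, the slowest phase can be found by differencing them. The sketch below assumes milestones are stored as ordered `(name, cumulative_seconds)` pairs; the function names and the example timings are illustrative, not measured data.

```python
def phase_durations(milestones):
    """Turn cumulative milestone times into per-phase durations."""
    return [
        (f"{prev_name} -> {name}", t - prev_t)
        for (prev_name, prev_t), (name, t) in zip(milestones, milestones[1:])
    ]


def rto_report(milestones, target_seconds):
    """Compare actual recovery time with the target and name the slowest phase."""
    if len(milestones) < 2:
        raise ValueError("need at least two milestones to measure a phase")
    phases = phase_durations(milestones)
    actual = milestones[-1][1] - milestones[0][1]
    slowest_name, slowest_seconds = max(phases, key=lambda p: p[1])
    return {
        "actual_seconds": actual,
        "target_seconds": target_seconds,
        "overrun_seconds": max(0, actual - target_seconds),
        "slowest_phase": slowest_name,
        "slowest_phase_seconds": slowest_seconds,
    }


# Example timings (made up for illustration only).
milestones = [
    ("restore initiated", 0),
    ("data transfer complete", 5400),
    ("OS boot complete", 5700),
    ("login successful", 5760),
    ("application verified", 6300),
]
report = rto_report(milestones, target_seconds=3600)
```

In this made-up example the data transfer phase accounts for 5400 of the 6300 seconds, which would point toward backup storage speed or dataset size rather than the post-boot verification steps.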

Should restore tests include SaaS data recovery?

Yes. Test recovering a mailbox from your M365 backup tool, restoring a SharePoint site, and recovering files from OneDrive. SaaS restore tests are quick (usually under 15 minutes) and often reveal issues with backup coverage that aren't apparent from the job dashboard.
