Sep 20, 2018

VPS backups are simple—you’re just overthinking it

Written by

Vippy The VPS

People keep thinking that manual VPS backups are some impossible task. They demand GUIs and automated tools. They spend hours and hours trying to use obscure terminal-based backup tools.

Instead, let me be the first one to say that VPS backups are actually incredibly simple. Here’s the only command you need to know:

$ rsync USER@IP_ADDRESS:/ -aAXvh \
--exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
/home/USER/backups/

Done. Can we stop worrying about VPS backups now?

Okay, that looks like a lot, but I promise it isn’t. We’ll come back to what that command does in a moment. First, let’s talk about why you should be backing up your VPS in the first place.

Why do VPS backups matter?

Whether you own a small business, host a simple personal website, or work in a top-notch organization, your data is always important. But because it’s online, it always remains vulnerable to hackers, ransomware, or even accidental deletion.

Having a proper backup and recovery plan is essential to protect yourself against these unexpected events. If you keep duplicate copies of your important files and store them in a separate and safe location, you can recover them if you have issues with the integrity of your data—no matter the source.

Instead of just running that command without thought, you should spend a few minutes creating a proper, standardized backup policy to minimize risk and make your life easier.

The frequency and timing of your backups depend on how often the data changes, how long a backup takes, how much data you need to duplicate, and when visitors or users are most active on your service. Since the backup process can use a lot of system resources, you should schedule your backups for low-usage times of the day. For a personal VPS, you should be doing weekly or even daily backups.

Remember: Data backup and data recovery are two different things. Backups are the act of duplicating files, whereas recovery is the process of restoring data from your backup. Restoration isn’t as easy as backup, but you can’t even try restoration without a proper backup!

Let’s get into what rsync is all about

rsync, short for remote synchronization, is a utility that efficiently synchronizes files and directories between two locations, whether they are on the same host or on different ones. The first time it runs, rsync replicates the entire data set from the source to the destination. After that first run, rsync only transfers data that has changed. These changes are called a delta.

Remote transfers run over SSH by default, so the data is encrypted in transit, and rsync can also compress it on the wire if you add the -z flag.

The most basic use of rsync is to replicate a folder on the same host. The following example will sync all files and folders from source_folder to destination_folder.

$ rsync -av ~/source_folder/ destination_folder/

The -a option turns on archive mode, which is shorthand for the flags -rlptgoD (recurse into folders and preserve symlinks, permissions, modification times, group, owner, and device and special files). The -v option turns on verbose mode for details about the transfer.

Now that rsync has copied source_folder once, I can add a new file and rerun the same command. This time, rsync will not copy the entire folder again to the destination. It will only transfer the files that have been modified or added since the last run.

$ touch ~/source_folder/pattern.txt
$ vi ~/source_folder/IP.txt
$ rsync -av ~/source_folder/ destination_folder/
sending incremental file list
./
IP.txt
pattern.txt

sent 353 bytes  received 61 bytes  828.00 bytes/sec
total size is 13  speedup is 0.03

This is an example of an incremental backup.

To synchronize files and folders between your local system and a remote VPS, you need SSH credentials for the VPS, and rsync must be installed on both machines. The following example will sync a folder from your local system to the remote VPS.
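If you want to see the incremental behavior for yourself without touching real data, here is a small sketch that works entirely in a throwaway directory. The file names old.txt and new.txt are made up for the demo, and the -i flag (--itemize-changes) simply asks rsync to list each item it updates.

```shell
# Work in a temporary directory so nothing real is touched.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo "first" > "$work/src/old.txt"

# First run: everything in src is copied to dst.
rsync -a "$work/src/" "$work/dst/"

# Add one file, then sync again. -i lists only the items rsync updates.
echo "second" > "$work/src/new.txt"
second_run=$(rsync -ai "$work/src/" "$work/dst/")
echo "$second_run"
```

The second run should mention new.txt but not old.txt, because old.txt already matches at the destination.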

$ rsync -av ~/source_folder USER@IP_ADDRESS:/home/USER/backup/

The above rsync command will sync the source_folder from your local system to the remote VPS in the folder /home/USER/backup/.
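One detail that trips people up is the trailing slash on the source path. With a slash, rsync copies the contents of the folder. Without it, rsync copies the folder itself into the destination. The sketch below shows both on local temporary folders; the names with_slash and without_slash are only there for illustration.

```shell
# Throwaway folders for the comparison.
work=$(mktemp -d)
mkdir -p "$work/source_folder" "$work/with_slash" "$work/without_slash"
touch "$work/source_folder/IP.txt"

# Trailing slash: copy the contents of source_folder.
rsync -a "$work/source_folder/" "$work/with_slash/"

# No trailing slash: copy the folder itself, so source_folder/ appears inside the destination.
rsync -a "$work/source_folder" "$work/without_slash/"

# List what ended up where.
find "$work/with_slash" "$work/without_slash" -type f
```

The first destination ends up with IP.txt directly inside it, while the second gets source_folder/IP.txt.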

If SSH is running on a non-standard port on your remote VPS, you need to pass that port to SSH using the -e flag.

$ rsync -avP source_folder/ -e 'ssh -p 2222' USER@IP_ADDRESS:/home/USER/backup/

The -P flag combines the flags --progress and --partial. The former shows transfer progress in the terminal, and the latter tells rsync to keep any partially transferred files if the transfer is interrupted, so a rerun can pick up where it left off.

Turning the tables: backing up your VPS to your local machine

Now that I’ve shown you how to push data from your local machine to your VPS, time to pull data from your VPS back to your local machine.

$ rsync -avP USER@IP_ADDRESS:/var/www/html /home/dd/backups/
receiving incremental file list
html/
     612 100%  597.66kB/s    0:00:00 (xfr#1, to-chk=2/4)
html/drupal/
html/wordpress/

sent 59 bytes  received 820 bytes  195.33 bytes/sec
total size is 612  speedup is 0.70

The only difference between this command and the push from earlier is that we’ve swapped the source and destination folders.

You can still use the -e flag if you need to change the SSH port.

$ rsync -avP -e 'ssh -p 2222' USER@IP_ADDRESS:/var/www/html /home/dd/backups/

This works great for a single folder, but what if you want to back up the entire VPS? We can do that, too, but we’ll want to exclude a few folders. The --exclude flag does exactly this by skipping files that match a pattern. These patterns are not regular expressions; they use shell-style wildcards such as * and ?.

$ rsync --dry-run USER@IP_ADDRESS:/ -aAXvh --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} /home/dd/backups/

The --dry-run flag in the above example will not transfer any files but will show you what the command would do. After checking the output carefully, you can then omit the --dry-run option to pull files from the remote VPS. The / path right after IP_ADDRESS: instructs rsync to sync the entire file system, excluding the folders specified in the --exclude={ } flag.
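The --exclude={ } syntax is not an rsync feature. It is bash brace expansion, which turns the one word into a separate --exclude flag for each pattern before rsync ever runs. Here is a quick way to see what rsync actually receives; the args array is just a name I picked for the demo.

```shell
# Bash expands the braces first; rsync only ever sees the resulting flags.
args=( --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} )

# Print one expanded flag per line.
printf '%s\n' "${args[@]}"
```

Because this relies on bash, it will not expand in shells without brace expansion, such as a plain POSIX sh. In that case, write out one --exclude flag per pattern instead.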

Let’s turn this magic into a script

Remember when I said a good backup strategy makes your life easier? Enter scripting.

Now we’re back to thinking too hard.

The script will not delete old snapshots, but it will point a symbolic link named latest at the most recent one. To customize the script for your environment, change the values of source_dir, destination_dir, ssh_user, ip_address, SSHPort and symbolic_name_recent_backup in the following script.

#!/bin/bash

#Create a timestamp
date=$(date "+%Y-%m-%dT%H_%M_%S")

#Source location, you can change '/' to something like /var/www/html
source_dir="/"

#Backup location on your local system
destination_dir="/home/dd/Documents/"

#Name of Backup folder
backup_folder_name=backup-$date

#Full path of backup; concatenation of above two paths
final_destination_dir=$destination_dir$backup_folder_name

#Create backup directory
mkdir -p "$final_destination_dir"

#rsync options
rsync_option="-aAXvhP"

#SSH username
ssh_user="peter"

#SSH Port
SSHPort=2222

#IP address of remote host
ip_address="123.45.67.89"

#Symbolic name of latest backup
symbolic_name_recent_backup="latest"

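#Exclude folders that you don't want to back up
#Note: this list also skips /usr, /var, /sbin, /home and /etc; remove any of those entries you want included in the backup

exclude_folders=(
  "/dev"
  "/usr"
  "/var"
  "/sbin"
  "/home"
  "/etc"
  "/proc"
  "/sys"
  "/tmp"
  "/run"
  "/mnt"
  "/media"
)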

#Change to the destination directory where rsync will pull data from remote VPS

cd "$destination_dir" || exit 1

#Get the most recent snapshot folder name that will be symbolically linked to the latest folder.

latest_backup_dir=$(ls -td -- backup* | head -n 1 | cut -d'/' -f1)

#Collect one --exclude flag per folder in an array

exclude_flags=()
for item in "${exclude_folders[@]}"
do
  exclude_flags+=(--exclude "$item")
done

#Remove the symbolic link that pointed to the previous snapshot

if [ -L "$symbolic_name_recent_backup" ];
then
     echo "Removing previous symbolic link to the snapshots"
     rm -f "$symbolic_name_recent_backup"
fi

#Create a new symbolic link to the latest snapshot

echo "Creating new symbolic link to the latest snapshots"
ln -s "$latest_backup_dir" "$symbolic_name_recent_backup"

#Run rsync; on failure, append the exit code to the log (the user running the script must be able to write to it)

rsync $rsync_option "${exclude_flags[@]}" -e "ssh -p $SSHPort" "$ssh_user@$ip_address:$source_dir" "$final_destination_dir" || echo "rsync died with error code $?" >> /var/log/backup.log
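Since the script never deletes old snapshots, your backup folder will keep growing. If that becomes a problem, a small cleanup helper along these lines could sit next to the backup script. This is only a sketch and not part of the script above: the function name prune_old_backups and the idea of keeping a fixed number of snapshots are my own assumptions. It relies on the backup-TIMESTAMP folder names sorting in chronological order.

```shell
#!/bin/bash

# Hypothetical helper: keep only the newest $2 backup-* folders inside $1.
prune_old_backups() {
  local dir=$1 keep=$2
  local snapshots=() path i

  # Bash returns glob matches sorted by name, and the timestamp format
  # used in the folder names sorts oldest first.
  for path in "$dir"/backup-*/; do
    [ -d "$path" ] && snapshots+=("${path%/}")
  done

  # Everything beyond the newest $keep entries is surplus.
  local excess=$(( ${#snapshots[@]} - keep ))
  for (( i = 0; i < excess; i++ )); do
    echo "Removing ${snapshots[i]}"
    rm -rf -- "${snapshots[i]}"
  done
}
```

For example, prune_old_backups /home/dd/Documents 4 would keep the four newest snapshots. Try it on a test folder first, since rm -rf is unforgiving.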

Automate the script to run once a week

Once you have tested the above script in your environment, automate it to run at least once a week using a cron job. You can choose the interval according to your requirements. Make sure you can log in to the remote VPS with key-based SSH authentication, using a key that has no passphrase. Otherwise the cron job will fail, because nobody is there to type a password.
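If you don’t have a suitable key yet, the following sketch creates one with the standard OpenSSH tools. The key path vps_backup_key is just a name I made up, and port 2222 is carried over from the earlier examples. Keep in mind that a key without a passphrase is only as safe as the file permissions protecting it.

```shell
# Hypothetical key location; pick any name you like.
key="$HOME/.ssh/vps_backup_key"
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"

# -N "" sets an empty passphrase so cron can use the key unattended.
# Skip generation if the key already exists.
[ -f "$key" ] || ssh-keygen -q -t ed25519 -N "" -f "$key" -C "vps-backup"

# Then install the public key on the VPS (uncomment and fill in your details):
# ssh-copy-id -i "$key.pub" -p 2222 USER@IP_ADDRESS
```

To make the backup script use this key, you would add -i followed by the key path to the ssh command passed with -e, for example -e "ssh -i $HOME/.ssh/vps_backup_key -p 2222".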

Just run crontab -u USER -e in the terminal, choose an editor, and add a line. Specify the time interval you’d like, along with the path to where you saved the above script.

My backup strategy is to run the backup script at 9 AM every Monday, hence the 0 9 * * Mon schedule. If something goes wrong, you can check the log file /var/log/backup.log for more information.

$ crontab -u USER -e
...
...
0 9 * * Mon /path/to/your/backup/rsync_backup.sh
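If the five schedule fields look cryptic, this little sketch pulls the crontab line apart and labels each piece. It is purely illustrative; the variable names are mine, and cron itself does the real parsing.

```shell
# Split the crontab line into its five schedule fields plus the command to run.
entry='0 9 * * Mon /path/to/your/backup/rsync_backup.sh'
read -r minute hour day_of_month month day_of_week job <<< "$entry"

printf 'minute:       %s\n' "$minute"
printf 'hour:         %s\n' "$hour"
printf 'day of month: %s\n' "$day_of_month"
printf 'month:        %s\n' "$month"
printf 'day of week:  %s\n' "$day_of_week"
printf 'command:      %s\n' "$job"
```

A * means "every", so this entry fires at minute 0 of hour 9 on every Monday. For a daily backup at 3 AM you would use 0 3 * * * instead.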

The script is a simple one, and it may not be a comprehensive solution for your needs, but it makes backing up a VPS incredibly easy. You can back up a VPS to your local machine, or even one VPS to another.

See what happens when you stop thinking too hard and just use the fantastic tools that your VPS already has?
