Update on 2008-03-26: I changed the behavior of my backup script a while back but never updated this page, since I felt I did not have the time to revise the description. Now I figured that I might as well just post the script, as some people may still find it useful. It no longer makes backups to a disk image but writes straight to the external drive, and it also has some additional features. If you set up a scheduled wake-up in Energy Saver, the script will put the computer back to sleep if these two conditions are met: 1) the computer was actually woken up as scheduled (i.e. it was not left on deliberately to do something), and 2) the computer has been idle for a specified time (i.e. nobody started using it while the backup was running). By the way, I recommend cron for scheduling the script in this scenario, as launchd is a bit buggy (at least it was on Tiger). Here is the new script:
/usr/local/bin/newbackup.sh - Download - View in browser
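As an illustration of the idle-time condition mentioned in the update, here is a minimal sketch that is not taken from the linked script. It assumes the idle time is read from the HIDIdleTime counter that ioreg reports in nanoseconds, and that the machine is put to sleep through System Events. The function names and the 600-second limit are hypothetical, and the check for whether the wake-up was a scheduled one is left out.

```shell
#!/bin/sh
# Sketch only: function names and the limit are illustrative assumptions.

IDLE_LIMIT=600   # seconds of inactivity required before sleeping (assumed value)

# Read `ioreg -c IOHIDSystem` output on stdin and print the idle time in
# whole seconds. HIDIdleTime is reported in nanoseconds.
parse_idle_seconds() {
    awk '/HIDIdleTime/ { print int($NF / 1000000000); exit }'
}

# Succeed if the given idle time (in seconds) has reached the limit.
idle_long_enough() {
    [ "$1" -ge "$IDLE_LIMIT" ]
}

# Put the machine to sleep only if nobody has touched it recently.
sleep_if_idle() {
    idle=$(ioreg -c IOHIDSystem | parse_idle_seconds)
    if idle_long_enough "${idle:-0}"; then
        osascript -e 'tell application "System Events" to sleep'
    fi
}
```

Calling sleep_if_idle at the very end of a backup run would then send the computer back to sleep only when nobody has been using it.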
This article describes how I am using rdiff-backup to do nightly backups to a disk image on an external hard drive. Scheduling is done using launchd, and any potential errors are reported via email. This document is not meant to be a general guide on how to do backups, but instead it just describes how I ended up doing it.
NOTICE: Backing up your work is one of the most important things to take care of when dealing with computers. It is necessary to know what you are doing, so if you do not feel comfortable working with the command line, writing shell scripts, or editing configuration files, you might want to look for other alternatives.
Mac OS X 10.5 Leopard is only a couple of months away and will bring us Time Machine, which is a backup solution integrated into the operating system. In the meantime, I still need some way of backing up my files. When thinking about the solutions, I came up with the following criteria for the system:
While going through different solutions, I came across rdiff-backup, which seemed to do what I wanted. I decided to set up nightly backups using rdiff-backup and launchd. In this document I am outlining that process and also discussing how to keep external hard disks mounted when no users are logged in, and how to get mail to work.
Installing rdiff-backup is easy using MacPorts as it automatically installs all the dependencies as well. In addition to installing the rdiff-backup package, you should also install the xattr python module, which takes care of extended attributes. These two commands will install rdiff-backup and xattr:
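Going by the port names used in the next paragraph (rdiff-backup-devel and py-xattr), the two commands would be along these lines. The wrapper function and its name are an illustrative addition; it only checks that MacPorts is available before trying to install anything.

```shell
#!/bin/sh
# Install the development version of rdiff-backup and the xattr module
# through MacPorts. install_backup_tools is a hypothetical helper name.
install_backup_tools() {
    if ! command -v port >/dev/null 2>&1; then
        echo "MacPorts (port) not found" >&2
        return 1
    fi
    sudo port install rdiff-backup-devel &&
    sudo port install py-xattr
}
```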
As you can see, I went for the development version of rdiff-backup. The current stable release is 1.0.5 (MacPorts has 1.0.4), and the development version is at 1.1.9 (1.1.5 on MacPorts). For the xattr module, MacPorts has 0.2 while the latest is 0.4. After a while, I decided that I wanted to go for the cutting-edge releases. So, I deactivated the versions installed through MacPorts (sudo port deactivate rdiff-backup-devel and sudo port deactivate py-xattr), downloaded the source packages, and built them myself. That was not too hard either. The newest xattr can be downloaded here. Even if you want to go for the latest versions, installing with MacPorts first does have the benefit of automatically installing all the dependencies.
I have an external hard drive that I am using for backups. However, the disk also has some other uses and has a separate Mac OS X installation for testing purposes. I did not want to interfere with this installation, and more importantly, I did not want this test installation to interfere with my backups. So, instead of backing up directly to this drive, I decided to create a disk image to store my backups. This backup script mounts the disk image, performs the backup, and unmounts the image.
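The mount, back up, unmount cycle can be sketched roughly as follows. This is a minimal illustration rather than the actual script: the image path, mount point, source directory, and function names are all hypothetical, and the DRY_RUN switch is an addition that prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the mount / back up / unmount cycle. All paths are hypothetical.
IMAGE="/Volumes/External/backup.sparseimage"   # disk image on the external drive
MOUNT_POINT="/Volumes/Backup"                  # where the image gets mounted
SOURCE="/Users"                                # what to back up

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_to_image() {
    run hdiutil attach "$IMAGE" -mountpoint "$MOUNT_POINT" || return 1
    run rdiff-backup "$SOURCE" "$MOUNT_POINT/rdiff"
    status=$?
    # Unmount even if the backup failed, then report the backup's exit status.
    run hdiutil detach "$MOUNT_POINT"
    return $status
}
```

Running DRY_RUN=1 backup_to_image first is a cheap way to check the paths before letting the script touch the real image.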
Also, the script has an alert level and gives a warning when the disk usage goes over this limit. This was added because rdiff-backup has an option to remove backups older than some specified time limit. Naturally, I would like to keep the backups for as long as the available disk space allows. So, instead of specifying a time limit right away, I have set the script to alert at 90%. When the time comes, I will be notified and can then decide how long old backups should be kept.
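A disk usage check of this kind might look like the sketch below. It assumes the usage is read from the capacity column of df -P; the function names are hypothetical, and only the 90% level comes from the text.

```shell
#!/bin/sh
ALERT_LEVEL=90   # percent, as described above

# Print the usage percentage (without the % sign) of the filesystem holding $1.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning and return non-zero if the usage ($1) exceeds the limit ($2).
check_usage() {
    if [ "$1" -gt "$2" ]; then
        echo "WARNING: backup volume is ${1}% full (alert level ${2}%)"
        return 1
    fi
}
```

With the backup volume mounted at a hypothetical /Volumes/Backup, the check would be invoked as check_usage "$(usage_percent /Volumes/Backup)" "$ALERT_LEVEL". Once the warning appears, old increments can be pruned with rdiff-backup's --remove-older-than option.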
The standard way of scheduling tasks on Mac OS X is using launchd. The following file will run the backup script daily at 3:30 AM. You are free to name the file whatever you want, but it should be placed in /Library/LaunchDaemons/ so that it is loaded when the computer boots.
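A job definition of that kind could look like the one printed by the sketch below. The label local.mybackup and the script path /usr/local/bin/mybackup.sh are assumptions; the Label, ProgramArguments, and StartCalendarInterval keys are standard launchd keys.

```shell
#!/bin/sh
# Print a launchd job definition that runs a script daily at 03:30.
# The label and the script path are hypothetical; adjust them to your setup.
print_backup_plist() {
    cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>local.mybackup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/mybackup.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>30</integer>
    </dict>
</dict>
</plist>
EOF
}
```

The output can be written into place with print_backup_plist | sudo tee /Library/LaunchDaemons/local.mybackup.plist, and plutil -lint on the resulting file catches syntax mistakes.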
You probably noticed that the plist refers to mybackup.sh and not mybackup-script.sh. The reason will become clear in a moment.
To load the plist manually right now without restarting, use this command:
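Assuming the file was saved as /Library/LaunchDaemons/local.mybackup.plist (a hypothetical name), loading it could look like this. The helper function is only there to fail early if the path is wrong; the essential part is the launchctl load call.

```shell
#!/bin/sh
# Load a launchd job definition without rebooting.
# load_daemon is a hypothetical helper name.
load_daemon() {
    plist=$1
    if [ ! -f "$plist" ]; then
        echo "no such plist: $plist" >&2
        return 1
    fi
    sudo launchctl load "$plist"
}
```

It would be called as load_daemon /Library/LaunchDaemons/local.mybackup.plist.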
Imagine your hard drive does break some day and you need to recover everything from the backups. It would not be very nice to find out that for whatever reason, your last backup is three months old. Maybe you renamed your external hard drive forgetting that the backup script will not be able to find it anymore. For this reason, I wanted to make sure that if something goes wrong, I will notice it. You can always direct the output from the backup script to some log file, but that would require you to check that file on a regular basis. That is something that you are likely to forget, so I figured that it would be better to get these notices by email.
launchd does allow you to specify log files for stdout and stderr, but I was not able to figure out how you could pipe them to mail. So, I wrote another script called mybackup.sh, which is just a small wrapper for mybackup-script.sh. It detects if you are running it interactively or if it is being run without a terminal. In the first case, it just calls the original backup script. In the latter, it pipes the output to mail, so you will get notified if anything goes wrong. Just make sure that the emails do not get stuck in a spam filter.
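A wrapper along these lines is sketched below. It is not the original mybackup.sh: the script path, the recipient address, and the function names are assumptions, and the terminal check uses the shell's -t test on standard input.

```shell
#!/bin/sh
# Hypothetical values: adjust the script path and the recipient.
BACKUP_SCRIPT="/usr/local/bin/mybackup-script.sh"
RECIPIENT="you@example.com"

# Succeed when standard input is attached to a terminal.
is_interactive() {
    [ -t 0 ]
}

run_backup() {
    if is_interactive; then
        # Run from a terminal: let the output go straight to the screen.
        "$BACKUP_SCRIPT"
    else
        # Run by the scheduler: collect stdout and stderr into one email.
        "$BACKUP_SCRIPT" 2>&1 | mail -s "Backup report" "$RECIPIENT"
    fi
}
```

Note that this simple version sends a message after every scheduled run, not only when something fails, which at least makes a missing email a warning sign in itself.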
It is quite likely that sending email with mail does not work with the default configuration. This is because quite a few ISPs block outgoing connections on port 25 to all addresses except their own SMTP server. The solution is to configure Postfix (the built-in tool for sending mail in Mac OS X) to relay all emails through your ISP's server. This is done by adding the following line to the /etc/postfix/main.cf configuration file. Naturally, replace smtp.yourisp.com with the name of the SMTP server of your ISP:
By default, your mails appear to originate from root@name-of-your-computer.local, which is something that the mail server is likely to reject. The solution is to define what should come after the @ sign:
Using the default configuration, Postfix allows all computers in the same subnet to send mail through it. If you do not want to allow this, adding the following line narrows it down to your machine only.
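The three settings described above map onto the Postfix parameters relayhost, myorigin, and mynetworks_style. The sketch below prints the corresponding main.cf lines; the function name is hypothetical, and using mynetworks_style = host for the last step is an assumption about which parameter was meant.

```shell
#!/bin/sh
# Print the three main.cf settings discussed above.
# $1 = SMTP server of your ISP, $2 = domain to use after the @ sign.
postfix_settings() {
    printf 'relayhost = %s\n' "$1"
    printf 'myorigin = %s\n' "$2"
    # Trust only the local machine instead of the whole subnet.
    printf 'mynetworks_style = host\n'
}
```

For example, postfix_settings smtp.yourisp.com yourdomain.com | sudo tee -a /etc/postfix/main.cf appends the lines, and sudo postfix reload makes a running Postfix pick them up. If main.cf already sets one of these parameters, it is cleaner to edit the existing line than to append a second one.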
Normally, Mac OS X does not mount external hard drives at boot time, but instead only when a user logs in. Also, the drives are unmounted when the user logs out. For this reason, the backup script fails if there are no users logged in at the time of its execution, as it cannot find the backup volume. If you want to be able to run the backup script at all times, this command will make Mac OS X keep the external volumes mounted even when no one is logged in:
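The setting usually used for this is the AutomountDisksWithoutUserLogin key in the autodiskmount preferences. The sketch below wraps it in a hypothetical helper that first checks whether the defaults tool exists; a restart is probably needed before the change takes effect.

```shell
#!/bin/sh
# Keep external volumes mounted when no user is logged in.
# enable_automount is a hypothetical helper name.
enable_automount() {
    if ! command -v defaults >/dev/null 2>&1; then
        echo "defaults not found; this only applies to Mac OS X" >&2
        return 1
    fi
    sudo defaults write /Library/Preferences/SystemConfiguration/autodiskmount \
        AutomountDisksWithoutUserLogin -bool true
}
```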
There is also a separate tool called rdiff-backup-statistics for getting more information about the backup runs. It tells you how long it took to do the backup, how many new/deleted/changed files there were, etc. As my backup is on a disk image, I cannot run this tool directly without first mounting the image, so I wrote a small script for that purpose as well. It just mounts the image, runs rdiff-backup-statistics, and unmounts.
Without any arguments, you will get an average of all the backup runs. If you are only interested in the last run, run it with the argument last.
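A statistics helper in this spirit is sketched below. It is not the original script: the paths and function names are hypothetical, and the handling of last is an assumption. Here it simply prints the newest session_statistics file that rdiff-backup keeps in its rdiff-backup-data directory, relying on the timestamps in the file names sorting chronologically.

```shell
#!/bin/sh
# Sketch only; paths, function names, and the handling of "last" are assumptions.
IMAGE="/Volumes/External/backup.sparseimage"
MOUNT_POINT="/Volumes/Backup"

# Print the path of the newest session_statistics file in backup directory $1.
# The shell expands the glob in sorted order, so the last match is the latest run.
latest_session_file() {
    latest=""
    for f in "$1"/rdiff-backup-data/session_statistics.*.data; do
        [ -f "$f" ] && latest=$f
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# $1 = backup directory, $2 = "last" to show the most recent run only.
show_stats() {
    if [ "${2:-}" = last ]; then
        file=$(latest_session_file "$1") || return 1
        cat "$file"
    else
        rdiff-backup-statistics "$1"
    fi
}

# Mount the image, show the statistics, and unmount again.
stats_on_image() {
    hdiutil attach "$IMAGE" -mountpoint "$MOUNT_POINT" >/dev/null || return 1
    show_stats "$MOUNT_POINT/rdiff" "${1:-}"
    status=$?
    hdiutil detach "$MOUNT_POINT" >/dev/null
    return $status
}
```

stats_on_image prints the averages over all runs, and stats_on_image last prints the figures for the most recent one.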
So that is my backup setup. As I said in the beginning, when it comes to backing up your data, you should really know what you are doing. So if you are not familiar with the tools and techniques used above, you might want to look for other alternatives. Also, I am by no means an expert in writing shell scripts, so my scripts are likely to suffer from poor design and errors. But anyway, I hope this description can be helpful for other people when they are designing their backup systems.