I recently had to create a backup routine for a number of Drupal-based websites on Linux servers. Even though these sites all ran the same software (and similar versions too), there were subtle differences that called for a different approach on each. I'm documenting my experience here in the hope that it might be useful to someone else (not to mention that I'm likely to forget this before I need it again).
Backing up Drupal requires two distinct steps:
- Back up the files
- Back up the database
The biggest problem with these two simple-seeming steps is that the backup process differs depending on whether you have root access to the host server. If you do, you take one approach; if not, the approach depends on what type of access you do have.
We'll take the simple case first - where we have root access to the server.
NOTE: Drupal supports both MySQL and PostgreSQL. This document assumes MySQL is being used. However, the database backup commands play the same role in both (mysqldump versus pg_dump), although their options differ in places (for example, in how the user and password are supplied). So PostgreSQL users can still get some use from this document.
NOTE 2: I've assumed that if you are concerned about backing up a Drupal web site, you likely know how to work with the Linux command line, and have some basic knowledge of creating scripts and making them executable. If you do not, you might want to start with a Google search for "Linux script".
With Root Access
The simplest way to handle backups is to have the web server back up the database to a file (after all, the web server HAS to have access to the database), and then back up the database file and all the Drupal files (and any added by users) to another location. This other location could be a different directory on the same box, or a remote box. I like the idea of creating a backup file on the web server, then connecting from my remote box to fetch that file in a secure manner.
So to that end, I am using two scripts. The first script runs on the web server periodically, and the second script runs on the backup server to retrieve that resulting file.
The first script can be as simple as the following:
    #!/bin/bash
    # use some variables to make changes easier
    SRCDIR=/home/www/avro   # website directory
    BAKDIR=/home/backup     # where to store the backup
    DBUSER=myusername       # database username
    DBPASS=mypassword       # database password
    DB=mydatabase           # database name

    # first, make sure the backup directory exists
    mkdir -p "$BAKDIR"

    # copy the website files into the backup directory via rsync
    # (no trailing slash on SRCDIR, so this creates $BAKDIR/avro)
    rsync -a "$SRCDIR" "$BAKDIR"

    # back up the database into the database directory,
    # creating that directory first in case the site does not have one
    mkdir -p "$BAKDIR/avro/database"
    mysqldump -u "$DBUSER" --password="$DBPASS" "$DB" > "$BAKDIR/avro/database/avro.sql"

    # now create a compressed tar file for backup
    # (-C stores paths relative to BAKDIR rather than absolute ones)
    tar -zcf "$BAKDIR/avro.tar.gz" -C "$BAKDIR" avro

    # If space is at a premium, now would be a good time to remove our working files
    # with a command similar to 'rm -rf $BAKDIR/avro' - just be VERY careful not to
    # accidentally delete the original files.
After this script runs, we have a single file in the /home/backup folder (or wherever you choose to put it). I would recommend that this file NOT be accessible from the web, for security reasons. After all, there's probably more information in the database file than you'd really like made public (e.g. private postings, possibly passwords, etc.).
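On the same security theme, here is a precaution that is not part of my original script, offered only as a sketch. Passing --password= on the command line can expose the password to other users on the box through the process list. mysqldump can instead read its credentials from an option file named with --defaults-extra-file, which has to be the first option on the command line. The function name write_mysql_optfile and the path /root/.backup-my.cnf are hypothetical, and the sketch assumes the password contains no double quotes or backslashes.

```shell
#!/bin/bash
# Sketch: keep the database password out of the process list by storing
# it in a private option file. The function name and the file location
# are illustrative assumptions, not part of the original script.

write_mysql_optfile() {
    # $1 = database user, $2 = database password, $3 = option file path
    (
        umask 077    # a newly created file is readable by its owner only
        printf '[client]\nuser=%s\npassword="%s"\n' "$1" "$2" > "$3"
    )
    chmod 600 "$3"   # tighten permissions if the file already existed
}

# Usage (commented out because it needs a real MySQL server):
# write_mysql_optfile "$DBUSER" "$DBPASS" /root/.backup-my.cnf
# mysqldump --defaults-extra-file=/root/.backup-my.cnf "$DB" > "$BAKDIR/avro/database/avro.sql"
```

With the option file in place, the -u and --password= arguments can be dropped from the mysqldump line.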
Next, we need to retrieve that file. A simple rsync command can do the trick, but I prefer to automate this so I don't have to worry about when I last did the backup. So, I've chosen to set up the two servers involved with SSH keys so that I won't get a password prompt when I SSH to the web server. You can find lots of detail on SSH keys with a Google Search for "ssh key".
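For readers who have not set up SSH keys before, the general idea can be sketched as follows. This is a minimal example under assumptions: the key path, the function name make_backup_key, and the choice of an ed25519 key with an empty passphrase (needed if cron is to run the job unattended) are mine, and your own security requirements may call for something stricter.

```shell
#!/bin/bash
# Sketch: create a dedicated, passphrase-less key pair for the backup job.
# The key path and the function name are illustrative assumptions.

make_backup_key() {
    # $1 = path of the private key to create (the public key gets a .pub suffix)
    keyfile=$1
    mkdir -p "$(dirname "$keyfile")"
    # Do not overwrite a key that already exists
    [ -f "$keyfile" ] || ssh-keygen -q -t ed25519 -N "" -f "$keyfile"
}

# Usage on the backup server (commented out; the later steps need network access):
# make_backup_key "$HOME/.ssh/backup_key"
# ssh-copy-id -i "$HOME/.ssh/backup_key.pub" myusername@thewebserver
# rsync -e "ssh -i $HOME/.ssh/backup_key" myusername@thewebserver:/home/backup/avro.tar.gz /home/backup/
```

ssh-copy-id asks for the account password once; after that, connections made with this key should no longer prompt.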
Then I run a very simple script on my backup server to retrieve the file:
    #!/bin/bash
    # make sure backup directory exists
    mkdir -p /home/backup
    rsync myusername@thewebserver:/home/backup/avro.tar.gz /home/backup/avro.tar.gz
With these two scripts, we now have a full backup of the website. And if they were scheduled in cron, say an hour or two apart (script 2 runs an hour or two AFTER script 1), then we would have a backup of the website every night.
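As a rough illustration of that schedule, the snippet below prints two crontab lines, one for each machine. The script names make_backup.sh and fetch_backup.sh and the times are assumptions chosen for the example; each line would go into the crontab of its own server (for instance with crontab -e).

```shell
#!/bin/bash
# Sketch: crontab lines for the two scripts. Script names and times are
# illustrative assumptions.
# Fields: minute hour day-of-month month day-of-week command

# On the web server: build the backup at 01:00 every night
WEB_CRON='0 1 * * * /root/cron/make_backup.sh'

# On the backup server: fetch it two hours later, at 03:00
FETCH_CRON='0 3 * * * /root/cron/fetch_backup.sh'

# Print the lines so they can be pasted into "crontab -e" on each machine
printf '%s\n' "$WEB_CRON" "$FETCH_CRON"
```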
The downside, though, is that by creating the tar file we download the FULL website any time a change is made, even if the change was only to a single file out of thousands. An alternative is to NOT create the tar file, but to rsync the full directory to the local computer, and then let the local computer create a tar file if needed. If you have ample bandwidth, or even a local network connection, then this is a non-issue. But for those of us who have to pay for our bandwidth (almost everyone with a remote server), here's a revision of the second script:
    #!/bin/bash
    # this file backs up a drupal website
    # make sure backup directory exists
    mkdir -p /home/backup
    # synchronize the files (no trailing slash on the source, so the
    # avro directory itself is created inside /home/backup)
    rsync -a myusername@thewebserver:/home/backup/avro /home/backup/
    # create a tar file (if needed)
    #tar -zcf /home/backup/avro.tar.gz -C /home/backup avro
We've added the archive option (-a), and specified only directories (rather than file names). The first run uses about as much bandwidth as script 2 above (somewhat more, since the files are no longer compressed into a tar file). But every run after that transfers only the files that have changed - a significant improvement in our bandwidth usage. If we follow this approach, we don't even need the tar file in script 1 above.
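One detail that is easy to trip over is how rsync treats the source path. Without a trailing slash, rsync copies the directory itself into the destination; with one, it copies only the directory's contents. The sketch below wraps the first form in a hypothetical helper, sync_site (my own name, not an rsync feature), so that the site always ends up at DEST/avro.

```shell
#!/bin/bash
# Sketch: the helper name sync_site is an illustrative assumption.

sync_site() {
    # $1 = source directory (local path or user@host:path)
    # $2 = destination parent directory
    src=${1%/}          # drop one trailing slash, if present
    mkdir -p "$2"
    # -a preserves permissions, times and symlinks; unchanged files are skipped.
    # Because the source has no trailing slash, the directory itself is
    # created inside the destination.
    rsync -a "$src" "$2/"
}

# Usage (commented out; needs SSH access to the web server):
# sync_site myusername@thewebserver:/home/backup/avro /home/backup
```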
No Root Access
When we don't have root access to the computer, we have to change how we access the files. In my experience, when we don't have root access to the server, we are usually given FTP access to our files. We can use the wget command to synchronize our files to the backup server, and let the backup server do whatever further processing we need. But this doesn't handle the remote database - we'll have to work out a different method to get our data.
Getting the files is relatively straightforward - you issue a wget command similar to this:
    cd /path/to/local/directory
    wget -m ftp://username:password@host/path/to/site
First we change to the directory where we want to store our backup files. Then we give the wget command, which says to use FTP to connect to 'host' with the credentials 'username:password', and get a copy of the file(s) at /path/to/site. The '-m' option tells wget to mirror that directory and its subdirectories. So, that takes care of getting our files. Now for the database:
In the section above, we used the mysqldump command to backup the database. We can still use the same command, but it needs some changes, AND some tweaks to MySQL itself. The revised command would look like this:
mysqldump -h host -u myuser --password=mypass mydb > backup.sql
This command says to connect to 'host', using the credentials of 'myuser' and 'mypass', and use the 'mydb' database. Then, dump the commands necessary to create this database, and redirect the output into a file called backup.sql.
But running this command the first time may not work. In a hosted environment, you may not be allowed access to the MySQL database from a remote location. You'll have to check with your host to see if you do have this access. If they won't give it to you, then you'll be restricted to using the Database Administration module in Drupal to do your database backups.
The MySQL database must be configured to allow a remote connection, and permissions must be granted to the user account from the specified host. When the MySQL database is on the same computer as the web server, it's smart to configure the database to only allow access from localhost. You would need to tell the database that your remote backup server is also allowed to access it. If you get Access Denied messages, you can learn more at http://dev.mysql.com/doc/refman/5.0/en/access-denied.html.
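To make the permissions part concrete, here is a sketch that prints the kind of SQL a database administrator might run to let the backup server connect. The function name remote_backup_grant_sql, the example host, and the privilege list are assumptions: SELECT and LOCK TABLES cover a plain mysqldump of tables, and views or triggers would need more. The server also has to listen on a network interface (the bind-address setting), which is usually the host's decision rather than yours.

```shell
#!/bin/bash
# Sketch: print the SQL needed to create a remote backup account.
# The function name and the example values are illustrative assumptions.

remote_backup_grant_sql() {
    # $1 = database, $2 = user, $3 = backup server host or IP, $4 = password
    printf "CREATE USER '%s'@'%s' IDENTIFIED BY '%s';\n" "$2" "$3" "$4"
    # SELECT and LOCK TABLES are enough for a plain dump of tables
    printf "GRANT SELECT, LOCK TABLES ON \`%s\`.* TO '%s'@'%s';\n" "$1" "$2" "$3"
}

# Usage (commented out; needs administrative access to MySQL):
# remote_backup_grant_sql clug_drupal clug_drupal_user backup.example.org 'secret' | mysql -u root -p
```

Restricting the account to the backup server's host name or address keeps the rest of the Internet locked out, in the same spirit as the localhost-only setup.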
So, knowing this, we can create a simple script to do the full job for us:
    #!/bin/bash
    # set some variables to make changes easier
    BAKDIR=/home/backup
    TARGET=ftp://clug:mypassword@ftp.clug.ca
    DBHOST=www.clug.ca
    DBUSER=clug_drupal_user
    DBPASS=clug2005password
    DB=clug_drupal

    # make sure the backup directory exists
    mkdir -p "$BAKDIR"

    # retrieve the files (stop if we cannot enter the backup directory)
    cd "$BAKDIR" || exit 1
    wget -m "$TARGET"

    # back up the database (uncomment once remote MySQL access works)
    #mysqldump -h "$DBHOST" -u "$DBUSER" --password="$DBPASS" "$DB" > clug.sql
This method could also work even if you do have root access, but you would still need FTP access as well. I'll leave it up to you to decide which method is best in your situation.
Bringing it all together
In my case, I have three Drupal sites on a server with local network access, one on a remote server where I have root access, and one on another where I only have FTP access. What I have done is create individually tailored scripts for each server. These scripts are very similar to what we've discussed above. I require at least one week's worth of backups for each site. To do this, I've created a master backup script that calls each of the individual server scripts, then manipulates files based on the date.
    #!/bin/bash
    # variables
    BAKDIR=/home/backup
    SCRIPTS=/root/cron

    # back up each server
    "$SCRIPTS/remote1"   # with root access
    "$SCRIPTS/remote2"   # with FTP access
    "$SCRIPTS/local1"
    "$SCRIPTS/local2"
    "$SCRIPTS/local3"

    # use the current day for a directory name
    DAY="_$(date '+%a')"

    # create the directory if needed
    mkdir -p "$BAKDIR/$DAY"

    # move any tar files into the day directory
    mv "$BAKDIR"/*.tar.gz "$BAKDIR/$DAY"
This script simply calls each of the individual backup files (which I've stored in a directory at /root/cron). Next it creates (if needed) a directory based on the day of the week. Finally, it moves all the tar files in the backup directory into the day directory. This script is scheduled to run once a night.
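One edge case in the final step: if none of the server scripts produced a tar file, the *.tar.gz pattern matches nothing and mv complains. A slightly more defensive version of the move could look like the sketch below. The function name move_archives is my own, and this is one possible approach rather than what my script actually does.

```shell
#!/bin/bash
# Sketch: move tar files into the day directory, tolerating the case
# where no archives were produced. The function name is an
# illustrative assumption.

move_archives() {
    # $1 = backup directory, $2 = day directory name (for example _Mon)
    mkdir -p "$1/$2"
    for f in "$1"/*.tar.gz; do
        # If nothing matched, the pattern is left unexpanded; skip it
        [ -e "$f" ] || continue
        # -f replaces the archive left over from the same day last week
        mv -f "$f" "$1/$2/"
    done
}

# Usage:
# move_archives /home/backup "_$(date '+%a')"
```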
If you need to be able to recover a file from the past month, you can do this by changing the format we pass to the date command. Using "%d" gives us the day of the month, and "%U" the week number of the year ("%u", in lower case, is the day of the week as a number). There are plenty of other options for the date command - you can learn more by typing "date --help" or "man date" at the command line.
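To see how the choice of format changes the directory name, here is a small sketch. The function name backup_dir_name is hypothetical; the format strings are standard ones for the date command.

```shell
#!/bin/bash
# Sketch: build the directory name from a date format. The function
# name is an illustrative assumption.

backup_dir_name() {
    # $1 = a date format such as %a, %d or %U
    printf '_%s\n' "$(date "+$1")"
}

backup_dir_name '%a'   # abbreviated day name: seven rotating directories
backup_dir_name '%d'   # day of the month (01-31): about a month of history
backup_dir_name '%U'   # week of the year (00-53): one directory per week
```

Note that with a weekly name and a nightly run, each night's archive replaces the previous one for that week, so pick the format to match how far back you need to reach.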
The final result of all these scripts and scheduled cron jobs is a full backup routine for our Drupal websites. This is not the ONLY way to do this, of course, but it meets my own particular needs. Perhaps it will also meet your needs, with little or no changes.