Distance Learning From NITOL - HiST
Materials used in this course are the property of the author. These lessons may be used only by course participants for self-study purposes. Application for permission to use these materials for other educational purposes such as for teaching or as a basis for teaching should be directly submitted to the author.
Subject: LAN Administration
Summary: Backups are one of the most important tasks of a system administrator. The top priority of any normal system is to have a fully operational backup system. This is because the network has become a central part of a business's information system. In this lesson we will discuss how to design a good backup strategy and routine. We will also discuss catastrophe plans, and at the end of the lesson we will discuss software for backups.
Copyright: Arne B. Mikalsen/TISIP
Introduction
In the previous lessons we discussed important principles for administering a local area network. We first discussed general administration and security, concentrating on access rights and other types of computer security. We then discussed the printer environment and the principles that are important there.
In this lesson we will discuss one of the more important principles: backups. This is one of the highest-priority tasks the system administrator has. A network environment contains large amounts of data, which can be divided into three main parts: the user data, which is mostly sensitive; the software; and the data stored on the workstations in the network. It is important that the first two are safe no matter what happens. The third is data with low security requirements, and the backup routines for this data are mostly left to the users.
In this lesson we will discuss general backup of data, restoring data from backup, utilities that help with backups, and a few words about catastrophe planning.
The physical equipment for backing up data (tapes, optical disks, floppies...) was discussed in lesson two, where we also discussed the advantages and disadvantages of the different types of backup hardware.
Backing up data
Sooner or later computers crash. These crashes can be extremely expensive. Valuable data is lost, and a lot of valuable time is spent trying to restore it. Many firms depend on a network that works nearly 100% of the time. For this to be possible, efficient and well-planned routines for backing up data are very important; they might even be the most important factor. Computers are vulnerable, and we have to make up for this by constructing good and efficient backup routines.
The need for backup
There are many reasons to make a backup. It is not necessarily the fear of a fatal error in the computer. Other reasons might be:
- A user deletes important files by mistake. This happens all the time, and no matter how "hopeless" a system administrator says a user is, the firm still loses money if the lost data cannot be recreated relatively fast.
- System errors or disk crashes are the best-known reason why backing up data is a system operator's most important task.
- Viruses are a third reason. A virus might destroy data (for example by overwriting disks). The lost data has to be recreated, and the backup makes this possible. In addition, it may be smart to have a "healthy" copy from the time before the virus destroyed the data. We will discuss keeping generations of backups later in this lesson.
Stable power supply
Before we start discussing backups and the other important things surrounding them, we will take a look at something that is often involved in a disk or system crash: a failing power supply. The power supply to a server is an important and critical part of the network. Variations in the supply, or extreme events such as a lightning strike or other short breaks in the power, might result in the server shutting down. Uncontrolled server shutdowns can be hazardous. Files that are being written to the disk at the time of the shutdown can cause problems. In the best case these files are only damaged, which is bad enough by itself. In the worst case it might lead to a system crash.
The solution to this problem is an Uninterruptible Power Supply, UPS. A UPS is in reality a battery designed to power the computer long enough to shut the server down safely. If the power interruption is only a few seconds long, the UPS will power the server until the power returns, and nobody will notice the interruption. Any self-respecting server has a UPS installed. UPS units are also available for workstations, and it might be wise to install them there as well.
Static and dynamic data
Static data is data that is seldom or never changed. Examples are executables and other files that are not changed over time. Any file may of course be more or less static, and no file is static forever. Backing up static data is a special case: it does not need advanced and thorough routines, and keeping copies of the installation sets is sufficient.
A general rule for backups is also true for static data; it's a good rule to always make two copies of a backup. One copy is stored on-site (inside the building), inside a fireproof safe. The other copy is stored offsite (anywhere else but in the building). The system administrator might take it home for safekeeping.
Dynamic data is the other type of data we have to consider when discussing backups. Dynamic data changes constantly; documents are a typical example. The demands on backing up such data are stricter.
Imagine a network where a backup is made each weekend (Saturday night). If a disk crash were to happen on Wednesday, two and a half days' worth of work on dynamic data would be lost, while none of the static data would be lost, because it was backed up at the time of installation or change.
How often a backup should be taken is a trade-off between how expensive it is to lose data (how much data the firm can afford to lose) and how much work and resources you are willing to put into backups. Ideally we could imagine making a backup each hour (or each minute). Then we would never lose more than an hour of work. On the other hand, backing up too often demands a lot of work. The administrator has to swap media (for example tapes) often, and the users will often find files unavailable, because a file is normally not accessible while it is being backed up.
Full/ incremental backup
Since we have discussed the difference between static and dynamic data, it is natural to discuss the difference between full and incremental backups. A full backup copies all the files, while an incremental backup saves only the files that have changed since the last backup. The operating system keeps track of whether a file has changed. Novell NetWare, for example, has an attribute, archive, available for all files. This attribute is set whenever a file is changed, and when the backup program does an incremental backup, it backs up only those files that have this attribute set. After the backup the attribute is reset to "not changed".
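The archive-attribute mechanism can be sketched in a few lines of Python. This is only an in-memory model for illustration, not NetWare's actual interface: the FileEntry class and the functions modify, full_backup and incremental_backup are hypothetical names, and "copying" a file is represented simply by recording its name.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical stand-in for a file on the server."""
    name: str
    archive: bool = True  # a new file counts as "changed" until it is backed up


def modify(entry):
    """Model the operating system setting the attribute when a file changes."""
    entry.archive = True


def full_backup(files):
    """Save every file, then reset all archive attributes."""
    saved = [f.name for f in files]
    for f in files:
        f.archive = False
    return saved


def incremental_backup(files):
    """Save only files whose archive attribute is set, then reset it."""
    saved = [f.name for f in files if f.archive]
    for f in files:
        f.archive = False
    return saved


files = [FileEntry("report.doc"), FileEntry("budget.xls"), FileEntry("editor.exe")]
first_full = full_backup(files)          # all three files are saved
modify(files[0])                         # a user edits report.doc
first_incr = incremental_backup(files)   # only report.doc is saved
second_incr = incremental_backup(files)  # nothing has changed since
print(first_full, first_incr, second_incr)
```

Note that the second incremental backup is empty: because the attribute is reset after every backup, each incremental backup holds only the changes since the backup immediately before it.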
Incremental backups are faster than full backups and need less space. It is therefore normal to use both. A full backup has to be made regularly so that the incremental backups can restore as much data as possible.
An example illustrates this. Imagine a network where a full backup is made once each weekend (Friday night), while an incremental backup is made on the nights before Tuesday, Wednesday, Thursday, and Friday. If a system failure occurs during Wednesday, the data from the previous full backup is restored first, and the changed data is then restored from the incremental backups made on Monday and Tuesday night. After this the network should be as it was on Wednesday morning. The only data truly lost is the data produced during that day.
We can see that deciding how often to make a full backup is a trade-off between how much time and space to spend on backups and how much time we want to spend on restoring data. If, in the previous example, we had made a full backup on the first of each month and an error occurred on the 30th of the same month, we would have to make 29 incremental restorations.
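The restore sequence in these two examples can be worked out mechanically. The following Python sketch is an illustration under one stated assumption: each backup is labelled with the day on whose morning it is complete (so the backup made "the night to Tuesday" is labelled Tuesday). The function name restore_chain and the day numbering are hypothetical.

```python
def restore_chain(backups, failure_day):
    """Return the backups to restore, in order, after a failure on failure_day.

    backups is a list of (day, kind) tuples sorted by day, where kind is
    "full" or "incremental". A backup labelled day d is assumed to be
    complete on the morning of day d.
    """
    usable = [b for b in backups if b[0] <= failure_day]
    fulls = [i for i, b in enumerate(usable) if b[1] == "full"]
    if not fulls:
        raise ValueError("no full backup available before the failure")
    # Start from the latest full backup and replay every incremental after it.
    return usable[fulls[-1]:]


# Weekly example: day 0 = Saturday (full backup made Friday night),
# days 3..6 = Tuesday..Friday (incrementals made the night before).
week = [(0, "full"), (3, "incremental"), (4, "incremental"),
        (5, "incremental"), (6, "incremental")]
wednesday_chain = restore_chain(week, failure_day=4)

# Monthly example: full backup on the 1st, incrementals ready on the 2nd..30th.
month = [(1, "full")] + [(day, "incremental") for day in range(2, 31)]
month_chain = restore_chain(month, failure_day=30)
month_incrementals = sum(1 for _, kind in month_chain if kind == "incremental")

print(wednesday_chain)
print(month_incrementals)
```

Under this labelling the Wednesday failure needs the full backup plus two incremental backups, and the failure on the 30th needs the full backup plus 29 incremental backups, which matches the figures in the text.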
Backup routine
A good and thorough backup routine has to include information about
- how often we should back up - the backup frequency
- what data should be backed up - static/dynamic data, which files/folders
- how long to store the data - how old should a backup be allowed to be?
- where a backup should be stored - in particular, should the backup be stored off-site?
Figure 1 - backup strategy
Figure 1 shows a motto for a backup strategy. It is also very vague: it consists of words without any precise measure (often enough, thorough enough and productive enough). The backup routine (or strategy) therefore has to make these vague goals concrete. We can view the motto as a main strategy that is made concrete and usable through the backup routine. How thorough a backup routine is can be measured against this main strategy.
We can, for example, look at the environment here at IDB to illustrate how vague these backup goals are.
- Often enough varies during the semester. It is not necessary to back up each computer every day of the entire year. During summer there are no students present, and there is no need for frequent backups. Frequent backups are necessary in the periods when the students have several large papers to deliver (for example at the end of the term). These are things a thorough backup strategy should consider.
- Thorough enough will also vary during the semester. As noted above, it is not necessary to back up the students' home areas during the summer, while the teachers and other employees might produce a lot of teaching material during summer and could need a backup.
- Productive enough here can mean that the students and the employees can do their work efficiently even though a server crashes. By efficiently we can mean that we should never lose more than one day's worth of work.
The management (or someone else) should formulate requirements for the network stating how much the firm can "afford" to lose. The backup routine has to be designed to fulfil these requirements.
In an article on backup in "Datatid" no. 1/96 I found the following quote:
"If a computer system is out of order for more than one week, most firms will risk bankruptcy."
Even though this is a harsh way to put it, it shows how dependent many firms (most firms, according to the article) are on their computer system and the data stored there. This demands that there is a backup routine, and that it is thorough enough for the firm to survive a disk crash with as few losses as possible.
Storing
A "thorough enough" strategy should have the following things clear:
- How many backup copies should there be? Several copies should be made, at least some of the time. One copy should be stored locally so that it is easily accessible after a crash (it should be available within a few minutes). The other copy should be stored somewhere else to protect against larger catastrophes. This other place could be the system operator's home, but often a firm does not want backup tapes with sensitive information to be kept in anyone's home. In that case the backup is stored in a bank box or something similar.
Another reason to make several copies is that an error might occur on the backup itself. As with any other computer equipment, the backup medium might suffer a write error at the time of the backup. If the error is a big one, the medium might become unreadable. With several copies, the chance of losing the backup this way is minimal.
- Several generations of backup are also important (see the figure "Generations of backup" below). Several generations of backup means that the medium holding the latest backup is not reused for the next backup. A five-generation backup means that we keep the five most recent backups before a medium is used again.
Figure 1 - Generations of backup
There are several reasons why this is important. A system failure or disk crash might occur during the backup. If no earlier copies are available, all that is left is half a useless backup and a broken computer system, which adds up to nothing. In addition, errors in files are generally found too late for the latest backup to contain a "healthy" version.
We can imagine a system used for bookkeeping and taxes, where the accounts are supposed to be submitted every second month. A serious error has occurred in some important files needed for the next set of accounts. The backup strategy is to make a backup once a week and to store the copies in 4 generations. If the error occurred two months ago and has only now been discovered, 8 backups of the damaged files have been taken since. The backup holding the "healthy" version has been reused and overwritten, and only damaged files are stored on the backups. In this case four generations were not enough, even though they often are. Some of the copies have to be kept longer than the others to protect against such errors. This part of the backup strategy is often called the retention policy and is an important part of it (Figure 2).
Figure 2 - Retention policy
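The bookkeeping example can be replayed in a short Python simulation. It is a sketch under stated assumptions: one backup is taken at the end of each week, the oldest medium is always the one that is reused, and a backup is "healthy" only if it was taken before the week in which the error occurred. The function names retained_backups and healthy_copy_exists are hypothetical.

```python
from collections import deque


def retained_backups(weeks_elapsed, generations):
    """Return the week numbers of the backups still held on media."""
    media = deque(maxlen=generations)  # appending to a full deque drops the oldest
    for week in range(1, weeks_elapsed + 1):
        media.append(week)
    return list(media)


def healthy_copy_exists(retained, corruption_week):
    """A backup is healthy only if it predates the week the error occurred."""
    return any(week < corruption_week for week in retained)


# The error occurs in week 3 and is discovered after the week-10 backup,
# so 8 weekly backups (weeks 3 to 10) contain the damaged files.
four_generations = retained_backups(weeks_elapsed=10, generations=4)
ten_generations = retained_backups(weeks_elapsed=10, generations=10)

print(four_generations, healthy_copy_exists(four_generations, corruption_week=3))
print(ten_generations, healthy_copy_exists(ten_generations, corruption_week=3))
```

With four generations only the backups from weeks 7 to 10 remain, all of them damaged. Keeping more generations, or setting some copies aside for longer as a retention policy does, leaves a healthy copy from week 1 or 2 to restore from.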
Figure 3 shows the backup strategy here at IDB (as it was in the spring of '96). We can see that it includes information about which parts of the network are to be backed up, whether the backup should be full or incremental, and how often a backup is to be taken. The strategy lacks only a retention policy to be complete.
Figure 3 - IDB's backup strategy
Frequency of backup
How often should we make a backup? As I mentioned earlier, this question is a trade-off between how secure the system should be and how many resources are available to achieve this. A consequence analysis therefore has to be made. Such an analysis compares the two sides. An example of a consequence analysis is shown in this table (we are calculating with a 7.5-hour working day).
Frequency    Avg. loss       Worst case    Cost of loss (avg.)
3 x daily    1.25 hours      2.5 hours     18,000
Daily        ½ day           1 day         50,000
Weekly       3½ (2½) days    7 (5) days    300,000
It is difficult to give any practical guidelines for finding a reasonable frequency for backups. There are several decisive factors, and they change from case to case. The only thing to do is to find a point that makes a good compromise.
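The loss columns of the table follow from simple arithmetic, sketched here in Python. The sketch assumes backups are evenly spaced over working time, that a crash is equally likely at any moment (so the average loss is half the worst case), and that a working week has 5 days of 7.5 hours, as the values in parentheses in the weekly row suggest. The function name loss_hours is hypothetical, and the cost column is not modelled because the text gives no hourly cost.

```python
WORKING_DAY_HOURS = 7.5
WORKING_DAYS_PER_WEEK = 5  # assumption taken from the parenthesised weekly values


def loss_hours(backups_per_week):
    """Return (average, worst-case) loss in working hours."""
    week_hours = WORKING_DAY_HOURS * WORKING_DAYS_PER_WEEK
    worst = week_hours / backups_per_week  # everything since the last backup
    average = worst / 2                    # crash halfway between two backups
    return average, worst


for label, per_week in [("3 x daily", 15), ("Daily", 5), ("Weekly", 1)]:
    average, worst = loss_hours(per_week)
    print(f"{label:10} avg {average:6.2f} h   worst {worst:6.2f} h")
```

The weekly result of 37.5 hours in the worst case corresponds to the 5 working days in the table, and the average of 18.75 hours to 2½ working days.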
What time of day should we make backups? Users are normally unable to open or update files while the backup is being made, so doing it during office hours is not wise. It is normal to make backups at night, when network use is lowest. To save the system operator from working nights, the backup software can be set to run at a given day and time. Each morning after a "backup night" the system operator can then check that the backup was made and that it works. (Has anybody ever set the timer on the VCR and still not had their favourite programme taped?)
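The text does not say how the morning check should be done. One possible check, sketched below in Python as an assumption rather than a description of any particular backup program, is to confirm that the copy exists and has the same checksum as the original. The function names and the file names are hypothetical, and the comparison only makes sense if the original has not changed since the backup was taken.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(original, copy):
    """True if the copy exists and is byte-for-byte identical to the original."""
    copy = Path(copy)
    return copy.is_file() and sha256_of(original) == sha256_of(copy)


# Demonstration in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "report.doc"
    source.write_bytes(b"quarterly figures")

    good_copy = Path(workdir) / "report.doc.bak"
    shutil.copy2(source, good_copy)

    damaged_copy = Path(workdir) / "report.doc.damaged"
    damaged_copy.write_bytes(b"quarterly fig")  # simulates a truncated write

    missing_copy = Path(workdir) / "never_written.bak"

    results = (
        backup_is_intact(source, good_copy),
        backup_is_intact(source, damaged_copy),
        backup_is_intact(source, missing_copy),
    )

print(results)
```

A check like this catches both a backup that never ran and one that was written with errors, which are the two failures the morning inspection is meant to find.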
Restoring
Sometimes the fears become fact: the system has crashed, so let's save what can be saved! This is when the backup routines are truly tested.
Before a restoration can begin, the error has to be located and fixed. As a system administrator it is important to document everything well. This makes it easier to fix similar problems in the future.
Errors often occur during restoration. A good tip is therefore to take a preliminary backup of the crashed computer in case something goes wrong during the restoration process, and to restore as little as possible. Remember that the backed-up files are older than the ones on the disk. If there are any "healthy" files left, it may be smart to keep them.
After restoring the data, it is important to test thoroughly to be sure that the error is properly fixed and that the system does not crash again.
Catastrophe planning
A catastrophe plan is a complete plan for handling destruction on a large scale. The computer system should be able to survive fires, earthquakes or other catastrophes if a proper catastrophe plan exists. Backups are an important part of this plan, but they are not enough! The goal of the catastrophe plan is to get the system back up quickly. Remember the quote from the magazine "Datatid" about most firms risking bankruptcy after a week without their computer system. This demands a lot from the catastrophe plan.
Few firms that really depend on their network have a fully working catastrophe plan. This may be because they have not thought through the consequences of a possible catastrophe. "We have insurance," they may say. Insurance may be a spoonful of sugar that helps the medicine go down, but occasionally the catastrophe is more severe than that.
The basis of the catastrophe plan is off-site storage of data, but there are several other points that should be included:
- A plan for reconstructing the network. The building in use today might no longer be available, so the plan should outline how to get new premises. Getting a network fully operational after a catastrophe takes a lot of work, often more than the local staff can handle. A good catastrophe plan includes contacts who can help if the network should fail, for example suppliers who can deliver quickly or technicians who can come at short notice.
- Thorough backup over several generations
- Backups from all the workstations.
Software for backups
Most operating systems include backup software as part of the system. Still, many buy additional software, because it is often better and has more functions. A well-known and much-used application is ARCserve. Information about this program is located at http://www.cheyenne.com/. There are versions of ARCserve for Novell, NT and several other well-known systems. At http://www.cheyenne.com/Product-Info/Datasheets/12000.html there are some screen captures from the version for Novell NetWare. Figure 4 and Figure 5 show two of these as a demonstration of the user interface. For those of you who want to test the programs, test versions are available from http://www.cheyenne.com/TestDrive/.
Figure 4 - ARCserve 1
Figure 5 - ARCserve 2
Summary
In this lesson we have discussed backups and the important principles concerning them. The goal of the lesson has been for everybody to get an idea of what a good backup strategy should consist of. Backup is one of the most important tasks of the system operator, because many firms are totally dependent on the information stored on their network.
We have also looked at catastrophe planning, and taken a very brief look at a much-used backup program, ARCserve. Those of you who wish to know more have the links to documentation and a free trial version.