Friday, March 21, 2008

RoboCopy and Buffalo TeraStation

Recently I set up a new recurring backup scheme from my PC’s main data drive to a 1TB Buffalo TeraStation II Pro NAS device, but ran into a strange problem. I was planning to use RoboCopy as an easy and cheap (read: free) solution for recurring backups. However, I was baffled by RoboCopy’s insistence on copying some files despite options instructing it to copy only changed files. No matter what I did, RoboCopy would always re-copy some files.



I would launch RoboCopy with the following command line options:
robocopy /job:Backup /log:Z:\Mirror-D.log D:\ Z:\Mirror-D
The job file Backup.rcj in turn had several options:

::
:: Copy options :
::
 /S  :: copy Subdirectories, but not empty ones.
 /E  :: copy subdirectories, including Empty ones.
 /COPY:DAT :: what to COPY (default is /COPY:DAT).
 /PURGE  :: delete dest files/dirs that no longer exist in source.
 /MIR  :: MIRror a directory tree (equivalent to /E plus /PURGE).
 /A-:A  :: remove the given Attributes from copied files.
::
:: Retry Options :
::
 /R:5  :: number of Retries on failed copies: default 1 million.
 /W:30  :: Wait time between retries: default is 30 seconds.
::
:: Logging Options :
::
 /V  :: produce Verbose output, showing skipped files.
 /TS  :: include source file Time Stamps in the output.
 /FP  :: include Full Pathname of files in the output.
 /NDL  :: No Directory List - don't log directory names.
 /NP  :: No Progress - don't display % copied.
 /TEE  :: output to console window, as well as the log file.
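For readers who would rather not keep a separate job file, here is a minimal sketch of how the same options could be assembled into a single command line from Python. The function name `build_robocopy_args` is made up for illustration; the switches themselves are the ones listed above. Since /MIR is already equivalent to /E plus /PURGE (and /E covers everything /S does), the sketch passes only /MIR.

```python
def build_robocopy_args(source, dest, log_file):
    """Return a robocopy argument list equivalent to the job file above.

    Hypothetical helper for illustration; only the switches are real.
    """
    return [
        "robocopy", source, dest,
        "/MIR",        # mirror the tree; already implies /E and /PURGE
        "/COPY:DAT",   # copy Data, Attributes and Timestamps (the default)
        "/A-:A",       # remove the Archive attribute from copied files
        "/R:5",        # retry failed copies 5 times
        "/W:30",       # wait 30 seconds between retries
        "/V", "/TS", "/FP", "/NDL", "/NP", "/TEE",  # logging options
        f"/LOG:{log_file}",
    ]


args = build_robocopy_args("D:\\", "Z:\\Mirror-D", "Z:\\Mirror-D.log")
print(" ".join(args))
# On Windows the list could then be handed to subprocess.run(args).
```

Keeping the options in a job file, as I did, has the advantage that the schedule or shortcut that launches RoboCopy never needs to change when the options do.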
Now, if I were copying files from NTFS to FAT, I would expect some loss of detail due to the fact that the FAT filesystem stores far less file metadata (dates, times, attributes, access rights). However, in my case both the source drive (D:, 500GB RAID1) and the destination drive (Z:, 700GB RAID5) are formatted as NTFS, so the metadata should have been copied faithfully. Or so I thought.
After scratching my head for a while, I realized that the time stamps stored on the TeraStation were slightly off compared to the originals. Not by much, but enough to make RoboCopy think that the destination files were either older or newer than the source files. It turns out that whenever I copied a file over to the TeraStation, its time stamps would get rounded up or down by up to a few seconds. This was difficult to notice at first, since the standard DIR command output only shows hours and minutes in the timestamps.
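Since DIR hides the seconds, a small script makes the drift easier to see. The following is a minimal sketch, assuming Python is available on the PC; the function names and the example paths in the comment are hypothetical, and it only looks at the last-modified time.

```python
import os
from datetime import datetime


def mtime_difference(source_path, dest_path):
    """Return (source mtime, destination mtime, dest minus source) in seconds."""
    src = os.stat(source_path).st_mtime
    dst = os.stat(dest_path).st_mtime
    return src, dst, dst - src


def describe(source_path, dest_path):
    """Format both modification times with full seconds, plus the difference."""
    src, dst, delta = mtime_difference(source_path, dest_path)
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    return (
        f"source {datetime.fromtimestamp(src).strftime(fmt)}  "
        f"dest {datetime.fromtimestamp(dst).strftime(fmt)}  "
        f"delta {delta:+.3f}s"
    )


# Example call with made-up paths:
# print(describe(r"D:\Docs\report.doc", r"Z:\Mirror-D\Docs\report.doc"))
```

A non-zero delta on a file that has not been touched since the last backup is the symptom described above.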
Fortunately, the authors of RoboCopy anticipated this problem and provided a handy option that tells RoboCopy to accept a difference of up to 2 seconds in the timestamps and still treat the files as “same”. Thus, simply adding the following option to the backup job file solved the problem:

 /FFT  :: assume FAT File Times (2-second granularity).
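To illustrate why the tolerance helps, here is a minimal sketch of a timestamp comparison with and without a 2-second window. This is not RoboCopy's actual implementation, only a model of the idea; the `classify` function and its `tolerance` parameter are names I made up, and it looks at timestamps only.

```python
def classify(source_mtime, dest_mtime, tolerance=0.0):
    """Classify the source file relative to its destination copy.

    Timestamps are in seconds; differences within `tolerance`
    count as "same".
    """
    delta = source_mtime - dest_mtime
    if abs(delta) <= tolerance:
        return "same"
    return "newer" if delta > 0 else "older"


# The destination timestamp drifted by 1.5 seconds during the copy.
print(classify(1000.0, 1001.5))                 # older
print(classify(1000.0, 1001.5, tolerance=2.0))  # same
# A 3-second difference still counts as a real change.
print(classify(1000.0, 1003.0, tolerance=2.0))  # older
```

With an exact comparison, every drifted file looks changed on every run; with the 2-second window, only files whose timestamps really differ are picked up.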
Now, instead of copying 440GB of files every now and then, I was able to copy only the files that had truly changed. I further improved the backup job by adding two more options:

 /MON:5  :: MONitor source; run again when more than n changes seen.
 /MOT:30  :: MOnitor source; run again in m minutes Time, if changed.
The combination of these two options means that on my system, I can leave an instance of RoboCopy running at all times. After each pass, RoboCopy keeps monitoring the source drive. As far as I can tell from the RoboCopy documentation, when both options are given it starts the next pass only once both conditions are met: the /MON threshold of 5 changes has been reached and the /MOT interval of 30 minutes has elapsed. If fewer than 5 changes have accumulated, it keeps waiting. That way, as long as files keep changing, the TeraStation stays roughly 30 minutes behind the current state of the primary data drive.
