Photo/Video Storage

I store my photos and videos using a simple yet relatively robust filing system. It has evolved over the years and while it may seem a bit redundant, it has served me well through several bad experiences with software and hardware bugs and failures. It also shares some concepts outlined by Peter Krogh.

Requirements

My basic requirements for the filing system were:
  1. It may not rely on any particular operating system or filesystem features. At a minimum, I do expect the OS and filesystem to handle filenames longer than the 8.3 format, and to accept spaces and dashes in folder names and filenames. I do not, however, expect the filesystem to properly handle anything beyond basic US-ASCII characters in filenames.
  2. It must provide at least minimal redundancy against corruption of metadata stored outside of the files themselves. I must be able to reasonably navigate the filing system even in the absence of the DAM software I use for finer-grained management.
  3. It may not have ambiguities in the storage structure, either due to regional differences in expressing date/time stamps or due to organizing around concepts where a single photo could fit multiple choices. For date/time stamps, I prefer to use unambiguous representations as recommended by the ISO 8601:2004 standard.
  4. It must be rigid enough to be easily traversed and processed with scripting languages.

Folders

I store all my photos in a tree structure starting from a common root folder. Within the root folder, I have a sub-folder for each year, and within each year I have a sub-folder for each month in which photos were taken. Within these monthly folders, I store photos in sub-folders, each of which represents a single shoot or event. I use the term shoot loosely to indicate a set of related photos, whether or not they were in fact taken in a single uninterrupted sequence. In other words, a shoot could be a set of photos taken during a 3-hour birthday party, or it could be a set of photos that span a 10-day vacation in Hawaii. Within a given shoot folder I may have additional sub-folders for specific needs; for example, that hypothetical Hawaii vacation shoot might have sub-folders for each major event like Luau, Dive Trip, Beach Fun, and so on. There may also be multiple shoots within any given day, hence the need for some additional information besides the date stamp. To summarize, the folder naming convention has the following format:

root-folder\YYYY\MM\YYYYMMDD[-yyyymmdd] shoot-name[\sub-folders]

where:
root-folder
Root folder name; for example on a Windows PC this might be D:\Pictures.
YYYY and MM
Year and month of the shoot.
YYYYMMDD
Starting year, month, and day of month of the shoot. Each component of the date stamp is zero-padded and 1-based, so for example January 5th, 2001 would be 20010105.
yyyymmdd
Optional ending year, month, and day of month of the shoot; may be omitted if the shoot occurred during a single day.
shoot-name
Freeform descriptive name of the shoot, for example Little Johnny's 8th Birthday Party, Hawaii Vacation, Christmas Dinner, etc.
sub-folders
Optional additional sub-folders to further sub-divide the shoot
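As a rough illustration of how the convention above lends itself to scripting, here is a minimal Python sketch that builds a shoot folder path and parses a shoot folder name back into its parts. The function names (shoot_folder, parse_shoot_name) and the regular expression are my own assumptions for this example, not part of any tool mentioned here, and the Windows-style path simply mirrors the D:\Pictures example.

```python
import re
from datetime import date, datetime
from pathlib import PureWindowsPath

# Hypothetical pattern for "YYYYMMDD[-yyyymmdd] shoot-name".
SHOOT_RE = re.compile(r"^(\d{8})(?:-(\d{8}))? (.+)$")


def shoot_folder(root, start, name, end=None):
    """Build root-folder\\YYYY\\MM\\YYYYMMDD[-yyyymmdd] shoot-name."""
    stamp = start.strftime("%Y%m%d")
    if end is not None and end != start:
        # Multi-day shoot: append the optional ending date stamp.
        stamp += "-" + end.strftime("%Y%m%d")
    year_folder = start.strftime("%Y")
    month_folder = start.strftime("%m")
    return PureWindowsPath(root) / year_folder / month_folder / f"{stamp} {name}"


def parse_shoot_name(folder_name):
    """Split a shoot folder name into (start date, end date, shoot name)."""
    match = SHOOT_RE.match(folder_name)
    if match is None:
        raise ValueError(f"not a shoot folder name: {folder_name!r}")
    start = datetime.strptime(match.group(1), "%Y%m%d").date()
    # A single-day shoot has no ending stamp, so it ends the day it starts.
    end = datetime.strptime(match.group(2), "%Y%m%d").date() if match.group(2) else start
    return start, end, match.group(3)


print(shoot_folder(r"D:\Pictures", date(2001, 1, 5), "Christmas Dinner"))
print(parse_shoot_name("20060416-20060425 Hawaii Vacation"))
```

Because strptime rejects impossible dates such as 20010230, the parser also doubles as a basic sanity check on folder names.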
I chose this particular strategy over some other common strategies:
  1. Name folders using a monotonically increasing counter.
    I would have to decide the maximum number of folders ahead of time in order to zero-pad that number to ensure proper folder name sorting. Furthermore, I would still have to further divide those resulting sub-folders into more manageable groups because of filesystem performance issues.
  2. Divide photos into buckets of equal size (either by number of files, or by disk space occupied).
    I would have to decide the folder size ahead of time. Furthermore, this would impose a completely arbitrary division of photos across folders, thus violating one of the requirements above.
  3. Place photos in sub-folders corresponding to some organizational paradigm.
    This runs into trouble with the third requirement above: it makes the placement of content ambiguous. For example, if a photo was taken during a festival while vacationing in Australia, should I file it in a folder for events (Festivals), or activities (Vacations), or places (Australia)? I'd rather use the unambiguous concept of time for folder storage, and express the additional metadata within the DAM software with unlimited tags, keywords, or categories.
My scheme does have some redundancy in it due to the repeated use of year, month and day. I introduced the divide-by-months model when I realized how cumbersome it can be to navigate a year folder filled with more than 365 sub-folders. I kept the full date stamp with the shoot folder name because otherwise the shoots within a month would be listed in arbitrary (or more commonly, alphabetical) order, again making it difficult to navigate the folders. Most importantly though, I chose this model because it is the most straightforward model to use outside of a DAM software application. Within the DAM software, I have other ways to navigate and filter the data (by date, by tags, ...) so all of these issues around the physical filesystem folder structure are somewhat irrelevant.

Filenames

The naming scheme I've chosen is similar to the folder naming scheme in that it has somewhat redundant encoding. The overriding reason for that is the desire to assign a unique filename across the entire set of photos in my collection. It simplifies locating a given photo if all I have to go on is the filename alone, without any other metadata or folder tree information. Thus, my photo files are named using the following pattern:

YYYYMMDD-hhmmss-nnnn[-extra].ext

where:
YYYYMMDD
Year, month and day of month of the shot. Each component of the date stamp is zero-padded and 1-based, so for example January 5th, 2001 would be 20010105.
hhmmss
Hours, minutes, and seconds of the day of the shot. Each component of the time stamp is zero-padded and represents time on a 24-hour clock in the local timezone where the shot was taken. See below for issues with this decision.
nnnn
Sequential frame number generated by the camera (zero-padded, 4 digits). The counter wraps around after 9999, hence the need to incorporate date/time in the name. The frame number is in turn needed to distinguish between frames taken within the same second, for example with my Canon EOS 7D, which can take up to 8 frames per second. For Canon cameras, this is the 4-digit number from filenames of the form IMG_nnnn.ext.
extra
Additional strings introduced in derived versions of the photo. The original photo file has no such decorations. For example, if I take the photo 20060416-064250-6904-img.jpg and modify it in any way, I save the resulting file as 20060416-064250-6904-img-mod.jpg, and if I frame it for publishing on the web gallery, I save it as 20060416-064250-6904-img-pub.jpg. I use two common suffixes (mod and pub), but there is no rigid rule for the suffix: it is simply used to distinguish the derived works from the original photo. I use DAM software to express a much richer set of metadata about the derived works.
ext
Filename extension, typically JPG, AVI, THM, CRW, or CR2 - basically, whatever the camera produced in the first place.
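The filename pattern is regular enough to be handled with a single regular expression. The following Python sketch is one possible way to build and parse such names; the names PhotoName, photo_name, and parse_photo_name are hypothetical, and treating everything between the frame number and the extension as the extra part is my own assumption.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Hypothetical pattern for "YYYYMMDD-hhmmss-nnnn[-extra].ext".
PHOTO_RE = re.compile(
    r"^(?P<stamp>\d{8}-\d{6})-(?P<frame>\d{4})"
    r"(?:-(?P<extra>[^.]+))?\.(?P<ext>[^.]+)$"
)


class PhotoName(NamedTuple):
    taken: datetime        # local date/time of the shot, no timezone attached
    frame: int             # camera frame counter
    extra: Optional[str]   # suffix of a derived version, None if absent
    ext: str               # filename extension as written


def photo_name(taken, frame, ext, extra=None):
    """Format a filename from its components."""
    suffix = f"-{extra}" if extra else ""
    return f"{taken:%Y%m%d-%H%M%S}-{frame:04d}{suffix}.{ext}"


def parse_photo_name(filename):
    """Parse a filename, raising ValueError if it does not fit the pattern."""
    match = PHOTO_RE.match(filename)
    if match is None:
        raise ValueError(f"not a photo filename: {filename!r}")
    taken = datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M%S")
    return PhotoName(taken, int(match.group("frame")), match.group("extra"), match.group("ext"))


print(parse_photo_name("20060416-064250-6904-img-mod.jpg"))
print(photo_name(datetime(2006, 4, 16, 6, 42, 50), 6904, "jpg"))
```

Since the date/time comes first and every field is zero-padded, a plain alphabetical sort of these filenames is also a chronological sort, at least as long as the camera clock was set consistently.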
That covers the basic layout of the file storage. I don't consider this solution a perfect one, but it works for my needs. I'll cover some of the known shortcomings in the following sections.

Known Issues/Problems

Timezones


Above I mentioned that my photos are tagged with the local date/time when the shot was taken. This creates several annoying problems.
  1. In the absence of a timezone indicator, it is ambiguous what the date/time stamps actually mean; daylight saving time in particular is a potential headache.
  2. If I forget to adjust the camera's date/time correctly for the local timezone while traveling, it will produce files tagged with incorrect date/time stamps. Either I will need to fix up those stamps, or at the very least tag the photos in my DAM software to indicate how much to adjust the date/time, if I ever need to derive accurate date/time stamps for them.
  3. If I travel east across timezones, and especially across the International Date Line, and I do adjust the date/time for the local timezone, I may end up with photos that seemingly go back in time.
  4. I occasionally take photos in the middle of a flight, for example of particularly beautiful cloud formations, mountains, or other landscape. Should those photos have the date/time of the origin or destination, or somewhere in the middle?
It seems that setting my cameras to UTC date/time, and thus having an unambiguous time reference, would solve all of these problems. However, I also value the simplicity of dealing with local date/time stamps when it comes to relating photos to inherently local events like sunsets, sunrises, and so on. In conclusion, this problem has not been serious enough so far to warrant switching to UTC date/time.
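To make the "back in time" effect concrete, here is a small Python sketch using an invented trip: one shot in Tokyo, then another shot 8 hours later after flying east to Honolulu. The trip, the dates, and the helper name stamp are purely illustrative assumptions; the fixed offsets (UTC+9 and UTC-10) are used because neither place observes daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for the two hypothetical locations.
TOKYO = timezone(timedelta(hours=9))
HONOLULU = timezone(timedelta(hours=-10))


def stamp(moment, tz):
    """Render a moment as the YYYYMMDD-hhmmss stamp seen on a clock in tz."""
    return moment.astimezone(tz).strftime("%Y%m%d-%H%M%S")


first_shot = datetime(2006, 4, 16, 10, 0, 0, tzinfo=TOKYO)
second_shot = first_shot + timedelta(hours=8)  # taken 8 hours later

# Camera clock adjusted to local time at each location.
local_first = stamp(first_shot, TOKYO)        # 20060416-100000
local_second = stamp(second_shot, HONOLULU)   # 20060415-230000

# Camera clock left on UTC throughout.
utc_first = stamp(first_shot, timezone.utc)    # 20060416-010000
utc_second = stamp(second_shot, timezone.utc)  # 20060416-090000

print(local_second < local_first)  # True: the later shot sorts first
print(utc_first < utc_second)      # True: UTC stamps keep the real order
```

With local stamps the later photo lands on the previous calendar day, so it would even be filed under an earlier date stamp; with UTC stamps the order is preserved, at the cost of stamps that no longer match the local clock.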

Multiple Cameras

I have several cameras, and sometimes I use more than one of them during a single shoot. You'll note that my naming scheme does not explicitly deal with this situation. It is theoretically possible that this would result in duplicate filenames. However, for that to happen, all cameras involved would need to have their internal clocks in perfect synchronization. Furthermore, since I use the frame counter as part of the name, the counters would also need to be in perfect synchronization. In conclusion, this problem has not been serious enough to warrant inclusion of camera identifying information in the naming scheme.
If I were really worried about this happening, I would include the body serial number or some other distinguishing tag based on the source camera.
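One way to check whether such collisions have actually crept in is to scan the whole tree for filenames that occur more than once. The Python sketch below is only an illustration under my own assumptions: the function names all_files and duplicate_names are hypothetical, and names are compared case-insensitively because some filesystems do not distinguish case.

```python
import os
from collections import defaultdict
from pathlib import PurePath


def all_files(root):
    """Yield the full path of every file below root."""
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            yield os.path.join(folder, name)


def duplicate_names(paths):
    """Map each filename that occurs more than once to the paths using it."""
    seen = defaultdict(list)
    for path in paths:
        # Group by bare filename, ignoring the folder and letter case.
        seen[PurePath(path).name.lower()].append(str(path))
    return {name: found for name, found in seen.items() if len(found) > 1}


if __name__ == "__main__":
    # Scans the current directory; substitute the root folder of the collection.
    for name, found in sorted(duplicate_names(all_files(".")).items()):
        print(name, "->", found)
```

This only finds duplicates that ended up in different folders. Two cameras producing the same name within a single shoot folder would more likely show up as an overwrite prompt or error while copying the files in.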

Scanned Photos

The naming scheme as it stands does not specifically address photos scanned with a scanner, for example old film photos scanned for archival purposes. In those cases I manufacture a filename using the same naming convention as above, with three adjustments. The date/time fields are either an approximation of the date/time of the original film photo (if known) or the current date. The frame counter is fixed at 0000. The extension reflects whatever file format I used to save the scanned image (typically PNG or TIFF).

Derived Works

My folder/directory structure does not distinguish between original works and derived works. In other words, if I edit a photo I save it in the same folder along with the original photo. In some other schemes there may be a sub-folder named Edits, Edited, or Develops, or perhaps a whole separate sub-folder structure as described in Peter Krogh's book. Rather than proliferate additional sub-folders or whole hierarchies, I have chosen to keep the derived works with the original works and deal with these types of workflow issues in my DAM software.
