On Monday, I began the sampling experiment on a collection of VHS tapes to try to gauge whether a random sample of the videos could represent a collection. After consulting my adviser, I decided to use the following sampling method:

1. Take one five-minute sample beginning at the 00:02:00 mark of each VHS (00:02:00 – 00:07:00).

2. If the tape goes over one hour, take a second five-minute sample at the 01:00:00 mark, and another five-minute sample at each following hour mark (02:00:00, 03:00:00, etc.).
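The two rules above can be sketched in a few lines of Python. This is only an illustration, not a tool we actually used: the function names are hypothetical, times are in whole seconds, and clipping a sample at the end of a short tape is my own assumption, since the rules do not say what to do when fewer than five minutes remain.

```python
SAMPLE_LENGTH = 5 * 60   # five-minute samples
FIRST_START = 2 * 60     # first sample begins at 00:02:00
HOUR = 60 * 60


def sample_windows(duration_s):
    """Return (start, end) pairs in seconds for a tape of the given length."""
    windows = []
    # Rule 1: one sample starting two minutes in.
    if duration_s > FIRST_START:
        end = min(FIRST_START + SAMPLE_LENGTH, duration_s)  # assumption: clip at tape end
        windows.append((FIRST_START, end))
    # Rule 2: one sample at every hour mark the tape runs past.
    start = HOUR
    while start < duration_s:
        end = min(start + SAMPLE_LENGTH, duration_s)  # assumption: clip at tape end
        windows.append((start, end))
        start += HOUR
    return windows


def hms(seconds):
    """Format whole seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # A hypothetical tape that runs 2 hours 10 minutes.
    for start, end in sample_windows(2 * HOUR + 10 * 60):
        print(hms(start), "-", hms(end))
```

A tape under an hour yields a single sample, and each hour mark the tape runs past adds one more.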

I chose to start the first sample two minutes into the video because the beginning of a tape recording often shows the camera operator setting up the video camera, getting it on a tripod, or making technical adjustments. This setup usually does not happen again later in the tape.

When I began taking samples, I came across many VHS tapes that are broken into multiple video files with a “.1,” “.2,” or “.3” extension in the file name. Several times, the second sample at the hour mark stretched over two videos. I did the math to take an accurate five-minute sample beginning at 01:00:00 in one video and ending five minutes later in the next. I later found out that our digitization department breaks up the longer VHS videos at the time of digitization because only one hour of video will fit on the DVDs we create for donors. If we adopt this sampling method for repeated use, we should consider taking the sample entirely from the second video instead of stretching it over two videos and having to do additional editing. During the experiment, I was concerned with keeping the times exact, so starting at exactly the one-hour mark made sense. It would not be efficient in the long run, though.
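The math for a sample that straddles two files looks roughly like the sketch below. I did this by hand, so the function name and the file lengths in the example are hypothetical; the idea is just to convert a tape-level window into a start and end time within each file it touches.

```python
def split_window(part_durations, start, end):
    """Map a tape-level window (in seconds) onto the files that make up the tape.

    part_durations lists the length in seconds of the ".1", ".2", ... files in order.
    Returns (part_number, local_start, local_end) for each file the window touches.
    """
    segments = []
    offset = 0  # tape-level time at which the current file begins
    for part_number, length in enumerate(part_durations, start=1):
        part_start, part_end = offset, offset + length
        lo = max(start, part_start)
        hi = min(end, part_end)
        if lo < hi:  # the window overlaps this file
            segments.append((part_number, lo - part_start, hi - part_start))
        offset = part_end
    return segments


if __name__ == "__main__":
    # Hypothetical tape: the ".1" file runs 01:02:00 and the ".2" file runs 00:30:00.
    # The hour-mark sample covers 01:00:00 to 01:05:00 on the tape as a whole.
    print(split_window([3720, 1800], 3600, 3900))
```

With those made-up lengths, the sample takes the last two minutes of the first file and the first three minutes of the second, which then have to be joined in editing. Taking the sample wholly from the second file would avoid that extra step.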

Using 16 VHS tapes, I created 25 samples. I created a record and performed full descriptive cataloging for each sample, following standard practice at TAMI. In total, cataloging the samples took 3 hours, 43 minutes, and 55 seconds, far less than I normally spend cataloging a VHS collection. Part of that can be attributed to the good amount of metadata on the original labels, which cut down on research time, but it was mostly due to the drastic cut in hours of video to review.
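To put those totals in per-sample terms, here is a quick back-of-the-envelope calculation. The only inputs are the figures above; the footage total is an upper bound that assumes every one of the 25 samples ran the full five minutes, which I have not verified for each tape.

```python
TAPES = 16
SAMPLES = 25
SAMPLE_SECONDS = 5 * 60

# Total cataloging time: 3 hours, 43 minutes, 55 seconds.
cataloging_seconds = 3 * 3600 + 43 * 60 + 55


def minutes_seconds(seconds):
    """Format a duration as minutes and seconds, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs:02d} s"


per_sample = cataloging_seconds / SAMPLES
per_tape = cataloging_seconds / TAPES
max_footage = SAMPLES * SAMPLE_SECONDS  # assumption: every sample is a full five minutes

if __name__ == "__main__":
    print("Per sample:", minutes_seconds(per_sample))
    print("Per tape:", minutes_seconds(per_tape))
    print("Sampled footage (at most):", minutes_seconds(max_footage))
```

That works out to about nine minutes of cataloging per sample and about fourteen minutes per tape, against at most 125 minutes of sampled footage.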

At this point, I feel like the samples did a pretty good job of representing the content on the tapes. Of course, when I review the tapes in depth, I could find fantastic footage that I would have missed by using only random samples, or I could find that the samples are not representative of the collection at all. Right now, though, the samples seem pretty consistent with what I usually find after a full review. I don’t know if we could ever seriously adopt this method on a full-time basis because of the potentially great things we could be missing (it may eat us up inside), but I was surprised by the content the samples were able to capture. They didn’t seem as arbitrary as I expected. However, I do think that if we use this method outside a strict experimental environment, we should adjust the beginning and end of each sample to appropriate starting and stopping points, making the samples into more cohesive segments. So, to sum up, I’m surprised at the quality of the segments the sampling method produced, but I could definitely be eating my words after I finish the next phase of this experiment.