On the Structure and Analysis of Home Videos

John R. Kender
Department of Computer Science, Columbia University, New York, NY 10027, United States

Boon-Lock Yeo
Microcomputer Research Labs, Intel Corporation, Santa Clara, CA 95052, United States

A study of approximately 300,000 frames (about 3 hours) of unedited home videos taken by several people indicates that there are human limits on the capture and perception of such videos, and that these limits impose on them a measurable and exploitable structure. This structure suggests a novel approach to their automatic analysis and summarization, which we demonstrate. In general, home camera motion appears quite limited and quasi-static, with the use of zoom and shutter being the predominant indicators of visual ``significance''. Surprisingly, shot lengths appear to follow a lognormal distribution similar to that found in professionally edited videos. But neither shot length, nor even the segmentation of frames into shots, is needed to find ``significant'' frames. Even without sound, color, or full detail, the detection of sequences of zooming-in followed by relative camera stability appears to be a robust video ``interest operator''. We describe and demonstrate such a zoom-and-hold filter, and discuss its performance. We suggest that this approach and this algorithm are the first of several potential exploitations of implicit human visual attention rules, whose articulation and embodiment in algorithms appear highly applicable to the understanding, summarization, and indexing of home videos and of more structured videos.
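
To make the zoom-and-hold idea concrete, the following is a minimal sketch in Python and not the filter evaluated in this work. It assumes that some earlier stage has already produced two hypothetical per-frame signals: `zoom_rate` (positive when the camera is zooming in) and `motion` (a camera motion magnitude). The function name, the thresholds, and the minimum run lengths are all illustrative assumptions. The sketch flags the first frame of each stable run that immediately follows a sufficiently long zoom-in.

```python
from typing import List, Sequence


def zoom_and_hold(
    zoom_rate: Sequence[float],
    motion: Sequence[float],
    zoom_thresh: float = 0.01,   # assumed: minimum scale change counted as zooming in
    still_thresh: float = 0.5,   # assumed: maximum motion counted as "stable"
    min_zoom: int = 5,           # assumed: minimum zoom-in run length, in frames
    min_hold: int = 15,          # assumed: minimum stable run length, in frames
) -> List[int]:
    """Return the first frame index of each hold that follows a zoom-in.

    zoom_rate[i]: hypothetical per-frame scale change (> 0 means zooming in).
    motion[i]:    hypothetical per-frame camera motion magnitude.
    """
    if len(zoom_rate) != len(motion):
        raise ValueError("zoom_rate and motion must have the same length")

    hits: List[int] = []
    n = len(zoom_rate)
    i = 0
    while i < n:
        # Measure the run of consecutive zooming-in frames starting at i.
        j = i
        while j < n and zoom_rate[j] > zoom_thresh:
            j += 1
        if j - i < min_zoom:
            # No zoom-in here, or one too short to count: move on.
            i = max(j, i + 1)
            continue
        # Measure the run of stable frames that immediately follows the zoom.
        k = j
        while (
            k < n
            and abs(zoom_rate[k]) <= zoom_thresh
            and motion[k] <= still_thresh
        ):
            k += 1
        if k - j >= min_hold:
            hits.append(j)
        i = k
    return hits


# Synthetic signals: 3 idle frames, 6 zoom-in frames, then 20 stable frames.
zoom = [0.0] * 3 + [0.05] * 6 + [0.0] * 20
move = [0.1] * 29
print(zoom_and_hold(zoom, move))  # → [9]
```

Note that the sketch scans the frame sequence directly and never segments it into shots, in keeping with the observation that shot boundaries are not needed to find ``significant'' frames. How the two input signals would be estimated from video is left open here.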