The purpose of video summarization, when used in an interface, is to extract from a video a limited number of key-frames that convey the meaning of the whole video at a glance. The development of cut detection and key-frame selection algorithms rests on the unspoken assumption that comprehension of the video depends not only on the sheer number of frames presented but also on a careful choice of those frames.
This paper presents an experiment that challenges this assumption. Summaries of two videos were synthesized both by careful frame selection and by uniform sampling of the video with the same number of frames, and the comprehension of two groups of subjects was tested by asking them questions about the videos. The experiment revealed no difference between the comprehension gained from the carefully selected key-frames and that gained from the uniform sampling of the video.
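
As an illustration of the uniform-sampling baseline, the following minimal sketch picks a fixed number of evenly spaced frame indices from a video. The function name `uniform_sample_indices` and the choice of taking the midpoint of each equal-length segment are assumptions made for this example; the abstract does not specify how the uniform sampling was actually carried out.

```python
def uniform_sample_indices(total_frames: int, num_keyframes: int) -> list[int]:
    """Return num_keyframes evenly spaced frame indices in [0, total_frames).

    Hypothetical helper: the video is split into num_keyframes equal
    segments and the frame at the midpoint of each segment is chosen.
    """
    if num_keyframes <= 0:
        raise ValueError("num_keyframes must be positive")
    if num_keyframes > total_frames:
        raise ValueError("cannot sample more frames than the video contains")
    # Midpoint of segment i is (i + 0.5) * total_frames / num_keyframes;
    # integer arithmetic avoids floating-point rounding.
    return [
        ((2 * i + 1) * total_frames) // (2 * num_keyframes)
        for i in range(num_keyframes)
    ]


if __name__ == "__main__":
    # Four frames from a 100-frame video.
    print(uniform_sample_indices(100, 4))
```

Matching `num_keyframes` to the size of the carefully selected key-frame set gives two summaries of equal length, which is the comparison the experiment relies on.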