Arizona State University
Pages: pp. 18-19
The human preoccupation with capturing and archiving memorable experiences saw astonishing technological advancement in the 20th century, progressing from diaries and paintings to the dawn of the digital camera and camcorder era and ushering in our multimedia community. Today, we must expand our notion of media, because audio and video recordings can be supplemented with many other kinds of data, including temperature, heart rate, location, acceleration, humidity, Web pages visited, and logs of how we use our devices.
At the same time, digital storage has become inexpensive and plentiful enough to enable personal archives on a previously unimaginable scale. This has led researchers to focus less on the representation, archival, and transmission of isolated events (such as images of a party or video of a wedding) and to consider instead what might happen if media were recorded continuously, or nearly continuously. We believe that the coming increase in the quantity and fidelity of media, together with our ability to archive, organize, and retrieve this information, will fundamentally affect our culture.
We dedicate this special issue to an emerging area of research that deals with the capture, archival, and retrieval of all media relating to personal experiences—known as CARPE.
Personal storage of all media throughout a person's lifetime has been desired and discussed since at least 1945, when Vannevar Bush published "As We May Think," positing the Memex device "in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility." [1] His vision was astonishingly broad for the time, including annotations, hyperlinks, vast storage, and even stereo cameras mounted on eyeglasses. Today, Memex has become feasible and even affordable. Indeed, we can now look beyond Memex at new possibilities.
The three articles in this special issue capture two important themes of CARPE: passive capture and automatic labeling. Passive capture means that recording requires no effort by users: they don't have to stop experiencing the moment to, say, take a picture; it just happens. Automatic labeling means that, generally speaking, a user doesn't have to pore over the records to hand-label them. Indeed, it wouldn't be feasible to hand-label the media in the quantities considered.
Mase et al. describe how the experience of visiting booths at an exhibition can be captured and shared, even via a robot walking the aisles. Visitors wearing cameras and microphones have their location tracked using infrared IDs (IRIDs). Booth presenters also record video and audio. The system first segments the recorded material, then it identifies low-level activities such as looking at something or speaking. Finally, it labels higher-level concepts, such as dialogue or group discussion.
Ellis and Lee describe their experiment with continuously recorded wearable audio, an experiment that anyone could perform now using an inexpensive MP3 player with a built-in microphone. While such a recording clearly contains a lot of useful information, searching and browsing the material effectively is a severe challenge. They share their progress in segmenting and labeling the audio and present ideas on how to make the audio archive useful, based on several years of valuable real-world experience.
The iSensed project by Blum et al. considers how continuous personal capture can automatically classify and label interesting experiences. Their wearable setup consists of a camera and microphone mounted on the chest, accelerometers on the hip and wrist, and wireless frequency access-point sniffing. The system classifies captured data according to location, posture, activity, and any speech taking place. They can then apply an algorithm that predicts the interestingness of the situation, based on the user's current state and recent history.
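To make the idea of scoring interestingness from the current state and recent history more concrete, here is a minimal sketch in Python. It is not the algorithm used by Blum et al.; the scoring rule (the surprisal of the current state relative to a sliding window of recent states) is an assumption chosen purely for illustration, and the names `State` and `InterestingnessScorer` are hypothetical. Only the four state fields (location, posture, activity, speech) come from the description above.

```python
import math
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """One classified moment; the fields mirror the four labels named in the text."""
    location: str
    posture: str
    activity: str
    speech: bool


class InterestingnessScorer:
    """Hypothetical scorer: states that are rare in recent history score higher."""

    def __init__(self, window: int = 100):
        self.history = deque(maxlen=window)  # most recent states, oldest first
        self.counts = Counter()              # occurrences of each state in the window

    def score(self, state: State) -> float:
        # Smoothed estimate of how often this state occurred recently
        # (add-one smoothing, so an unseen state never has probability zero).
        n = len(self.history)
        p = (self.counts[state] + 1) / (n + 2)
        surprisal = -math.log2(p)  # assumed scoring rule: higher means more unusual

        # Slide the window: forget the oldest state once the window is full.
        if n == self.history.maxlen:
            self.counts[self.history[0]] -= 1
        self.history.append(state)
        self.counts[state] += 1
        return surprisal


if __name__ == "__main__":
    scorer = InterestingnessScorer(window=50)
    desk = State("office", "sitting", "typing", False)
    chat = State("cafe", "standing", "walking", True)
    for _ in range(10):
        scorer.score(desk)
    print("routine:", round(scorer.score(desk), 3))
    print("novel:  ", round(scorer.score(chat), 3))
```

Under this assumed rule, a state that has dominated the recent window scores close to zero, while a state not seen recently scores high. A real system would likely use a learned model instead of simple counts.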
While these articles highlight the essential points of passive capture and automatic labeling, there's much more to CARPE than what we're able to present in this limited space. Consider the application to health: We're not far from the day when people will no longer puzzle over "When did I first start feeling this way?" and instead will show the doctor data from wearable health sensors that provide their temperature and heart rate. This shift to quantitative health assessment will be enjoyed by therapists as well as patients, as evidenced by work at Arizona State University that enables the logging of precise movements of stroke patients in their therapy.
Another important application area is education, where students and teachers can use CARPE directly to aid in learning. There are many other important application areas, including meeting capture, museums, child care, the military, and personal information management.
System-level issues also abound. The quantity of media generated in CARPE calls for revisiting previously solved problems at a much larger scale. For example, research has begun at Princeton on scaling existing image and audio similarity algorithms. User interfaces face similar scalability challenges. Issues related to data mining, content analysis, visualization, and many other areas are being tackled by a growing community of researchers, and there's still much to learn and explore.
Readers who want to learn more about CARPE can visit the Special Interest Group on Multimedia's CARPE Web site (http://www.sigmm.org/Members/jgemmell/CARPE). There you can find information about the annual CARPE workshop, the mailing list, and many other informative links.