1. Capture using skilled videographers. Videographers are trained to use techniques such as pan, zoom, and overlay to create a video that captures all the interesting events that occurred during a lecture.
Some universities are already addressing the problem of students in different locations and lack of synchronous meeting times using distance education and online courses, respectively. They capture and broadcast the lectures using videographers. These universities can simply distribute these videos for review purposes.
This approach poses two significant disadvantages:
a. Production cost is prohibitive: the videographers need to be present during the entire lecture, which adds to the cost of video capture. The in-house audio/visual department charges US$100/hr for video recording and US$120/hr for editing and digitizing the video. Each course typically meets for forty lectures, potentially costing over US$8,000 per semester. The Educational Technology Services (ETS) at Berkeley provides a richer capture option. However, they charge US$535 for setup and US$572/hr to capture and distribute a video containing the audio, video, and screencast of a lecture.
b. Videographers are not always familiar with the topics being covered: their notion of important events need not coincide with those of the instructor. For example, they might zoom in on the instructor when the contents on the blackboard were more important.
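The cost figures quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes one-hour lectures, one hour of editing per recorded hour, and a setup fee that is charged once per course. None of these assumptions are stated in the text, and the function names are hypothetical.

```python
def in_house_cost(lectures, hours_per_lecture=1.0,
                  record_rate=100.0, edit_rate=120.0,
                  edit_hours_per_recorded_hour=1.0):
    """Semester cost (US$) of recording and editing with in-house staff.

    The hourly rates come from the text; the lecture length and the
    editing ratio are assumptions made for illustration.
    """
    recorded_hours = lectures * hours_per_lecture
    recording = recorded_hours * record_rate
    editing = recorded_hours * edit_hours_per_recorded_hour * edit_rate
    return recording + editing


def ets_cost(lectures, hours_per_lecture=1.0,
             setup_fee=535.0, hourly_rate=572.0):
    """Semester cost (US$) of the richer capture option.

    Assumes the setup fee is charged once per course (not stated in the text).
    """
    return setup_fee + lectures * hours_per_lecture * hourly_rate


print(in_house_cost(40))  # 8800.0
print(ets_cost(40))       # 23415.0
```

Under these assumptions, forty lectures cost US$8,800 in-house, which is consistent with the "over US$8,000 per semester" estimate.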
2. Automated video capture. Many research efforts address the videographer's expense by automating aspects of the capture workflow.
Brotherton and Abowd [6] described their experiences in collaboratively creating lecture review notes among the instructor and the students in a fully instrumented lecture hall [7]. They observed that videos were not popular because of poor capture quality and inadequate network resources to access them remotely. The authors recognized the value of videos, especially in disambiguating pronouns from the audio track. However, the technology limitations that they experienced no longer apply to the capture, processing, or consumption of high quality media. Mukhopadhyay and Smith [8] developed a system that combined the video streams from a static overview camera and a tracking camera with the lecture slides to create synchronized presentation media. These tools were used to further develop mechanisms to capture and distribute lecture videos as the Berkeley Internet Broadcasting System (BIBS) [9], [10]. Similarly, Rui et al. [11], [12], [13] developed a video capture system that fully automated lecturer and audience tracking and performed all the capture functionality while achieving video quality close to that of human-operated systems.
Other projects enabled search capabilities. Ziewer [14] captured the screen contents using VNC and created fully indexed and searchable videos using VNC protocol messages, instructor annotations, and an external optical character recognition program. Similarly, Hilbert et al. [15] automatically captured the slide projection using specialized hardware. They used optical character recognition to segment the videos and index the various slides along with the audio narration. Müller and Ottmann [16] focused on automated authoring and retrieval of lecture videos. Repp et al. [17] automated the indexing process of stored lecture videos in order to ease content-based browsing. Adcock et al. [18] created a searchable text index of the slides from publicly available lecture videos. The performance of this system is adversely affected by video overlays, and the authors present mechanisms to improve recognition for these videos.
Few research projects were transitioned to a large scale production service. The production BIBS [19] service is currently available in five lecture halls and uses a mixture of custom and off-the-shelf tools. Burdet et al. [20] describe the effort at the University of Geneva to automate lecture capture. The faculty collaborated with the IT staff to automatically capture the videos. In older classrooms that were not fitted with modern A/V capture infrastructure, they developed and deployed a custom capture solution using Mac Mini computers. Their solution is actively deployed in 35 lecture halls. ETH Zurich is using the Replay system to manage and distribute audiovisual recordings. Their software is available through the Opencast initiative (opencastproject.org). Talkminer [18] provides slide search capability for lecture videos as a public service at talkminer.com.
Transitioning prior research and implementing an automated capture system is difficult. Researchers spent most of their effort in developing novel techniques to automate the capture rather than in developing deployable solutions. Ease of deployment and maintenance requires using commodity components that are inexpensive and easily available. Currently, we require considerable technical expertise and customized hardware and software to implement, deploy, and maintain these systems. Such resources are not available in many universities.
1. Minimizing the amount of faculty time required for the capture workflow. Some of the times are dictated by the registrar (e.g., time between lectures restricts the time available to set up and pack the equipment), some depend on the technology limitations (e.g., time to transfer video from camcorder) while others are under the control of the instructor (e.g., amount of annotations).
2. Only using commodity, off-the-shelf components. This allows us to minimize costs and maximize the number of faculty who can capture their lectures.
2.3.1 Capture Equipment The lecture halls were equipped with an LCD and document projector and a lectern computer; video cameras were not already deployed. Hence, the video capture equipment should be easy to carry, set up, and pack at the lecture hall. Ultimately, what can be captured depends on the weight of the equipment as well as the time required to set it up. This is particularly important because the university-allotted duration between lectures is small. Often, prior lectures overran the allotted time, further reducing the time available for setup.
We leverage the low cost advantage of commodity components. We used the Sony HDR-HC1 HDV camcorder (US$1,350 in January 2006), which can record 64 minutes of 1080i HD video (rectangular pixels) on mini-DV tapes. Depending on the class, our lectures lasted either 50 or 75 minutes. The Sony HDR-HC1 was one of the first consumer grade HDV cameras. Newer tapeless mechanisms such as AVCHD can store the video on hard disks and flash memory. For example, the Sony HDR-XR150 HD Handycam retails for under US$700 and offers 120 GB of hard drive-based storage that can store up to 50 hours of HD video. These newer camcorders provide adequate storage for lecture capture.
2.3.2 Video Capture Setup Typically, most of the seats in the classroom were occupied. Finding a location to set up the video camera that provided a good view for video capture while not obstructing any student's view of the lecture was challenging, especially since it was impractical to carry tall tripods to each lecture. The layout of each classroom was different; we used four different types of halls over the past seven semesters. Fig. 3 illustrates the layout of three such lecture halls. The small classroom was flat, the seats in the medium room were elevated, and the seats in the large room were steeply elevated to accommodate about 120 students. Flat classrooms require the camera to be installed in the front (unobstructed view) while elevated classrooms require placement further back. We mounted the camcorder on a Manfrotto 209 Tabletop Tripod with a 482 Micro Ballhead (portable, about 4 inches in height, retailing for US$55). The height of this setup was unobtrusive. In general, placing the tripod further back increases the camera field of view. A wider field of view can capture more aspects of the lecture. However, it will also capture (the backs of) some of the students, which is undesirable for our purposes. Note that cameras that are installed by the university and mounted on the ceiling (like in [24]) could be installed further back and still avoid capturing the students.
At the beginning of each semester, we surveyed the lecture hall and chose a good location to place the video camera. Depending on the topics planned for a particular lecture, we tweaked this location. For example, when we expected to use the blackboards much more than the LCD projection, we adjusted the camera to place more importance on the blackboard. In general, the quality of the video was robust against the location choice. Typically, we chose a location toward the end of the first row (Figs. 3a and 3b); we chose the third row in the larger classroom (Fig. 3c). Placing the video camera among students caused the camera to capture student murmurs. We used a Bluetooth wireless microphone that directly connected to the camcorder (Sony ECM-HW1) to prevent the camera from capturing student conversations. After the initial choice of a location, we experienced few problems in reusing the same location to place our video camera (students also typically sat in the same location throughout the semester). We made sure that we did not capture any students in the video in order to protect their privacy. We manually removed scenes where students walked into the camera field of view to (say) turn in their assignments.
2.3.3 Capture Experience Between the Spring 2006 and Spring 2009 semesters, we recorded the lectures of seven courses in four different types of lecture halls. In the Spring of 2006, 2007, and 2008, we recorded the lectures of a junior level Operating Systems course. These classes convened for 50 minutes each, three days a week, for a total of about 36 lectures. This was a core required course for all Computer Science majors. This course provided the necessary background for the graduate Operating Systems course, which we taught in the Fall 2008 semester. The graduate course was considered to be a qualifying exam and was required of all incoming graduate students. Note that many of the graduate students did not graduate from the university itself; they likely took the undergraduate Operating Systems course at their own institutions. Some of the graduate students did not hold an undergraduate degree in Computer Science and hence never took an undergraduate Operating Systems course. Regardless, all the graduate students were strongly encouraged to review the course materials covered in our undergraduate Operating Systems course (especially since graduate students did not receive any graduate level credit for taking the junior level course). This graduate course met twice a week for 75 minutes each for a total of 26 lectures; the camcorder could only capture about 64 minutes of each of these lectures. In the Fall 2006 and Spring 2009 semesters, we taught an undergraduate Multimedia Systems course which was also cross listed as a graduate course. The Fall 2006 course was offered twice a week for 75 minutes each while the Spring 2009 course was offered thrice a week with each lecture lasting 50 minutes. In the Fall 2007 semester, we also taught an undergraduate/graduate course on Networked Sensor Systems. This course met twice a week for 75 minutes per lecture.
Note that the videos from classes that met twice in a week were smaller (Table 1) because we only captured 64 minutes of the 75 minute lecture. Newer video cameras that use the AVCHD format will not experience this capture limitation. Also, earlier courses used lower bit-rate high definition videos than what was used in later semesters.
Our primary focus during the lecture was on interacting with the students, not on facing and talking into the camera. We only acknowledged the existence of the camera when discussing private information (such as student grades). Sometimes this meant that the lecturer would walk away from the camera or continue writing past the camera's field of view; these events were rare because of the wide capture angle of the camera. Note that a trained technician would have followed our movements and generally done a better capture job. We also did not use any special lighting facilities; the lighting in typical classrooms was adequate for video capture.
During each lecture, we projected PowerPoint slides using the lectern PC. We experimented with the presentation capture feature of PowerPoint. During the postprocessing stages, we could then combine the video and slide capture streams using tools such as Camtasia. Unfortunately, PowerPoint missed the synchronization timing between the audio streams (captured by the camera) and slide transitions (recorded by PowerPoint). It also lost the audio segment if we went back to a previous slide. The university-managed lectern PCs did not support the Camtasia tools. We believe that carrying our own laptop with Camtasia tools places an undue overhead in terms of carrying, setting up, and dismantling two devices (a camera and a laptop) for each lecture. Also, the time between lectures is assigned by the university and is limited. In general, it took us about five minutes each to set up and pack up the video gear (there were usually 15 minute breaks between lectures).
1. Storage cost. Unlike prior video capture mechanisms that require a videographer to be physically present in each lecture, the storage support personnel need not be in situ. However, the cost to expand traditionally managed storage to accommodate all the videos is nontrivial. Each semester, our 2,500 courses in the entire university would require 92 TB of storage; providing reliable and managed storage for this amount is expensive. Just a few years ago, the university allocated an order of magnitude less storage per course. If those trends continue, the university might ultimately invest in enterprise class storage for storing videos. In the meantime, we bootstrap the process by arguing for a storage solution that relaxes traditional reliability guarantees. We expand on this storage in Section 3.1.1.
2. Local versus remote distribution. The student location plays an important role. Current students access the lectures from campus, dormitory as well as from off-campus locations. Currently, the university uses a 200 Mbps link to access the Internet. The university also has special peering agreements with some local ISPs in order to service students who live off-campus. Also, private email conversations show that our alumni are continuing to access the videos from remote locations. Hence, we investigate local as well as remote distribution.
3. Public versus private. Another question is whether the videos should be publicly available or restricted only to the students who took the course. Maintaining access control lists, especially for alumni, can be hard. Hence, we publicly released the videos to everyone. Publicly releasing the videos meant that the number of accesses could be high, which matters when a hosted service charges for transfer. For example, Camtasia tools can use the Screencast service to distribute videos. Screencast provides 25 GB of storage and 200 GB of transfer bandwidth for about US$9.95 a month with an additional US$31.95 per 100 GB transfer block. Each of our HD lectures consumed about 1.25 GB of storage (processed using Camtasia). The access costs can quickly accumulate.
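The storage and transfer figures above lend themselves to a quick back-of-the-envelope check. The following sketch uses only the numbers quoted in the list; it assumes decimal units (1 TB = 1,000 GB) and that additional transfer is billed in whole 100 GB blocks. The download count in the usage example is hypothetical and chosen purely for illustration.

```python
import math


def storage_per_course_gb(total_tb=92.0, courses=2500):
    """Average storage per course, assuming decimal units (1 TB = 1000 GB)."""
    return total_tb * 1000.0 / courses


def hosted_monthly_cost(transfer_gb, base_fee=9.95, included_gb=200.0,
                        block_gb=100.0, block_fee=31.95):
    """Monthly cost (US$) for a given transfer volume on the quoted plan.

    Assumes extra transfer is billed in whole blocks (an assumption).
    """
    extra_gb = max(0.0, transfer_gb - included_gb)
    blocks = math.ceil(extra_gb / block_gb)
    return base_fee + blocks * block_fee


lecture_gb = 1.25  # size of one processed HD lecture, from the text

print(storage_per_course_gb())                    # 36.8 GB per course
print(int(25 // lecture_gb))                      # 20 lectures fit in 25 GB
# Hypothetical load: a single lecture downloaded 1,000 times in a month.
print(round(hosted_monthly_cost(1000 * lecture_gb), 2))  # 361.4
```

With these assumptions, the 25 GB allocation holds only 20 HD lectures, fewer than the roughly 36 lectures in one course. A thousand downloads of a single lecture in a month would exceed the included transfer and cost about US$361.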
3.1.1 Video Annotations Useful for Local Distribution An important feature of personal capture is the ability of the instructor to add meaningful annotations post hoc while processing the videos. The instructor can splice the video and add a new video clarification. They can add markers for slide transitions as well as overlay textual clarifications. Video editing software makes it relatively easy to add these annotations. Annotations which modify the video are available to the student regardless of the distribution mechanism. However, certain annotations depend on the distribution mechanism. Note that the time required to add annotations directly depends on their complexity; the instructor should strike the right balance.
For local distribution, we used annotations that are viewable on the iPod as well as on the QuickTime player. For the video objects, we manually marked the time at which we changed the PowerPoint slide (on the LCD projection). For the audio objects, we added a still image that showed the PowerPoint slide that was being discussed. These annotations appear differently in different players. For example, the audio podcasts can show the slide markers (Fig. 5b) or the slide images themselves (Fig. 5a). Playing the audio podcasts via QuickTime shows the slide images and chapter markers (Fig. 5f). On the other hand, the video podcasts can display the slide markers (Fig. 5d) as well as the actual video (Fig. 5c). These annotations allow the students to choose the appropriate component of the video for quick review. Note that these annotations will not be visible if the audio and video objects are viewed on a player that does not recognize them, such as the Sony PSP handheld unit.
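To illustrate how slide-transition markers support quick review, the sketch below models them as a sorted list of (time, slide title) pairs and looks up the slide showing at a given playback time. This is a hypothetical data structure for illustration only; it is not the chapter format used by the iPod or QuickTime, and the marker times and titles are invented.

```python
from bisect import bisect_right

# Hypothetical markers: (seconds from the start of the lecture, slide title),
# sorted by time, as an instructor might record them while editing.
markers = [
    (0, "Title"),
    (95, "Process states"),
    (610, "Context switch"),
    (1480, "Scheduling"),
]


def slide_at(markers, t):
    """Return the title of the slide being shown at playback time t (seconds)."""
    starts = [start for start, _ in markers]
    i = bisect_right(starts, t) - 1  # last marker that starts at or before t
    if i < 0:
        raise ValueError("time precedes the first marker")
    return markers[i][1]


def seek_time(markers, title):
    """Return the playback time at which the named slide first appears."""
    for start, name in markers:
        if name == title:
            return start
    raise KeyError(title)


print(slide_at(markers, 700))           # Context switch
print(seek_time(markers, "Scheduling"))  # 1480
```

A player that understands such markers can offer both operations: showing which slide is under discussion and jumping directly to the discussion of a chosen slide.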
3.1.2 Usage Statistics for Local Distribution First, we tabulate the amount of data transferred as well as the number of audio and video objects downloaded between February 2006 and November 2009 in Table 1. We also show the percentage of requests from within the campus as well as from the public Internet. As we noted in Section 2.3, the amount of data created in some semesters was smaller because of the 64 minute capture limitation. We serviced about 60 TB worth of data for over 200,000 objects. Of these, about 9.66 percent of the data (5.8 TB) was requested by on-campus users while the remaining 54.5 TB of data was requested by Internet users. Assuming a network capacity of 200 Mbps to the Internet, external users consumed videos worth over 25 days of our external network connection.
Analyzing the data for objects created for the different classes, we note that some classes were more popular than others. For example, the Spring 2006 offering serviced over 86 thousand requests (as compared to 200 thousand requests for all the semesters). In general, all the undergraduate Operating Systems courses were popular and serviced over 167 thousand requests (84 percent of all requests) and used about 47 TB (78 percent of the transferred data). Among campus users, the graduate Operating Systems course (Spring 2008) was popular, accounting for 20.55 percent of the data for that course. The Multimedia Systems course offering was also popular.
Next, we plot the quarterly change in the popularity of the various classes, both from inside the campus and from Internet users, in Fig. 6. We observe a flash crowd in the second quarter of 2007 for the Spring 2007 Operating Systems course. Earlier, Table 1 showed that the Spring 2006 course was popular. Among Internet users, Fig. 6 shows the popularity of the Spring 2006 offering increasing from 1.8 thousand objects served in the second quarter of 2006 to over 9.4 thousand objects by the second quarter of 2009. Even among the campus users, the popularity remained stable at around 0.2 thousand. Note that the campus users exhibit a seasonal variation between summer and the rest of the academic year; the school does not offer many courses over the summer break. We observe that most lectures continue to remain popular, even though the recent course offering in Spring 2008 could potentially subsume similar courses offered in the Spring of 2006 and 2007. It is likely that students who took the Spring 2006 offering preferred to review using those videos instead of using videos from the newer offerings of the same course. We observed that lectures are a continuum; replacing a single lecture from one semester with the corresponding lecture from a prior offering is not straightforward. One mechanism to conserve resources is to stop servicing requests for older courses. If users continue to request older offerings, we believe that objects should not be expired, at least within the three year window used in our analysis.
Finally, we illustrate the quarterly change in resource consumption for audio, SD, and HD videos in Fig. 7; Fig. 7a shows the magnitude of change both as a count and as the amount of data transferred, while Fig. 7b shows the relative percentage of each type of object. From Fig. 7b, we note that the relative popularity of audio objects is waning: in terms of volume, from about 10 percent in the second quarter of 2006 to 3 percent in the third quarter of 2009, and in terms of count, from about 25.8 to 22 percent, respectively. Grabe and Christopherson [31] also observed that psychology students did not prefer audio. The SD videos became inexplicably popular in the second quarter of 2007. Though such flash crowds are common in Internet scenarios, the size of the audio and video objects places tremendous stress on our networking infrastructure. Interestingly, HD videos are becoming more popular, having increased in count from 13.2 to 29.6 percent with the corresponding data volume increasing from 21 to 59 percent. One of the persistent student complaints in Spring 2006 was the enormous size of HD videos; commodity technologies appear to be evolving to allow more students to use the HD videos. We saw a corresponding drop in the popularity of SD videos. However, there is little evidence that our campus Internet connection is scaling at a similar rate to accommodate the threefold increase in the volume of HD videos.
In terms of the absolute counts and the amount of data transferred (Fig. 7a), we note a steady increase in the amount of data transferred in each quarter. The amount of data consumed in a quarter by the HD videos increased from 0.3 TB in 2006 to over 5.6 TB. During the flash crowds in 2007, the SD videos also consumed about 5 TB of data in a single quarter. By 2009, we were consuming 9.4 TB in a single quarter, or around 4.4 days' worth of campus Internet connectivity.
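The conversions from data volume to days of campus Internet connectivity used in this section can be reproduced as follows. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and a fully saturated 200 Mbps link; the function name is hypothetical.

```python
def link_days(data_tb, link_mbps=200.0):
    """Days needed to move data_tb terabytes over a saturated link.

    Assumes decimal units: 1 TB = 1e12 bytes and 1 Mbps = 1e6 bits/s.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 86400.0


print(round(link_days(54.5), 1))  # 25.2 (all data served to Internet users)
print(round(link_days(9.4), 1))   # 4.4  (data served in one quarter of 2009)
```

Both results agree with the figures quoted in the text: over 25 days for the 54.5 TB requested by Internet users, and around 4.4 days for the 9.4 TB served in a single quarter.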
Even though the University does not currently limit the amount of network resources used by a faculty member, the level of resource usage highlighted in this section is not sustainable, especially when other faculty members also release their videos for public consumption. The author recently participated in the university iTunes U advisory panel. Apple allows the university to store 500 GB worth of data on its cloud servers. The university can also host videos on its own servers. Many faculty and administrators on the panel assumed that the primary difficulty in having an iTunes U presence for the university is in producing the content for distribution over iTunes U. Unless the individual faculty member objected, there was unanimous support for publicly releasing as much content as possible. Our experience, however, suggests that the cost of personally creating the video content was relatively small. Instead, the storage and distribution costs can quickly overwhelm the campus resources if a significant fraction of the faculty followed in the author's footsteps and personally captured and distributed their own lecture videos.
3.2.1 Video Annotations Useful for Remote Distribution We describe our experiences with streaming as well as annotating videos using the YouTube service (Google Video did not support annotations). Note that we do not have control over the annotation mechanism or the policies on whether the object can be downloaded. For example, YouTube does not allow the students to download the videos; students are expected to be online while watching the stream. Given the proliferation of smartphones and laptops that are capable of playing YouTube streams, this restriction might be acceptable.
Because YouTube is a free service, the specific annotation mechanisms are controlled by YouTube and are evolving continuously. The annotations are browser based and are available from a wide variety of browsers and operating systems. YouTube allows a rich set of annotations that use Speech bubble, Note, and Spotlight elements to directly add annotations into the stream at a specified time and spatial location. The instructor can also control the font and color elements in these annotations. The instructor can also authorize other users to annotate the videos. However, the system does not report the provenance records on where any annotations were made. Hence, we did not use this feature for our lectures. Even though these annotations are powerful, we believe that they are inadequate for instructional purposes. It is not possible to index and list all the annotation elements in a video; the annotations are viewed only when the user watches the particular video segment. Lecture videos are not always watched sequentially; students require the ability to jump to discussions about specific slides, a capability already available from our local distribution (Section 3.1.1). Regardless, we continue to explore ways in which we can utilize annotations on YouTube.
3.2.2 Usage Statistics We plot the number of accesses as well as their geographical origin (as reported by YouTube) in Fig. 8. From Fig. 8a, we note that the number of accesses is increasing, with over 200 accesses per day by February 2010. Also, in Fig. 8b, the darkness of the state indicates the popularity of the requests from that state. Most requests came from Indiana, the location of the University. A large number of requests also came from Ohio, a neighboring state, as well as from California. California is a popular job destination for Computer Science graduates. It is possible that most of the requests from Indiana are from inside the campus, which defeats the purpose of making the videos available to Internet users. On the other hand, serving users from Ohio and California from YouTube can reduce the network load on the campus Internet link. Incidentally, these requests from YouTube have not made a significant impact on the number of requests from the campus (Fig. 6).
The author is with the FX Palo Alto Laboratory, 3400 Hillview Avenue, Palo Alto, CA 94304. E-mail: email@example.com.
Manuscript received 29 Nov. 2009; revised 1 Mar. 2010; accepted 30 Nov. 2010; published online 22 Mar. 2011.
Digital Object Identifier no. 10.1109/TLT.2011.10.
Surendar Chandra received the PhD degree in computer science from Duke University. He held positions in academia at the University of Georgia and Notre Dame and in industry at the FX Palo Alto Laboratory. His research interests include experimental systems topics in multimedia, storage, security, networks, and sensor systems. He is the recipient of a US National Science Foundation CAREER Award and is a senior member of the ACM.