Adapting the Speed of Reproduction of Audio Content and Using Text Reinforcement for Maximizing the Learning Outcome though Mobile Phones

Mario Muñoz-Organero, IEEE
Pedro J. Muñoz-Merino, IEEE
Carlos Delgado Kloos, IEEE

Pages: pp. 233-238

Abstract—The use of technology in learning environments should be targeted at improving the learning outcome of the process. Several technology enhanced techniques can be used for maximizing the learning gain of particular students when having access to learning resources. One of them is content adaptation. Adapting content is especially important when using limited devices such as mobile phones. Content can be adapted to restrictive network conditions, to limited terminal capabilities, to different user preferences and learning styles or to external elements in the learning and user contexts. This paper studies and analyzes the impact of modifying and therefore adapting the speed of reproduction of audio content in mobile phones on the learning gain of the user. The paper shows that the optimum speed for Spanish audio is around 206 words per minute. The impact of enhancing the audio reproduction by presenting an equivalent version in text displayed on the screen of the mobile phone is also studied. The paper concludes that this text is only important for students younger than 15. The paper analyzes data from 100 Spanish speaking users that are grouped according to different criteria, such as gender, age, or level of studies. The results are presented and discussed.

Index Terms—Audio content adaptation for mobile devices, mobile technology for education, mobile supported learning to maximize learning outcomes, learning adaptive systems, student experiments.


The steady improvement in the computational capabilities of mobile personal devices, the increasing needs for specialized and personalized training and the pervasive availability of such mobile devices are leveraging the definition, creation, development, and deployment of new m-learning architectures and environments in which personalization and content adaptation are of the utmost importance [1], [2], [3], [4], [5], [17]. The use of mobile devices for consuming learning resources creates challenges, such as adapting content to limited and heterogeneous environments with limited capabilities over low-speed wireless networks [6], while maximizing the quality of experience of the user [7] and maximizing at the same time the learning outcome for that user.

Adapting e-learning environments to limited mobile devices requires adapting both the learning resources and the way they are orchestrated into a unit of learning. This paper concentrates on learning resource adaptation. Learning resources are normally composed as structured multimedia containers made of different single media components [8]. The adaptation of single media components has mainly been studied in previous related literature as a combination of three adaptation techniques: transcoding (such as image size reduction, format conversion or video frame dropping), transmoding (changing video to text, for example), and summarization [9]. This paper proposes and studies a fourth technique for single media adaptation: the speed of reproduction.

The adaptation of multimedia resources to be consumed anywhere, anytime, and with any device has been taken into account by standards such as MPEG-21 Digital Item Adaptation (DIA) [14], [15], which specifies description formats to assist with the adaptation of Digital Items in order to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities. MPEG-21 DIA takes into account the particular characteristics of the user, the terminal being used, the network connection and the user's natural environment for multimedia content adaptation. It also specifies content-centric metadata facilitating the content adaptation process itself. This paper analyzes the relationship between the speed of reproduction of audio content in Spanish and the assimilation of this content by the users, which complements the user characteristics information in MPEG-21 DIA and provides additional information for audio content reproduction.

The adaptation of the speed at which multimedia content is reproduced is one of the tools that students currently use to improve their learning outcomes. Video and audio rates have an impact on the student's perceived quality [16] and this translates into different learning outcomes. This paper provides some data for automating the adaptation of audio content in order to maximize the learning outcomes depending on some characteristics of the student, such as age, gender, or level of studies.

The decomposition of learning resources into single media components also decomposes the adaptation process for such learning resources. Adapting learning resources to mobile devices requires a multimedia adaptation layer and a learning object adaptation layer [6] in which high-level learning objects are composed of low-level multimedia objects. This paper studies and proposes a learning object adaptation for mobile devices in which learning resources are created as a combination of audio and text components and the adaptation process finds the way to combine them to maximize the learning outcome of the user.

This paper therefore studies and analyzes the impact on the learning outcome of a user of a mobile phone when adapting the speed of reproduction of audio content and when adapting the composition of the learning resource as a combination of audio and text learning components. The faster content is reproduced, the more information per second is sent to the user (which can increase the number of concepts learned per second for this user). However, the faster the reproduction, the more difficult it is to understand and assimilate the information. There is therefore a tradeoff, when changing the speed of reproduction, between the information presented to the user and the information assimilated by the user. This paper uses the information obtained from 100 Spanish speaking users to try to find the best rate of audio content reproduction for Spanish learning resources. The users are categorized according to different parameters, such as age, gender, and level of studies. The positive or negative impact on the user's learning gain of creating learning objects that present the same information together in text and audio formats is also analyzed, compared to the same learning objects containing only audio content.

Adaptation Types

Adapting the contents delivered to a mobile device can be carried out in different ways, using different techniques depending on the objective and type of adaptation. This section makes a short review of different adaptation types and sets the scope of this paper.

The ways in which content can be adapted to be delivered to users of mobile devices can be categorized in different ways. Tretiakov and Kinshuk [10] categorize adaptation of content to be delivered to mobile devices into: adaptation to communication channels, adaptation to the mobile device, and adaptation to the user. Wang [11] highlights the importance of using the context for adaptation in mobile learning. Context is defined as “any information that can be used to characterize the situation of learning entities that are considered relevant to the interactions between a learner and an application.” Forte et al. [12] identify three different types of content adaptation in ubiquitous computing: adaptation to the user preferences, to the access device capabilities and to the delivery context. Reveiu et al. [13] also identify the adaptation of learning multimedia contents for mobile devices taking into account the user preferences and the device characteristics. Muntean [7] identifies the user profile (containing information, such as the user's goals, interests, knowledge level, content type preferences, and presentation style), the performance features (such as the device capabilities), the type of access, and the state of the network as the main components for content adaptation.

This paper takes into account the limited capabilities of mobile phones and makes use of simple audio and text-based contents. The use of other multimedia formats, such as video or hypermedia documents, has been excluded (these formats are supported by modern mobile phones and smartphones but they are not part of the lowest common denominator of features shared with older terminals). The contents are adapted to the user by selecting an appropriate speed of audio reproduction and by deciding whether to complement the audio reproduction with the visualization of a reinforcement text. This content adaptation to the user takes into account a user profile which contains parameters, such as the user's age, gender, and level of studies.

Adaptation Goals

There may be different goals targeted when adapting learning contents to mobile phones. This section makes a brief presentation of some of them and establishes the objectives of this paper.

Gang and Yang [6] propose a learning resource adaptation and delivery framework for mobile learning which focuses on solving the bandwidth and mobile device limitations. The underlying objective is to have a working application which makes use of the capabilities available in the environment. Tretiakov and Kinshuk [10] define an approach to incorporate different tutoring strategies and best practices minimizing a distance which takes into account the variations in communication channels, end-user device capabilities and user profiles. Reveiu et al. [13] define a content adaptation framework for m-learning which targets a multidevice environment. Forte et al. [12] present a framework of components for content adaptation that facilitates the development of software reuse. Wang [11] focuses on maximizing the use of the information available in the user context for adapting the content presented in the mobile device. Muntean [7] presents a model that tries to maximize the quality of experience (QoE) perceived by the user.

This paper highlights a different approach in the definition of its objective: the maximization of the learning outcome for a particular user when consuming a learning resource. This goal is implicit in any learning experience but this paper makes it explicit for an adaptive m-learning environment. Therefore, the contents adapted to the user profile should result in the maximization of the learning gain for a user inside such a profile.

The Conducted Experiment

This section presents the details of the experiment carried out to gather, study, and analyze data from Spanish speaking users, assessing the degree of understanding, comprehension, and assimilation of content adapted by modifying the speed of reproduction of a simple audio, optionally complemented with a reinforcement text.

A Java application for mobile devices in J2ME has been developed and used (running on a Nokia 6131 mobile phone). The application presents a configuration screen that allows the user to select the rate of audio content reproduction and to specify whether an equivalent text version of the audio content should also be presented. After the configuration options are accepted, the reproduction of the content is started according to those options. Fig. 1 presents some screenshots of the application in two scenarios in which the application is configured to show or hide the reinforcement text. The left part of Fig. 1 shows the configuration screen. The right part captures the visualization of the text according to the configured parameters. The audio is always reproduced independently of the text visualization. The rate of reproduction is also configured in the configuration screen.


Fig. 1. Screenshots of the application. The name of the application is “aprende y entretente” (learn and enjoy). The main screen allows the user to select the audio rate (“tasa de reproducción”) and whether to see the associated text or not (“ver texto”).

The learning content to be reproduced by the application presented in Fig. 1 was selected taking into account two important issues for the experiment: it had to be generic enough to minimize the prerequisites for its understanding independently of the user profile, and it had to contain information elements not previously known by the users selected for the experiment (a pretest should show no previous knowledge about the content). After analyzing some alternatives, a text on ecological agriculture was selected since it best fulfilled these criteria.

The length of the reproduced content was also important. Long contents make it difficult for the user to retain the specific details that need to be addressed when analyzing the understanding of the content by the user. Short contents, on the other hand, do not provide enough data to distinguish whether the content was properly understood by the user. A compromise length of 173 words was used. This number of words is the result of deleting some phrases from the text about ecological agriculture available on Wikipedia (ógica).

To analyze the understanding of the content, a posttest containing five single-choice questions with four possible answers per question was used. All the answers contained words reproduced by the application but with different meanings. Users understanding the reproduced content had no difficulty in properly answering the five questions. Users experiencing understanding difficulties failed to provide correct answers to all of the questions. The rate of wrongly answered questions was used to estimate the degree of understanding.
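The scoring described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `error_rate` and the constants are hypothetical, and the uniform-guessing baseline is an added reference point that the paper does not discuss.

```python
# Posttest layout taken from the text: five questions, four options each.
QUESTIONS = 5
OPTIONS_PER_QUESTION = 4


def error_rate(right_answers: int, questions: int = QUESTIONS) -> float:
    """Return the fraction of wrongly answered questions (hypothetical helper)."""
    if not 0 <= right_answers <= questions:
        raise ValueError("right_answers must lie between 0 and the number of questions")
    return (questions - right_answers) / questions


# Reference point (an assumption, not from the paper): a user who picks one of
# the four options uniformly at random gets questions / options right on average.
expected_right_by_guessing = QUESTIONS / OPTIONS_PER_QUESTION

print(error_rate(4))
print(expected_right_by_guessing)
```

A mean score close to the guessing baseline would suggest that the content was essentially not understood at that configuration.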

A set of 100 randomly selected Spanish speaking users with different profiles was used to gather data for the experiment. Each user randomly configured the parameters of the application within a maximum and a minimum limit. The maximum speed of reproduction of audio content was set to 235 words per minute. The minimum speed was set to 173 words per minute. The reinforcement text was presented or omitted independently of the speed of reproduction of the audio. The results of the experiment are presented, studied, and analyzed in the following section.
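To give a feel for what these limits mean in practice, the following sketch converts a reproduction speed into the playback time of the 173-word passage. The helper name `playback_seconds` is hypothetical; the 206 words per minute value is the optimum quoted in the abstract, included here only for comparison.

```python
WORDS_IN_PASSAGE = 173  # passage length reported in the text


def playback_seconds(words: int, words_per_minute: float) -> float:
    """Return the time in seconds needed to play `words` at the given rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return 60.0 * words / words_per_minute


# Minimum tested speed, optimum quoted in the abstract, maximum tested speed.
for wpm in (173, 206, 235):
    print(f"{wpm} wpm -> {playback_seconds(WORDS_IN_PASSAGE, wpm):.1f} s")
```

At the minimum speed the passage lasts exactly one minute, and at the maximum speed roughly 44 seconds, so the whole tested range spans about 16 seconds of playback.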


According to the objectives of the paper, this section tries to answer the following questions:

  1. Is there an optimal speed of audio reproduction for Spanish speaking users depending on their age, gender, and level of studies that maximizes the number of concepts learned per minute?
  2. Is there any relevant increase in the learning outcome of the user when creating composite learning objects that reinforce the audio content with an equivalent text version?

The data gathered using the experiment described in the previous section is presented, studied, and analyzed.

Table 1 presents the means and standard deviations for the number of right answers for all the users (independently of their profiles) with respect to the speed of reproduction (in words per minute) of the content. The results in Table 1 show that the speed of audio content reproduction has an impact on the user's understanding of the content. This understanding improves as the speed of reproduction is decreased but stabilizes for speeds of reproduction below 206 words per minute. The rates analyzed (from 173 to 235 words per minute) were selected based on the results of a reduced proof-of-concept experiment carried out before the main experiment.

Table 1. Means and Standard Deviations of the Number of Right Answers for All the Users

Dividing the users into two groups according to the speed of audio content reproduction, the t-test can be used to validate the statistical significance of the negative impact that an increase in the speed of reproduction has on the understanding of the content by the user. Comparing the users to whom the audio was reproduced faster than 206 words per minute with the rest of the users, the t-test gives a p-value of 0.004 (much smaller than 0.05), showing the statistical significance of the difference between the means of the two groups.
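A comparison of this kind can be sketched with a pooled two-sample t-test. The code below is a minimal illustration and not the authors' analysis: the source does not state which t-test variant was used, so Student's pooled-variance form is assumed, the function names are hypothetical, and the scores are invented placeholders rather than data from the experiment.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def student_t_test(a, b):
    """Return (t, p) for a pooled-variance two-sample t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("both groups are constant; the t statistic is undefined")
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


# Invented placeholder scores (right answers out of 5), NOT the paper's data.
slow_scores = [5, 4, 5, 3, 4, 5, 4, 4]  # hypothetical users at or below 206 wpm
fast_scores = [3, 2, 4, 3, 2, 3, 4, 2]  # hypothetical users above 206 wpm

t_value, p_value = student_t_test(slow_scores, fast_scores)
print(f"t = {t_value:.3f}, p = {p_value:.4f}")
```

The p-value is integrated numerically only to keep the sketch free of third-party dependencies; a statistics library would normally be used instead.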

Table 2 presents a similar analysis of the users, taking into account the number of right answers in the posttest depending on whether or not the reinforcement text was displayed.

Table 2. Number of Right Answers in the Posttest Depending on the Visualization or Not of the Reinforcement Text

The t-test for the results in Table 2 provides a p-value of 0.35 (greater than 0.05), which shows no statistical significance in the difference between the means of the two groups. A possible explanation is that the user is not able to pay attention to the information in both media and tends to concentrate on one of them, therefore achieving no significant improvement when the second medium is added to the learning object.

The results in Tables 1 and 2 are aggregate results. The rest of the section is going to categorize the users according to their age, gender, and level of studies to analyze if the content adaptation process can be improved depending on such parameters.

To analyze the impact that the age of the user may have on the understanding of the learning content (when using a mobile device as the learning interface), the users have been divided into groups trying to meet the ceteris paribus assumption (similar characteristics regarding the speeds of audio content reproduction and visualization of the reinforcement text). Table 3 presents the mean and standard deviation values for the number of right answers in the posttest for each group of users. The understanding of the learning content is fairly constant for the groups of users with ages between 15 and 50. Users younger than 15 and older than 50 show a worse performance.
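The per-group statistics of such a table can be computed as sketched below. The three age bands follow the boundaries named in the text (younger than 15, between 15 and 50, older than 50); the actual grouping used for Table 3 may be finer. Function names are hypothetical and the records are invented placeholders, not data from the experiment.

```python
from collections import defaultdict
from statistics import mean, stdev


def age_band(age: int) -> str:
    """Map an age to one of the three bands discussed in the text."""
    if age < 15:
        return "under 15"
    if age <= 50:
        return "15 to 50"
    return "over 50"


def summarize_by_band(records):
    """records: iterable of (age, right_answers). Returns {band: (mean, stdev)}."""
    groups = defaultdict(list)
    for age, right_answers in records:
        groups[age_band(age)].append(right_answers)
    return {
        band: (mean(scores), stdev(scores) if len(scores) > 1 else 0.0)
        for band, scores in groups.items()
    }


# Invented placeholder records (age, right answers out of 5), NOT the paper's data.
records = [(12, 2), (14, 3), (20, 4), (35, 5), (48, 4), (60, 3), (70, 2)]
for band, (avg, sd) in summarize_by_band(records).items():
    print(f"{band}: mean = {avg:.2f}, sd = {sd:.2f}")
```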

Table 3. Mean and Standard Deviation Values for the Number of Right Answers in the Posttest for Each Group of Users

For users younger than 15 and older than 50, a more detailed analysis is needed to adapt the speed of reproduction of the audio content and the visualization of the reinforcement text. The criterion used to find the optimal speed of reproduction for these groups has been the maximization of the statistical significance as measured by the t-test. In other words, each of these two age groups is divided into two subgroups using a particular speed of audio content reproduction as the classification parameter (users to whom the content was reproduced faster than that speed and users to whom it was reproduced slower). The classifying speed that maximizes the statistical significance is the one that yields the smallest t-test p-value for the resulting subgroups. The values (in words per minute) are captured in Table 4.
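The search for the classifying speed can be sketched as a scan over candidate thresholds. The sketch assumes a pooled-variance Student's t-test (the source does not name the variant); under that assumption the degrees of freedom equal the total number of users minus two for every split, so the smallest p-value coincides with the largest absolute t statistic and no p-value needs to be computed. Function names, the `min_group` parameter, and the records are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(a, b) -> float:
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def best_split_speed(records, min_group: int = 2):
    """records: list of (speed_wpm, right_answers).

    Returns (threshold, t) for the threshold whose split into
    'speed <= threshold' and 'speed > threshold' gives the largest |t|,
    or None if no valid split exists.
    """
    speeds = sorted({speed for speed, _ in records})
    best = None
    for threshold in speeds[:-1]:
        slow = [right for speed, right in records if speed <= threshold]
        fast = [right for speed, right in records if speed > threshold]
        if len(slow) < min_group or len(fast) < min_group:
            continue  # variance needs at least two observations per subgroup
        try:
            t = pooled_t(slow, fast)
        except ZeroDivisionError:
            continue  # both subgroups constant: t is undefined
        if best is None or abs(t) > abs(best[1]):
            best = (threshold, t)
    return best


# Invented placeholder records (speed in wpm, right answers), NOT the paper's data.
records = [
    (173, 5), (173, 4), (180, 5), (180, 4), (190, 4), (190, 5),
    (200, 3), (200, 2), (210, 3), (210, 2), (220, 2), (220, 3),
]
print(best_split_speed(records))
```

With a variant whose degrees of freedom change from split to split (Welch's test, for example), the p-value itself would have to be compared instead of the t statistic.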

Table 4. Optimal Speed of Reproduction

One conclusion from Table 4 is that audio contents for young people should be reproduced in the mobile device at a lower speed to maximize the understanding of the content (following the t-test criterion). However, the optimal speed of audio content reproduction for users older than 50 is similar to the speed for users between 15 and 50 (although the comprehension is worse).

The positive, neutral, or negative impact of displaying the reinforcement text for users younger than 15 or older than 50 is captured in Table 5, which presents the mean values for the number of right answers per group. Table 5 shows that there is a positive influence on the comprehension of the learning content for users younger than 15 when the reinforcement text is presented. However, for users older than 50, presenting the text version of the content in parallel to its audio reproduction has a neutral or even negative effect.

Table 5. Impact of Displaying the Reinforcement Text

The impact that the gender of the user has on his or her understanding of the learning content (ceteris paribus) is captured in Table 6. The results show no significant difference between male and female users (the t-test gives a p-value of 0.39).

Table 6. Impact that the Gender of the User has on His or Her Understanding of the Learning Content (Ceteris Paribus)

The level of studies is analyzed in Table 7. The users have been categorized depending on the highest degree that they have according to the Spanish education standards (primary, secondary, and university studies). Table 7 captures the mean and standard deviation values for the number of right answers for each group of users. There is a positive correlation between the ability to understand mobile learning contents and the level of studies.

Table 7. Influence of the Level of Studies in the Understanding of the Learning Contents

The results in Table 7 capture the positive correlation between the level of studies and the assimilation of content. Table 3 showed that the age of the user also influences the level of assimilation of the content. In order to validate that the correlation in Table 7 is not a result of the positive impact that the age of the user may have on the assimilation of the content, the mean values of the age of each group of users have been captured in Table 8. The users with primary education were in fact older than those with secondary and university studies, which decorrelates the results in Table 7 from the positive effect of the age of the user on the assimilation of the content.

Table 8. Mean Values for the Age of the Members of Each Group


This paper has analyzed data from 100 Spanish speaking users grouped according to different criteria, such as gender, age, or level of studies to assess how to better adapt learning contents when presented to the user by means of a mobile phone. The content adaptation parameters have been the speed of reproduction of audio content and the use or omission of a reinforcement text. The objective of the adaptation has been the maximization of the learning outcome for a particular user.

The paper has shown that there is a generic optimal speed of 206 words per minute for Spanish speaking users in Spain that maximizes the learning outcome of the process. This speed should be decreased for young users. The paper has also shown that there is no clear impact on the learning outcomes when complementing the audio information with a reinforcement text, except for young students (in fact, providing the information in both formats, audio and text, can be counterproductive for people older than 50). The paper has also shown that the gender of the user has no significant impact on the content adaptation either.
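These conclusions can be read as a simple adaptation rule, sketched below. This is an illustrative reading and not an implementation provided by the authors: the names `Adaptation` and `choose_adaptation` are hypothetical, and the reduced speed for young users is left as a caller-supplied parameter because its value is not reproduced in this text.

```python
from dataclasses import dataclass

GENERIC_WPM = 206  # generic optimal speed reported in the conclusions


@dataclass(frozen=True)
class Adaptation:
    words_per_minute: int
    show_text: bool


def choose_adaptation(age: int, young_wpm: int) -> Adaptation:
    """Pick the audio speed and text reinforcement for a user of the given age.

    young_wpm is the reduced speed for users younger than 15; it must be
    supplied by the caller (hypothetical parameter, value not given here).
    """
    if young_wpm > GENERIC_WPM:
        raise ValueError("the speed for young users should not exceed the generic optimum")
    if age < 15:
        # Young users: slower audio, reinforcement text shown.
        return Adaptation(young_wpm, True)
    # Users aged 15 and older, including those over 50: generic speed, no text.
    return Adaptation(GENERIC_WPM, False)


# 180 wpm is an arbitrary placeholder for the reduced speed, not a reported value.
print(choose_adaptation(12, young_wpm=180))
print(choose_adaptation(60, young_wpm=180))
```

Gender is deliberately absent from the rule, since no significant effect was found for it.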


The work presented in this paper was partially funded by the Spanish project Learn3 TIN2008-05163/TSI within the Spanish “Plan Nacional de I+D+I” and by the SOLITE CYTED Program 508AC0341. The authors also want to acknowledge the ideas contributed by the Gradient Group in the Carlos III University of Madrid.


About the Authors

Mario Muñoz-Organero received the MSc degree in telecommunications engineering from the Polytechnic University of Catalonia in 1996, and the PhD degree in telecommunications engineering from the Carlos III University of Madrid, Spain, in 2004. He is an associate professor of telematics engineering at the Carlos III University of Madrid. His research interests include topics related to open architectures for e-learning systems, open service creation environments for next generation networks, advanced mobile communication systems, pervasive computing, and convergent networks. His main current interest is in e-learning and m-learning technologies. He has participated in European-funded projects, such as E-LANE, and in Spanish-funded projects, such as MOSAIC learning and Learn3. He also has more than four years of experience working for the telecommunications industry in companies such as Telefonica R&D and Lucent Technologies. He is a member of the IEEE.
Pedro J. Muñoz-Merino received the MSc degree in telecommunications engineering from the Polytechnic University of Valencia, Spain, in 2003, and the PhD degree in telecommunications engineering from the Carlos III University of Madrid, Spain, in 2009. He is a teaching assistant of telematics engineering at the Carlos III University of Madrid. He has participated in various research projects, such as E-LANE or MOSAIC learning, and is the coauthor of several papers in conferences and journals. His research interests include e-learning, technology-enhanced learning, and web science. He is a member of the IEEE.
Carlos Delgado Kloos received a PhD degree in computer science from the Technical University of Munich, Germany, and another PhD degree in telecommunication engineering from the Technical University of Madrid, Spain, in 1986. He is a full professor of telematics engineering at the Carlos III University of Madrid, Spain, where he was the founding director of the Department of Telematics Engineering. He is presently the associate vice-chancellor, the director of two master's programs (one on e-learning), and the director of the Nokia Chair. He has been involved in more than 20 projects with European (Esprit, IST, @LIS), national (Spanish Ministry), and bilateral (Spanish-German and Spanish-French) funding. He has authored more than 200 articles in national and international conferences and journals. He has authored one book, coauthored another, and coedited five. He was the coordinator of the European-funded E-LANE project on e-learning in Latin America and a member of the board of directors of the LRN Consortium. His research interests include educational technologies. He is a senior member of the IEEE.