Pages: pp. 187-195
Abstract—In this paper, a new dance training system based on the motion capture and virtual reality (VR) technologies is proposed. Our system is inspired by the traditional way to learn new movements—imitating the teacher's movements and listening to the teacher's feedback. A prototype of our proposed system is implemented, in which a student can imitate the motion demonstrated by a virtual teacher projected on the wall screen. Meanwhile, the student's motions will be captured and analyzed by the system based on which feedback is given back to them. The result of user studies showed that our system can successfully guide students to improve their skills. The subjects agreed that the system is interesting and can motivate them to learn.
Index Terms—Animation, computer uses in education, motion analysis.
Dancing is a popular activity that can be enjoyed by people of all ages. There are two main ways to learn dancing. The first is to attend a dance lesson in which a teacher demonstrates the moves, points out the mistakes made by the students, and guides them to improve. This is the most effective way, but some people do not have time to attend lessons and teachers are not always available. The second is self-learning by watching video demonstrations, in which students observe the moves and practice by themselves. However, students may not be able to completely understand the moves and perform them correctly. The dance motion can also be visualized in a 3D virtual environment [1], [2]. However, this approach suffers from the same shortcoming as watching video, i.e., the lack of feedback.
There exist some commercial dancing games such as Dance Dance Revolution Hottest Party [3]. The game is played with a Wiimote and a pad that contains four arrow panels: up, down, right, and left. The players have to step on the correct panels and wave the Wiimote with the correct timing according to the instructions provided by the game. A score is given according to how well they follow the instructions. Although these games have brought in a new population of players who are interested in learning dance through game play, it is unclear how much the game play helps them learn the movements. In particular, the input data is greatly decimated to ease the analysis of the movements. Although such decimation is acceptable for entertainment, the system cannot give appropriate advice for training whole-body movements. The games usually provide only a scalar score, which is not enough for users to know how to improve. To summarize, recent games are not appropriate for training purposes because of the following limitations: 1) the captured data does not cover all the movable body parts and 2) the game design focuses on having fun rather than on providing training. This motivates us to develop a learning tool that is fun and teaches motions at the same time.
In this paper, a virtual reality (VR) training application integrated with motion capture technology for dance training is proposed. The user can simply wear the motion capture suit and follow the movements of the virtual teacher and further receive feedback on how to improve the movements. The motion capture system can collect enough data, which is useful for evaluating the difference between the learner and the virtual teacher.
The main difference between our system and previous motion training systems such as [4] is that our system carries out a more comprehensive analysis of the user's whole-body motion in order to let him/her know which body parts are moved incorrectly. We also propose a new way to provide feedback to the user by comparing two motions. The experimental results show that our system assists students in learning better than the traditional “watching video” approach. The students also think that the learning process is fun and motivates them to learn.
The rest of the paper is organized as follows: In Section 2, we introduce the related work. In Section 3, an overview of the system is given. In Section 4, our system is evaluated. We conclude the paper and discuss future work in Section 5.
In this section, the works that are related to our proposed system will be reviewed. First, the VR-based learning systems for different applications will be reviewed followed by some VR-based dance learning systems that are directly related to our system. And then, the motion matching of two different movements will be discussed. Finally, the animation techniques that consider both music and motion data will be introduced.
VR applications help people become immersed in a virtual world, which may resemble the real world or be entirely imaginary. Users can experience various scenarios such as fighting with virtual boxers or dancing with virtual partners [5], [6]. When a VR application adopts motion capture technology, the interaction between the computer and human users can be further enhanced. By including some gaming elements, users can have fun and be more motivated to use the system, which results in better progress in motion training [7].
Some VR applications aim at motion training to replace the traditional master-to-student teaching approach. Using motion capture technology, computers can track the movements of users and supervise their motion training. Chua et al. [8] proposed a VR motion training system for practicing the Chinese martial art Tai Chi. The avatars of the Tai Chi learner and his/her master are rendered in the virtual environment. The learner observes the motion of the virtual master and mimics it until his/her avatar performs the same motion as the virtual master.
Komura et al. [4] proposed a martial art training system based on a motion capture system. Users wear a head-mounted display and practice defense/offense with the virtual coach. The user is notified of failed defenses by visual effects, and his/her performance is evaluated by factors such as total movement for defense, minimum preaction for attack, etc. In their experiments, it was found that a novice player improved his/her performance after training with the system several times. This shows that feedback such as visual effects and scores can help students to learn.
There are some existing works on dance learning. Davcev et al. [9] presented a dance learning system based on synchronized presentation of several streams of data, including video streaming, 3D animation, music, textual description, and labanotation representation. The user can interactively switch between the streaming modes while learning the dance. For example, he/she can switch from the video mode to the 3D animation mode in the middle, which helps to view the motion from different viewing angles. Soga et al. [10] developed an integrated system for contemporary dance based on 3D motion clips. The system can help the teacher design sequences of dance motions from basic motions. The system also provides 3D demonstration of motion by a virtual avatar. In their experiments, the dancers commented that feedback on the performance, such as a teacher's comments, is required for dance lessons. As demonstrated in the experimental result of [10], feedback is important because it accelerates the student's learning process.
Nakamura et al. [11] designed a training tool for dancing, which consists of a timing-vibro device and a robotic screen. The timing-vibro device reminds users when a move should be made, and the movement of the robotic screen shows the translation in the dance performance to the user. A virtual representative of the student is rendered in real time and is shown on the screen. Users can observe their own motions and compare them with the master motion. This application is similar to our proposed tool in the sense that our tool also involves real-time animation. Our tool further uses the captured motion data for evaluation rather than just for animation.
Hachimura et al. [12], [13] integrated motion capture and VR technology to develop a dance training system. A head-mounted display shows the professional dancer's motion overlapped with the body of the virtual character controlled by the user. The trainee can observe where his/her body does not overlap with that of the professional dancer. In their system, the user needs to perform a motion and observe his/her avatar at the same time. This may affect the performance, and identifying mistakes requires sufficient experience. One of our contributions is to provide a learning tool that makes use of motion matching technology and provides feedback to the user. The user can thus focus on his/her own performance and at the same time identify the weaknesses.
Matching two motions is an essential step in motion recognition. In motion matching, the similarities between each of the motion templates and the input motion are computed. Then, the input motion is recognized as the class of the motion template with the highest similarity. Joint angles are usually used in motion matching. For the Japanese dance Furi, Hachimura et al. [13] proposed to use four features from Laban Movement Analysis (LMA), namely weight, space, time, and shape, to analyze and evaluate dancing movement. Weight refers to the kinetic energy produced at each part of the body during movement. Space shows the direction of the movement of the body as a whole using the facing direction and the coordinates of a marker called “root,” which is located at the center of the hip. Time expresses the acceleration of each part of the body. Shape describes the overall shape of the body during the movement. These features are extracted when motions are compared, and the difference between motions at any time instant can be computed. This method can match postures globally but cannot localize the mismatches.
Yoshimura et al. [14] defined four spatial indices to quantify the characteristic features. In their experiment, an expert's motion and a beginner's motion showed great differences in these four indices. Qian et al. [15] designed a gesture recognition engine in a dance system of a similar design. Its recognition is based on the joint angles of 10 body parts including the head, torso, upper arms, forearms, upper legs, and lower legs, but ignores those of the hands and feet. The similarity between two gestures is calculated by the Mahalanobis distance between two sets of joint angles. Kwon et al. [16] proposed a motion training system that combines visual and body sensors. In their design, body sensors attached to the user's forearm measure two angular values, pitch and roll, throughout the movement. The similarity between the template motion and the user's motion is calculated by using the euclidean distance between the two angular value sets. In the above studies, only particular joint angles are considered in the analysis of a specific motion.
Magnenat-Thalmann et al. [17] presented their advanced technology on digitizing and rendering 3D folk dance. They first used an optical motion capture system to capture dance motion from real dancers. Then, some postprocessing techniques such as retargeting and music synchronization were applied to the motion data. Finally, the animation was presented to users through the Internet. Alankus et al. [18] proposed an automatic system that can synthesize dancing motions given a song or melody. A virtual character can be driven according to the beat of the music. Kim et al. [19] proposed a scheme to synthesize a new motion from unlabeled example motions by synchronizing the motion beat and the beat of an input music. Furthermore, Shiratori et al. [20] proposed a system that can synthesize a new dance sequence, which considers both the beat of the music and its emotional aspects.
To overcome the shortcomings of current self-learning methods in motion training, we propose a solution and illustrate it with a practical example. Our solution makes use of motion capture technology and a motion analysis method. In addition, we describe how teachers and students can use the system for educational purposes.
We implemented the system as a showcase and evaluated it through user studies. The results and discussion are presented in the evaluation section, where the system is also compared with the traditional method.
The architecture of the system includes four components: 3D graphics, motion matching, motion database, and motion capture system. Fig. 1 shows the relationships among the components. The user's movements, which are obtained by the motion capture system, are compared with the motions in the motion database through the motion-matching component. The 3D graphics component visualizes the movements of the user and the virtual teacher (template motion).
Fig. 1. System architecture.
To learn a move, students first need a demonstration of the moves by a professional. The demonstration of the dance motion is done by rendering the 3D animation with OpenGL. The avatar can be driven by motion data. To facilitate the observation of moves, the student can change the demonstration speed and the viewpoint. Fig. 2a shows a virtual teacher demonstrating a dance motion and Fig. 2b shows the actual movement performed by a real dancer.
Fig. 2. (a) Layout of the 3D viewer. (b) The actual movement by a real dancer.
The virtual teacher also appears when the student is practicing the moves. The student can thus imitate the moves of the teacher and may also notice errors in timing and movement at a glance. To facilitate the observation, we apply a mirroring effect to the virtual agents on the screen so that the student can learn under the same setup as in a dance classroom.
To provide suitable feedback throughout the training, it is necessary to capture and track the movements. It can be achieved by the motion capture technology, which is being used in animation, movie production, sport performance analysis, and medical treatment, etc. We adopt an optical motion capture system (see Fig. 3), which provides the highest accuracy and shortest response time, which are essential for real-time feedback to the users.
Fig. 3. An optical motion capture system with cameras and a suit with markers attached.
Proper guidance can help students to improve and learn effectively. Several kinds of feedback are provided in our proposed learning tool. The first type of feedback is called immediate feedback (see Fig. 4). When the student is practicing, the movement is captured in real time and rendered with a virtual representative, which is displayed next to the virtual teacher and is formed by cylinders representing the body segments. The color of a cylinder shows whether the position of the body segment is correct. The yellow and red colors label the correct and erroneous parts, respectively. Through this visualization, the student can notice the errors quickly and correct his/her moves. The error can be caused by either wrong postures or timing errors. The student may then take a look at the virtual teacher and follow the correct movement. This feedback should be more useful for learning a long motion.
Fig. 4. A sample of immediate feedback.
The second type of feedback is a score report. Students are shown a general report about their performance. Fig. 5 shows an example of a general report. From the report, students can get an idea about which joint is better and which joint may need improvement. In this particular example, the student can notice that the movement of his left arm is the worst among the whole body; thus, he can focus on correcting this part.
Fig. 5. A sample of score report.
The third type of feedback is the slow motion replay. It is shown for students to learn about how and where the errors happened. Fig. 6 shows a screen shot of the slow-motion replay. Through the replay, students can realize the errors in each posture by observing the color on their virtual representatives. The color shows the correctness of the limb movement from deep red to white in the ascending order according to the correctness. When the student finds a body part in red, he/she then checks the correct position from the virtual teacher, which is also rendered in the replay. Sometimes, the error may occur because of wrong timing of launching individual moves, which can also be found by comparing the motion of the virtual representative and the virtual teacher. In the example shown in Fig. 6, the color of the left arm is deep red indicating that the error of the left arm is more serious.
Fig. 6. A sample of slow-motion replay.
All three kinds of feedback involve the comparison between two motions. A motion can be represented by a sequence of postures; thus, two motions are compared posture by posture.
To compare two postures, normalization is first performed so that the root and the facing direction of the postures are the same. Since the student and teacher may have different body sizes, normalization is also performed in the calculation of the difference between two postures. The positions of the joints are divided by the total length of body segments in the calculation. A posture is represented by 15 joints: left/right shoulders, left/right elbows, left/right wrists, left/right thighs, left/right knees, left/right ankles, head, neck, and torso.
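The normalization described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the prototype: the function name `normalize_posture`, the dictionary layout, the y-up coordinate convention, and the representation of the facing direction as a single heading angle about the vertical axis are all hypothetical.

```python
import math

def normalize_posture(joints, root, facing_angle, total_segment_length):
    """Express joints relative to the root, facing +z, in body-size units.

    joints: dict mapping a joint name to an (x, y, z) position, y up (assumption).
    root: (x, y, z) position of the root joint.
    facing_angle: heading about the vertical axis in radians; 0 means facing +z
                  (assumed convention).
    total_segment_length: sum of the lengths of all body segments.
    """
    # Rotating by the negative heading cancels the facing direction.
    c, s = math.cos(-facing_angle), math.sin(-facing_angle)
    normalized = {}
    for name, (x, y, z) in joints.items():
        # Translate so that the root sits at the origin.
        x, y, z = x - root[0], y - root[1], z - root[2]
        # Rotate about the vertical (y) axis.
        xr = c * x + s * z
        zr = -s * x + c * z
        # Divide by the total segment length to remove body-size differences.
        normalized[name] = (xr / total_segment_length,
                            y / total_segment_length,
                            zr / total_segment_length)
    return normalized
```

After this step, the same joint of the student and of the template can be compared directly with a euclidean distance, regardless of where the two performers stand, which way they face, or how tall they are.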
In the immediate feedback, the color depends on how accurately each joint is positioned in a posture. It is determined by the euclidean distance between the template posture and the student's posture. When the distance for a particular joint is lower than a threshold, the corresponding body part is shaded in yellow, which means acceptably correct. Otherwise, it is shaded in red, which means the instantaneous posture seems incorrect.
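A minimal sketch of this per-joint coloring is given below. It follows the convention stated for Fig. 4 that yellow marks correct parts and red marks erroneous parts, so a small distance maps to yellow. The function name `immediate_colors` and the threshold value of 0.1 are hypothetical, and the postures are assumed to be already normalized.

```python
import math

def immediate_colors(template, student, threshold=0.1):
    """Return a color label per joint for one pair of normalized postures.

    template, student: dicts mapping a joint name to an (x, y, z) position.
    threshold: hypothetical tolerance in normalized units (assumption).
    """
    colors = {}
    for name, target in template.items():
        # Euclidean distance between the template joint and the student joint.
        distance = math.dist(target, student[name])
        colors[name] = "yellow" if distance <= threshold else "red"
    return colors
```

In practice the threshold would have to be tuned, since a value that is too strict would color most segments red for a beginner.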
In the score report, the score of each joint movement is computed based on the euclidean distance of the joint positions between the template posture and the student's posture averaged over all the frames. The score is obtained from the additive inverse of the distance. The overall similarity score is then normalized to the range from 0 to 100.
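The score computation can be illustrated with the sketch below. The text does not specify how the negated distance is mapped to the range from 0 to 100, so the linear mapping with a hypothetical `max_distance` (the distance at which the score reaches 0) is an assumption, as are the function name `joint_scores` and the requirement that the two motions are already time-aligned and have the same number of frames.

```python
import math

def joint_scores(template_frames, student_frames, max_distance=1.0):
    """Return a 0-100 score per joint for two aligned motions.

    template_frames, student_frames: equal-length lists of postures, each a
    dict mapping a joint name to a normalized (x, y, z) position.
    max_distance: hypothetical distance that maps to a score of 0 (assumption).
    """
    n = len(template_frames)
    scores = {}
    for name in template_frames[0]:
        # Average euclidean distance of this joint over all frames.
        mean_distance = sum(
            math.dist(t[name], s[name])
            for t, s in zip(template_frames, student_frames)
        ) / n
        raw = -mean_distance  # additive inverse: a smaller distance is better
        # Assumed linear rescaling of [-max_distance, 0] to [0, 100], clipped.
        scaled = 100.0 * (raw + max_distance) / max_distance
        scores[name] = max(0.0, min(100.0, scaled))
    return scores
```

An overall score could then be taken, for instance, as the mean of the per-joint scores, although this aggregation is likewise an assumption.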
In the slow motion replay, the color scheme is similar to the one in the immediate feedback, but it is graded rather than binary. The color is white when the euclidean distance is zero, i.e., the compared postures are identical. It becomes redder as the euclidean distance gets larger, i.e., as the compared postures become less similar.
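One possible way to realize such a graded color is linear interpolation between white and deep red, as sketched below. The interpolation scheme, the RGB value chosen for deep red, the saturation distance `max_distance`, and the function name `replay_color` are all assumptions made for illustration.

```python
def replay_color(distance, max_distance=1.0, deep_red=(0.6, 0.0, 0.0)):
    """Map a joint distance to an RGB color between white and deep red.

    distance: euclidean distance between the compared joints.
    max_distance: hypothetical distance at which the color saturates (assumption).
    deep_red: hypothetical RGB value for the largest error (assumption).
    """
    # Clamp the blend factor to [0, 1]: 0 gives white, 1 gives deep red.
    t = min(max(distance / max_distance, 0.0), 1.0)
    white = (1.0, 1.0, 1.0)
    return tuple(w + t * (r - w) for w, r in zip(white, deep_red))
```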
Students are recommended to use the system by looping these steps: watching the demonstration, practicing, and understanding the feedback. Students can first get the basic knowledge about the moves in the demonstration. Then, they can practice the moves with our proposed learning tool. From the feedback, students find out the mistakes and understand how to improve. They can then go back to the demonstration for the timing information. Afterward, they can practice again. Actually, this process is analogous to a physical dance lesson.
Teachers can also make use of the system. Although the system aims to provide lessons when teachers are not available, teachers can help to prepare the teaching materials by having their dance motions captured. At the same time, they can prepare a suitable syllabus for the students, for example, a set of moves and the order in which to learn them.
Our system is evaluated by two means. One is to evaluate the performance of the evaluation function, which is essential to generate the feedback. The other one is a user study, in which subjects are invited to learn with our system.
There are three common features used for measuring the difference between two motions: joint position, velocity, and angle. Here, we compare the three measures and examine which one is best suited for evaluating dance performance.
Two groups of motion pairs are manually formed: one contains pairs in which the two motions are similar (group 1), and the other contains pairs in which they are dissimilar (group 2). For the pairs in group 1, the motions are the same movements performed by the same or different persons. For the pairs in group 2, the motions are different movements performed by the same or different persons. Using each measure, the difference between the two motions in each pair is quantified. A better measure separates the difference values of the two groups more clearly, i.e., it has greater discriminative power.
Six subjects who are students or staff of City University of Hong Kong were invited to participate in this study. Table 1 shows the information of the six subjects who performed the motions. Two of them have attended dance lessons and the other four have not. To make sure that all the dance motions are performed correctly, the capturing of the motions was supervised by subject 2, who has 1 year of dancing experience. The motions performed by novices make the discrimination more challenging as the variations in their motions are relatively greater. In total, 1,836 similar motion pairs and 15,172 dissimilar motion pairs were collected.
We performed experiments to test the discriminative power of the three measures between similar and dissimilar dance motions.
To obtain the difference of a motion pair with different durations, dynamic time warping is first performed to obtain a mapping between the frames of the two motions. The difference between each matched frame pair is then obtained as the euclidean distance between joint angles, joint positions, or joint velocities. Finally, the difference between the two motions is taken as the average distance over all the matched frame pairs.
To check whether there is a significant difference between the difference values of the two groups, the right-tailed T-test is used and the result is shown in Table 2. Since the $p$-value is very small ($p<0.01$), it shows that all three measures are able to discriminate whether two motions are similar or dissimilar.
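The procedure above can be sketched with a textbook dynamic time warping routine. This is an illustrative sketch rather than the authors' code: each frame is assumed to be a flat feature vector (joint positions, angles, or velocities concatenated), the step pattern is the basic three-neighbor one, and the function name `dtw_average_distance` is hypothetical.

```python
import math

def dtw_average_distance(motion_a, motion_b):
    """Align two motions with dynamic time warping.

    motion_a, motion_b: lists of frames; each frame is a flat tuple of
    feature values (assumption). Returns (average distance, warping path).
    """
    n, m = len(motion_a), len(motion_b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning the first i frames
    # of motion_a with the first j frames of motion_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(motion_a[i - 1], motion_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the matched frame pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    # Difference between the motions: average distance over matched pairs.
    total = sum(math.dist(motion_a[p], motion_b[q]) for p, q in path)
    return total / len(path), path
```

For example, a motion and a slower copy of it that repeats some frames are matched with an average distance of zero, which is the behavior wanted when a student performs the right move at a slightly different pace.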
The next experiment aims to measure the proportion of motion pairs that are mixed up. When the distributions of difference values of the two groups are plotted, we can see two curves formed by each distribution. The overlapping area refers to the motions that are mixed up between the two groups. As a result, the discriminative power can be evaluated by measuring the percentage of overlapping area in the two curves. Fig. 7 shows the distribution of difference values by the joint position and the overlapping area is found to be 5 percent. The overlapping areas of the distribution curves for the joint velocity and joint angle are 34 percent and 10 percent, respectively. Hence, the measure of joint position provides the best discriminative power among the three measures.
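The overlap measurement can be approximated with normalized histograms, as in the sketch below. The text does not state how the overlapping area was computed, so the histogram-based estimate, the number of bins, and the function name `overlap_fraction` are assumptions.

```python
def overlap_fraction(values_a, values_b, bins=20):
    """Estimate the overlapping area of two distributions, in [0, 1].

    values_a, values_b: difference values of the two groups.
    bins: number of shared histogram bins (hypothetical choice).
    """
    lo = min(min(values_a), min(values_b))
    hi = max(max(values_a), max(values_b))
    width = (hi - lo) / bins
    if width == 0.0:
        width = 1.0  # all values identical: everything falls in the first bin

    def histogram(values):
        counts = [0] * bins
        for v in values:
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
        # Normalize so that each histogram sums to 1.
        return [c / len(values) for c in counts]

    hist_a, hist_b = histogram(values_a), histogram(values_b)
    # The overlap is the mass shared by both histograms in each bin.
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))
```

A value of 0 means the two groups are perfectly separated by the measure, and a value of 1 means they are indistinguishable; multiplying by 100 gives a percentage comparable in spirit to the figures quoted above.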
Fig. 7. Distribution of difference values from joint position.
To evaluate the performance of our system, we invited eight randomly chosen subjects from City University of Hong Kong, all male and in the age group of 21-30, to learn dancing (see Table 3). They were randomly assigned to an experiment group and a control group, each containing four subjects. The subjects in the experiment group were trained with the aid of the proposed system, while those in the control group were trained by self-learning. The objective is to investigate the system performance from the following perspectives:
The achievements of learning outcome were assessed by measuring the change of skills on specific dance motions while learning through our system.
The four subjects in the experiment group were asked to learn some dance moves with our system (see Fig. 8). Each of them learned three dance moves and spent 15 minutes on each move. The three moves are Hip-Hop dance moves of about 2 seconds each that were captured from an experienced dancer. Before the start of the course, they were instructed to watch demonstrations of the dance moves to be learned. This allowed the subjects to get some basic knowledge about these moves. At the beginning of the course, baseline scores were derived from the similarity measurement between the subjects' motions and the template motions. After the baseline testing, the course started. During the course, the subjects could watch the demonstrations by the virtual teacher, practice their moves with our system, and check the errors in their movements through the feedback provided. After 15 minutes, the subjects did the post-training testing and the scores were obtained.
Fig. 8. A subject is learning a dance motion.
To assess the changes of skills before and after the course, the scores were analyzed by paired T-test. The mean of baseline scores is 40.58 and the standard deviation is 4.87. The mean of post-training scores is 51.41 and the standard deviation is 5.23. The $p$-value is 0.000011598, the $t$-value is 6.9833, and the degrees of freedom are 11. Since $p<0.01$, there is a significant difference before and after the training. As the mean after the training is higher than that before the training, this further shows that there is a significant improvement after training with our system.
We also evaluated whether our system is able to motivate the students in the learning process. This is shown by the results of the postcourse survey. Fig. 9 shows the questions in the postcourse survey and their results. They indicate that our system is interesting and able to motivate subjects to learn. Given that the subjects had only a little interest in dancing, it is encouraging that half of them felt that the course was definitely interesting. According to the extra comments, some subjects found that the scores in the feedback stimulated them to do better. Some suggested that it would be more exciting if they knew the highest score achieved by other learners.
Fig. 9. Result of postcourse survey of the experiment group.
Another part of the postcourse survey examines whether the system provides an easy way of learning. In the survey, none of the subjects thought that the dance motion was difficult to learn. The result is acceptable as none of the subjects had learned dancing before. The survey also showed that most subjects are willing to recommend our system to other people. Overall, the subjects enjoyed learning dance with our proposed system.
To show that our system can overcome the shortcomings of self-learning approaches, the four subjects in the control group were asked to learn dance by self-learning, i.e., with no feedback provided. As in the experiment group, the baseline and post-training scores were measured. In the control group, the subjects learned dance moves by watching the demonstration and mimicking the movements without any external aid (see Fig. 10). The subjects in both groups have similar backgrounds and skills as shown in Table 3. A two-sample T-test was carried out to compare the baseline scores of the experiment and control groups. The $p$-value is 0.2116. Since $p>0.01$, there is no significant difference between the baseline scores of the two groups.
Fig. 10. A subject tries to perform the same move as shown in the wall screen.
The change between baseline and post-training scores was analyzed by the paired T-test. The mean of baseline scores is 37.08 and the standard deviation is 7.1536. The mean of post-training scores is 37.92 and the standard deviation is 5.02. The $p$-value is 0.3374, the $t$-value is 0.4309, and the degrees of freedom are 11. Since $p>0.01$, the results show no significant difference before and after training. This result does not support the common assumption that practice makes perfect. A possible reason is that the subjects did not grasp the key points needed to improve their skills during the learning process.
The improvements of the subjects in the experiment group and the control group were also compared by another T-test. The improvement is the post-training score minus the baseline score. The mean improvement in the experiment group is 10.83 and the standard deviation is 5.37. The mean improvement in the control group is 0.83 and the standard deviation is 6.70. The $p$-value is 0.0012, the $t$-value is 3.9178, and the degrees of freedom are 11. Since $p<0.01$, there is a significant difference between the improvements of the two groups. As the mean improvement in the experiment group is higher than that in the control group, this further shows that there is a significant improvement after training with our system.
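As a consistency check on the paired tests reported for the two groups, the paired $t$ statistic can be recomputed from the mean and standard deviation of the improvements. The sketch below assumes 12 paired scores per group (four subjects with three moves each, which matches the reported 11 degrees of freedom); the function name `paired_t` is hypothetical.

```python
import math

def paired_t(mean_diff, sd_diff, n):
    """Paired t statistic: mean difference divided by its standard error."""
    return mean_diff / (sd_diff / math.sqrt(n))

n = 12  # assumption: 4 subjects x 3 moves, consistent with 11 degrees of freedom
t_experiment = paired_t(10.83, 5.37, n)  # improvement in the experiment group
t_control = paired_t(0.83, 6.70, n)      # improvement in the control group
```

Under this assumption, both values come out close to the reported $t$-values of 6.9833 and 0.4309; the small remaining differences are plausibly due to rounding of the published summary statistics.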
Compared with the results of the control group, the results of the experiment group show that our system is capable of guiding students to improve their skills in the learning process.
The four subjects in the control group also completed the postcourse survey. The result is shown in Fig. 11. The subjects in the control group faced more difficulty in learning the dance moves than those in the experiment group. However, the result in the experiment group is only slightly better than that in the control group for the remaining part of the survey. In the extra comments in the survey, some subjects thought that the motion capturing system made the course more interesting. This was unexpected when we designed the evaluation, and it should be avoided in further studies.
Fig. 11. Result of postcourse survey of the control group.
Overall, this evaluation result supports the hypothesis that our system can assist students in dance training better than just watching demonstrations without feedback. The feedback proved useful in guiding students in the correct direction in the learning process, as well as in stimulating them to learn more.
In this paper, a dance training system using a motion capture system is proposed. A virtual environment that simulates a real performance lesson is provided to students, and it can work even when a real teacher is absent. Inside the room equipped with our proposed training system, the student's real-time motion can be captured while a virtual character representing him/her is animated in the virtual classroom. A virtual teacher can demonstrate different motions. When the student follows the way the virtual teacher dances, the system can provide immediate feedback to him/her after analysis of the captured motion. The virtual teacher can point out the mistakes made by the student and suggest that he/she improve by focusing on particular limbs at particular moves. A prototype of our proposed dance education system is implemented and demonstrated as a showcase. The experimental results support the following contributions made by this learning tool using motion recognition. First, the system can evaluate the similarity between two motions robustly. Second, the user studies showed that the experiment group, which used our system, outperformed the control group, which only watched demonstration videos. Furthermore, the subjects found that our system is interesting and stimulates them to learn more.
As future work, the generation of dance lessons from the motion templates in the database can be automated by using pattern recognition techniques. Besides analyzing the body movements, the emotions expressed by the dancers will be evaluated. Some existing studies consider the speed, acceleration, and openness of the human body; more specifically, high speed and light openness are used to represent happiness [21], [22], and these may be applicable to enhance our system. To help the user become better immersed in the virtual classroom, a more realistic background as well as more visual effects can be provided. In practice, a student may not always face the screen, especially when he/she is performing turning-around movements in some dances. Therefore, our system could be equipped with movable screens or head-mounted devices. Last but not least, more subjects will be invited to evaluate our system, and more usability tests will be conducted in the future.
The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 1165/09E).