Issue No.03 - July-Sept. (2013 vol.20)
Published by the IEEE Computer Society
John R. Smith, IBM T.J. Watson Research Center
DOI: http://doi.ieeecomputersociety.org/10.1109/MMUL.2013.39
Machine learning has become an indispensable tool for the multimedia community. Given large amounts of data, computers using machine learning are able to create rich representations and accomplish impressive discrimination tasks. Yet the way machines learn still differs significantly from how humans learn. EIC John R. Smith explains that the way forward is for the multimedia field to create appropriate lesson plans or, more generally, to develop curriculum-based approaches to multimedia machine learning.
Machine learning has become an indispensable tool for the multimedia community. It is being applied to content analysis, speech recognition, computer vision, multimedia retrieval, and many other problems. 1,2 Given large amounts of data, computers using machine learning are able to create rich representations and accomplish impressive discrimination tasks. Yet the way machines learn still differs significantly from how humans learn. Machine learning generally uses statistical and mathematical techniques that do not have a biological basis. Examples include support vector machines (SVMs), Gaussian mixture models (GMMs), neural nets, and others. However, modeling techniques are still evolving and may one day come closer to those used in human learning.
The way computers receive instruction in machine learning is also different in important ways. Typically, everything (data, concepts, learning problems) is presented at once. In doing so, the computer needs to simultaneously create a range of simple and complex representations and learn to solve easy and hard problems. For example, when given training data for thousands of types of animals, computers must learn to discriminate dogs from alligators (basic) as well as understand the difference between Irish wolfhounds and Scottish deerhounds (advanced). We wouldn't teach our children that way, which is why schools are organized into grades, where early grades focus on simple lessons and higher grades build up to more advanced ideas. Likewise, we need to send the computer to school. We must create appropriate lesson plans or more generally develop curriculum-based approaches to multimedia machine learning.
Figure 1 shows an example framework for curriculum-based multimedia machine learning. As illustrated, images are ordered by the complexity of their content, from simple objects to cluttered scenes. The images are introduced in batches of increasing complexity, allowing the computer to develop increasingly sophisticated representations that it builds on sequentially. Similarly, classification problems are ordered by difficulty so that the computer first acquires basic discrimination capabilities, which then become the foundation for more advanced problems. Building confidence may not matter to a computer the way it does to a student, but deep, layered learning can use these learned representations and discriminators as building blocks for subsequent levels.
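The ordering step in such a framework can be sketched in a few lines. This is a minimal illustration, not the framework in Figure 1: it assumes each example already has a numeric complexity score (the scoring function, here a hypothetical object count per image, is an assumption), sorts the examples from simple to complex, and splits them into stages. The cumulative variant keeps earlier batches in each later stage so that new material builds on what was already seen.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T], complexity: Callable[[T], float], num_stages: int
) -> List[List[T]]:
    """Sort examples by a (hypothetical) complexity score, easiest first,
    and split them into num_stages contiguous batches of near-equal size."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(examples, key=complexity)
    n = len(ordered)
    return [ordered[k * n // num_stages:(k + 1) * n // num_stages] for k in range(num_stages)]


def cumulative_training_sets(stages: List[List[T]]) -> List[List[T]]:
    """Stage k trains on batches 0..k, so harder material is added on top
    of (rather than instead of) the simpler material from earlier stages."""
    seen: List[T] = []
    sets = []
    for batch in stages:
        seen = seen + batch
        sets.append(list(seen))
    return sets


if __name__ == "__main__":
    # Hypothetical images tagged with an assumed object count as a complexity proxy.
    images = [("street_scene", 7), ("dog_closeup", 1), ("market", 12), ("two_cats", 2)]
    stages = curriculum_stages(images, complexity=lambda img: img[1], num_stages=2)
    for k, training_set in enumerate(cumulative_training_sets(stages)):
        print(f"stage {k}:", [name for name, _ in training_set])
```

The same function applies to ordering classification problems: pass a list of tasks with an assumed difficulty score as the key, and train the discriminators stage by stage.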
The use of curricula for machine learning is motivated by human and animal learning. The idea of shaping is to schedule a progression of training exercises that establish basic concepts early on, which are then built on to acquire more complex concepts. Shaping has its origins in the work of B.F. Skinner, who found that complex skills are learned better through successive approximations than through pure trial and error. 3 Given the effectiveness of shaping in human and animal learning, it is reasonable to apply it to machine learning.
The concept of shaping appears in the machine learning literature mainly in two general frameworks. The first is learning language models and grammars. J.L. Elman showed that a connectionist network learns grammars better when forced to start small and undergo developmental changes that resemble the increase in working memory that occurs over time in children. 4 This was achieved by providing gradually more complex sentences in successive learning stages. Yoshua Bengio and his colleagues developed a similar approach for learning language models using a deep neural network. 5 The second framework is reinforcement learning, where shaping has been explored in robot vision. One approach is to start the robot in states that are "close" to the desired goal and then progressively introduce more complex situations that are farther away. 6
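The start-near-the-goal idea can be illustrated with a toy schedule. The sketch below is an assumption-laden simplification, not the method of the cited work: it uses a hypothetical one-dimensional track of cells, lets the allowed start-to-goal distance grow linearly from stage to stage, and samples each episode's start cell within that limit. A real robot would need a task-specific notion of "distance to goal."

```python
import random
from typing import List


def start_distance_limits(max_distance: int, num_stages: int) -> List[int]:
    """Maximum start-to-goal distance allowed in each stage, growing linearly
    (integer arithmetic) until the final stage reaches max_distance."""
    if max_distance < 1 or num_stages < 1:
        raise ValueError("max_distance and num_stages must be at least 1")
    return [max(1, max_distance * (k + 1) // num_stages) for k in range(num_stages)]


def sample_start(goal: int, limit: int, lo: int, hi: int, rng: random.Random) -> int:
    """Pick a start cell on the hypothetical track [lo, hi] that is at most
    `limit` steps from the goal, excluding the goal cell itself."""
    candidates = [s for s in range(max(lo, goal - limit), min(hi, goal + limit) + 1) if s != goal]
    if not candidates:
        raise ValueError("no valid start cell within the limit")
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    goal, lo, hi = 20, 0, 30
    for stage, limit in enumerate(start_distance_limits(max_distance=20, num_stages=4)):
        starts = [sample_start(goal, limit, lo, hi, rng) for _ in range(5)]
        # An RL agent (not shown) would train on episodes beginning at these cells.
        print(f"stage {stage}: limit={limit}, starts={starts}")
```

Growing the limit linearly is only one possible schedule; a reasonable alternative is to advance to the next stage only once the agent reliably reaches the goal from the current starting region.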
Increasing availability of data creates more opportunity to improve multimedia analytics capabilities. And advances in computation make computers capable of metaphorically walking while chewing gum. Nevertheless, computers still need to learn to walk before learning to run. That's why we need to learn how to better structure machine learning through lesson plans and curricula to achieve more effective overall learning. That is lesson number one for us.
John R. Smith is a senior manager of intelligent information management at IBM T.J. Watson Research Center.