Scaling Up Programming by Demonstration for Intelligent Tutoring Systems Development: An Open-Access Web Site for Middle School Mathematics Learning
1. numbers and operations,
3. data analysis,
4. geometry, and
• The CTAT authoring tools (Fig. 6, bottom), including the Behavior Recorder and tools for generalizing behavior graphs.
• GUI builder tools, such as the Flash IDE, which is used to create interfaces; we imported ActionScript-based components into the Flash IDE.
• The Mathtutor Server-Side Components (top-right of Fig. 6). These server-side components include:
- Tutoring Service, the module that runs the example-tracing tutor engine;
- Mathtutor Learner Management System, used by teachers to select, sequence, and assign problems, edit class rosters, and generate reports; used by students to get problems;
- Several data stores, including the tutor interfaces; the DataShop research database (see below); and the Mathtutor Database, which contains class, curriculum, assignment, and student performance data.
• The Mathtutor Client-Side Components (top-left of Fig. 6). These are the Mathtutor user interfaces that run on the student machines, after being downloaded from the server. Each is designed to support a specific problem type.
6.3.1 Choice #1: Running the Tutor Engine on the Server As mentioned, a central architectural choice (inherited from CTAT) was to have a thin client, that is, to place the tutor engine on the server so that only minimal functionality runs on the client. Specifically, the tutor engine runs in a module called the Tutoring Service, a Java-language process on a Linux host. We see a number of advantages to running the tutor engine on the server, especially in an architecture such as CTAT's, which aims to support not a single ITS but many ITSs with a wide variety of tutor interfaces, implemented in any language, and which can also be employed to provide tutoring within existing interfaces or simulators.
First, it is easier to combine the same tutor engine with different options for implementing student interfaces when the two communicate over sockets, unencumbered by difficulties associated with direct procedure calls between systems implemented in different programming languages. However, security restrictions in browsers typically permit socket connections only back to the Web application's host server, preventing tutor/interface interprocess communication locally on a client machine. Communication with a tutor engine on the server avoids this problem and still gains the ease of integration.
A second advantage of the thin client architecture is that it minimizes the requirements on the client machine and network so that the site is useful in schools and tutoring sites with older hardware. The client machines' processing and memory requirements are limited to those of the browser and the lightweight Flash Player. Download sizes are limited to the compiled Flash student interfaces, which tend to be small (tens or hundreds of kilobytes). The remaining network usage consists of frequent, short messages conveying student step descriptors and tutor evaluations; this usage pattern permits effective multiplexing of many concurrent sessions over low-bandwidth Internet connections, even if every student action in the tutor interface requires server access.
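The many-small-messages pattern described above can be illustrated with a minimal sketch. The message fields, the JSON encoding, and the function names encode_step and evaluate_step are assumptions made for illustration only; the text does not specify the actual Mathtutor wire format, and the lookup table merely stands in for the example-tracing tutor engine.

```python
import json

# Hypothetical message format: the real Mathtutor/CTAT protocol is not given in the text.
def encode_step(session_id, selection, action, student_input):
    """Serialize one student step descriptor as a compact UTF-8 JSON message."""
    message = {
        "session": session_id,
        "selection": selection,   # which interface component the student used
        "action": action,         # what the student did to it
        "input": student_input,   # the value the student entered
    }
    return json.dumps(message, separators=(",", ":")).encode("utf-8")

def evaluate_step(raw_message, expected_inputs):
    """Stand-in for the server-side tutor engine: check a step against a table."""
    step = json.loads(raw_message.decode("utf-8"))
    expected = expected_inputs.get(step["selection"])
    verdict = "correct" if expected == step["input"] else "incorrect"
    reply = {
        "session": step["session"],
        "selection": step["selection"],
        "verdict": verdict,
    }
    return json.dumps(reply, separators=(",", ":")).encode("utf-8")

# Toy problem: the student should enter the fraction 3/4.
expected_inputs = {"numerator": "3", "denominator": "4"}

request = encode_step("s-001", "numerator", "UpdateTextField", "3")
reply = evaluate_step(request, expected_inputs)

# Both messages are far smaller than the interface download itself.
print(len(request), "bytes sent;", len(reply), "bytes received")
```

Under these assumptions, each student action costs one short request and one short reply, which is why the per-session bandwidth stays low even when every step requires a round trip to the server.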
Third, placing the tutor engine on a server with full-time connectivity accommodates access by the tutor engine to a range of possible external resources (e.g., natural language processing, image analysis, and computer algebra engines) that might not be suited to direct Web access from clients. This kind of interoperability is useful for possible future extensions of Mathtutor as well as for applications of the CTAT architecture outside of Mathtutor. External resources available via socket interfaces can be integrated by connections from the Tutoring Service, which is unrestricted by browser security regimes that inhibit client access to servers other than the one that served the UI. The server-side tutor engine also facilitates access to external Java-based resources, which can be linked in as application libraries without the need to download them to the client.
A key question is: Will a server-side tutor engine solution be able to serve potentially thousands of concurrent users without requiring large numbers of server machines? Our initial experience with approximately 60 simultaneous users in a school has been positive even with old server hardware (and we now have a brand-new server). However, whether the approach scales further remains to be seen. To measure this, we are developing load tests, which are more severe than real-world use in that they simulate many simultaneous student sessions with no pauses for think time between steps. As mentioned, our many-small-messages communication pattern is suited to low network bandwidth. Also, we continue to make the example-tracing algorithm more efficient. It is already considerably more lightweight than CTAT's more complex model tracer.
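A load test of the kind described above might be sketched as follows. This is not the authors' test harness: the function names, the step format, and the in-process evaluate_step stub (which stands in for a network request to the Tutoring Service) are all hypothetical. The sketch only shows the shape of the test, namely many simulated sessions running concurrently, each submitting its steps back to back with no think time.

```python
import threading
import time

def evaluate_step(step):
    """Hypothetical stub for one request to the tutoring service."""
    return "correct" if step["input"] == step["expected"] else "incorrect"

def run_session(session_id, steps, results):
    """Replay one simulated student session with no pauses between steps."""
    results[session_id] = [evaluate_step(step) for step in steps]

def run_load_test(num_sessions, steps_per_session):
    """Run all sessions concurrently; return (total steps handled, seconds elapsed)."""
    steps = [{"input": str(i), "expected": str(i)} for i in range(steps_per_session)]
    results = {}  # each thread writes only to its own key
    threads = [
        threading.Thread(target=run_session, args=(i, steps, results))
        for i in range(num_sessions)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - start
    total = sum(len(verdicts) for verdicts in results.values())
    return total, elapsed

# 60 sessions mirrors the classroom experience mentioned above; larger values
# would probe the "thousands of concurrent users" question.
total_steps, elapsed = run_load_test(num_sessions=60, steps_per_session=50)
print(f"{total_steps} steps evaluated in {elapsed:.3f} s")
```

In a real test, the stub would be replaced by a socket round trip to the server, and the interesting measurement would be how response time changes as num_sessions grows.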
6.3.3 Choice #3: Use of Ruby on Rails to Implement Most of the Nontutoring Functionality We opted to use Rails to implement functionality for student and class management, teacher reports, etc. Our choice followed the positive experience of the Assistments group with Rails. It offers several advantages. First, it affords fast prototyping and implementation for Web applications, with compact, easy-to-read code and lots of examples from the online development community. Second, integration with Java, which may be necessary for access to external resources, is straightforward. Third, with the Apache Web server and Rails' Mongrel server, it is easy to add multiple host machines for scalability [51].
6.3.4 Choice #4: Separation of Data for Normal Operation of the Site and for Research The Mathtutor architecture strictly separates the data collected for research purposes from the data needed to support the normal operation of the site, mainly because the data collected for research purposes need to be kept in anonymous form (meaning that we remove all information from which the identity of students, teachers, and schools can be inferred), whereas the data needed for regular use should not be anonymous. For the purpose of research and development, detailed logs of step-by-step performance data are collected in the DataShop, where anonymization is automatic [26]. This facility was developed for the Pittsburgh Science of Learning Center (PSLC, http://www.learnlab.org/), a US National Science Foundation (NSF)-sponsored research center spanning Carnegie Mellon University and the University of Pittsburgh. The DataShop is a publicly available repository for many data sets collected by various kinds of educational technology, including many data sets from various tutors. It provides a suite of Web-based tools for analyzing these data and generates reports on error rates and learning curves, among other things. These reports differ from the teacher reports described above in that they are available in DataShop only, not on the Mathtutor site, and are geared more toward learning science researchers than teachers. These reports will be invaluable in the iterative redesign of the content on the site and will also tell us how effective the tutors on the Mathtutor site are in helping students learn.
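The separation between operational and research data can be illustrated with a small sketch. It does not describe how DataShop actually anonymizes logs, which the text does not detail. The record fields, the SECRET_KEY value, and the functions pseudonym and to_research_record are hypothetical, and keyed hashing is just one plausible way to replace identities with stable pseudonyms.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be kept private on the operational side.
SECRET_KEY = b"replace-with-a-private-key"

def pseudonym(student_id, key=SECRET_KEY):
    """Map a student identifier to a stable, non-reversible pseudonym."""
    digest = hmac.new(key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def to_research_record(operational_record, key=SECRET_KEY):
    """Copy only step-level performance fields; drop names, classes, and schools."""
    return {
        "anon_student": pseudonym(operational_record["student_id"], key),
        "problem": operational_record["problem"],
        "step": operational_record["step"],
        "outcome": operational_record["outcome"],
    }

# An operational record, as the site itself might need it (hypothetical fields).
operational = {
    "student_id": "jsmith42",
    "student_name": "J. Smith",
    "school": "Example Middle School",
    "problem": "fractions-07",
    "step": "numerator",
    "outcome": "correct",
}

research = to_research_record(operational)
print(sorted(research.keys()))
```

Because the pseudonym is stable, a researcher can still follow one anonymous student across steps and problems (as learning curves require) without access to the identifying fields that remain in the operational database.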
1. the Cognitive Tutor technology (e.g., [24]);
2. a set of Cognitive Tutor courses for middle school mathematics that was created in our lab;
3. a relatively new kind of ITS technology, example-tracing tutors; and
4. a set of authoring tools, CTAT, which makes it possible for computer-savvy nonprogrammers to create example-tracing tutors.
• V. Aleven is with the Human-Computer Interaction Institute, Carnegie Mellon University, 3613 Newell Simon Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213-3891. E-mail: email@example.com.
• B.M. McLaren is with the Human-Computer Interaction Institute, Carnegie Mellon University, 2617 Newell-Simon Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213-3891, and the Competence Center for e-Learning, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany.
• J. Sewall is with the Human-Computer Interaction Institute, Carnegie Mellon University, 2617 Newell-Simon Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213-3891. E-mail: firstname.lastname@example.org.
Manuscript received 8 Jan. 2009; revised 23 Mar. 2009; accepted 4 Apr. 2009; published online 5 May 2009.
For information on obtaining reprints of this article, please send e-mail to: email@example.com, and reference IEEECS Log Number TLTSI-2009-01-0002.
Digital Object Identifier no. 10.1109/TLT.2009.22.
Vincent Aleven is an assistant professor in the Human-Computer Interaction Institute at Carnegie Mellon University. He has 16 years of experience in research and development related to intelligent tutoring systems and authoring tools for tutoring systems. He is one of the original developers of the Cognitive Tutor Geometry curriculum. He has conducted research in real educational settings, in urban and suburban schools at the high school, middle school, and vocational school levels, as well as at the postsecondary level. He is a member of the executive committee of the Pittsburgh Science of Learning Center (http://www.learnlab.org).
Bruce M. McLaren has a split appointment as a senior systems scientist in the Human-Computer Interaction Institute at Carnegie Mellon University in Pittsburgh and as a senior researcher at the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) in Saarbrücken, Germany. He has research interests in educational technology, collaborative learning, intelligent tutoring, and artificial intelligence. He has more than 50 publications in peer-reviewed journals, conferences, workshops, and symposiums, with most focused on educational technology and learning.
Jonathan Sewall is the project director in the Human-Computer Interaction Institute at Carnegie Mellon University. He has been the technical lead of the Cognitive Tutor Authoring Tools Project for the last four years. He has 25 years of experience in government, industry, and academia with software design and development. His past work experience includes expert systems, massively parallel systems, Web applications, databases, and networks.