The "How Was Your Day" (HWYD) companion is an embodied conversational agent that engages in free-form dialogue about work-related issues arising in a typical work day. The open-ended nature of these interactions requires new models of evaluation. Here, we describe a paradigm and methodology for evaluating the main aspects of such functionality in conjunction with overall system behavior, with respect to three parameters: functional ability (i.e., does it do the "right" thing conversationally), content (i.e., does it respond appropriately to the semantic context), and emotional behavior (i.e., given the emotional input from the user, does it respond in an emotionally appropriate way). We demonstrate the use of our evaluation paradigm as a method both for grading current system performance and for targeting areas for particular performance review. We show a correlation between, for example, automatic speech recognition performance and overall system performance (as is expected in systems of this type). Beyond this, we show where individual utterances or responses, indicated as positive or negative, characterize system performance, and we demonstrate how our combined evaluation approach highlights issues (both positive and negative) in the companion system's interaction behavior.
