A Coupled Duration-Focused Architecture for Real-Time Music-to-Score Alignment
June 2010 (vol. 32 no. 6)
pp. 974-987
Arshia Cont, Ircam-Centre Pompidou, Paris
Real-time synchronization and coordination is a common ability among trained musicians performing a music score, and it presents an interesting challenge for machine intelligence. Compared to speech recognition, which has influenced many music information retrieval systems, music's temporal dynamics and complexity pose challenging problems for the common approximations used in time modeling of data streams. In this paper, we propose a design for a real-time music-to-score alignment system. Given a live recording of a musician playing a music score, the system is capable of following the musician in real time within the score and decoding the tempo (or pace) of the performance. The proposed design features two coupled agents, one for audio and one for tempo, within a single probabilistic inference framework that adaptively updates its parameters based on the real-time context. Online decoding is achieved through the collaboration of the coupled agents in a Hidden Hybrid Markov/semi-Markov framework, where the prediction feedback of one agent affects the behavior of the other. We perform evaluations for both real-time alignment and the proposed temporal model. An implementation of the presented system has been widely used in real concert situations worldwide, and readers are encouraged to access the actual system and experiment with the results.
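As a rough illustration of the coupling described above, the following toy sketch lets a tempo agent predict how long the current score event should last, while an audio agent uses that prediction to decide whether an observed pitch marks the next event, and then feeds the measured duration back to the tempo agent. It is not the paper's Hidden Hybrid Markov/semi-Markov inference: the class names, the exponential duration prior, the smoothing gain, and the 0.3 threshold are all assumptions chosen for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class TempoAgent:
    """Hypothetical tempo agent: tracks seconds per beat."""
    sec_per_beat: float
    gain: float = 0.5  # smoothing factor (an assumption, not from the paper)

    def expected_duration(self, beats: float) -> float:
        return beats * self.sec_per_beat

    def update(self, beats: float, observed_sec: float) -> None:
        # Move the estimate toward the tempo implied by the last event.
        implied = observed_sec / beats
        self.sec_per_beat += self.gain * (implied - self.sec_per_beat)


class AudioAgent:
    """Hypothetical audio agent: tracks the position in the score."""

    def __init__(self, score):
        self.score = score  # list of (midi_pitch, duration_in_beats)
        self.pos = 0        # index of the event currently sounding
        self.onset = 0.0    # time (s) at which that event started

    def step(self, t: float, pitch: int, tempo: TempoAgent) -> int:
        if self.pos + 1 >= len(self.score):
            return self.pos  # already on the last event
        beats = self.score[self.pos][1]
        next_pitch = self.score[self.pos + 1][0]
        elapsed = t - self.onset
        # Tempo -> audio: duration prior that the current event has ended
        # (CDF of an exponential duration, used here as a simple stand-in).
        end_prob = 1.0 - math.exp(-elapsed / tempo.expected_duration(beats))
        if pitch == next_pitch and end_prob > 0.3:
            # Audio -> tempo: the measured duration refines the tempo.
            tempo.update(beats, elapsed)
            self.pos += 1
            self.onset = t
        return self.pos


def follow(score, frames, sec_per_beat):
    """Run both agents over (time, pitch) frames; return positions and tempo."""
    tempo = TempoAgent(sec_per_beat)
    audio = AudioAgent(score)
    positions = [audio.step(t, pitch, tempo) for t, pitch in frames]
    return positions, tempo.sec_per_beat


if __name__ == "__main__":
    score = [(60, 1.0), (62, 1.0), (64, 2.0)]
    # The frame at 0.1 s is a spurious early detection of the second note.
    frames = [(0.0, 60), (0.1, 62), (0.3, 60), (0.6, 62), (0.9, 62), (1.2, 64)]
    positions, tempo = follow(score, frames, sec_per_beat=0.5)
    print(positions)  # [0, 0, 0, 1, 1, 2]
    print(tempo)
```

In this example the performer plays slightly slower (0.6 s per beat) than the initial estimate of 0.5 s per beat. The early detection at 0.1 s is rejected because the duration prior says the first note is unlikely to have ended yet, and the tempo estimate drifts toward the performer's pace after each accepted event.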

Index Terms:
Automatic musical accompaniment, hidden hybrid Markov/semi-Markov models, computer music.
Arshia Cont, "A Coupled Duration-Focused Architecture for Real-Time Music-to-Score Alignment," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 6, pp. 974-987, June 2010, doi:10.1109/TPAMI.2009.106