Information and Human Language Technology, Brazilian Symposium in (2009)
Sao Carlos, Sao Paulo, Brazil
Sept. 8, 2009 to Sept. 11, 2009
ISBN: 978-0-7695-3945-4
pp: 179-182
This paper highlights the primary methods employed in compiling the C-ORAL-BRASIL corpus, namely recording, transcribing, and segmenting oral texts. C-ORAL-BRASIL is a Brazilian Portuguese corpus of spontaneous speech, designed for the study of informational structure. It is representative of diaphasic variation, seeking to cover as many different communicative situations as possible. This paper presents and exemplifies the processes of transcription and segmentation of speech into prosodic units as employed in our ongoing research. It concludes with illustrations of some questions that the corpus will enable us to answer.
corpus, spontaneous speech, Brazilian Portuguese
