The Hong Kong Cantonese corpus: Design and uses
Kang Kwong Luke 陸鏡光; May L.Y. Wong 王麗賢

Abstract 摘要
The Hong Kong Cantonese Corpus (HKCC) was built to give researchers and language learners access to a body of naturally occurring talk drawn from everyday conversations between speakers of Cantonese in Hong Kong. In this paper, we describe the origin, rationale, design principles and uses of HKCC. In particular, we focus on the following aspects of the corpus: (1) data collection procedures; (2) transcription and orthographic conventions; (3) encoding schemes; (4) segmentation and POS tagging; and (5) potential uses of the corpus and future directions.


Keywords 關鍵詞

Speech corpus 口語語料庫; Conversation 日常會話; Cantonese 粵語; Naturally occurring talk 自然語言材料; Corpus design 語料庫設計

