An articulatory model of standard Chinese using MRI and X-ray movie
用磁共振成像和X光声道资料建立汉语普通话调音模型
Gaowu Wang 汪高武; Jiangping Kong 孔江平

Abstract 摘要
To better understand speech production, from phonological input through articulatory movement to acoustic output, it is important to establish a detailed articulatory model of the vocal tract. This paper explores the articulatory mechanism of speech production in Standard Chinese and develops a geometric articulatory model that produces both visual and acoustic output. The model is based on MRI images and X-ray movies: the former provide detailed volumetric information about the vocal tract, and the latter provide the dynamics of articulation. Seven articulators are studied and modeled: the hard palate, pharynx, jaw, lips, velum, tongue, and larynx. The tongue is modeled as two parts, the tongue tip and the tongue body, which reduces the number of parameters required. The relation between larynx height and fundamental frequency across the four tones is also modeled. These two refinements, in tongue and larynx modeling, contribute new ideas to the articulatory modeling of Standard Chinese. The model can serve as a research tool for linguists, phoneticians, and speech engineers, and can be used in parametric speech synthesis, virtual speakers, and visually assisted speech training for Standard Chinese.

为了更好地理解言语的产生过程,即如何从音位输入到调音器官的动作、再到声学输出,需要建立详尽的声道调音模型。本文是对汉语普通话言语产生中调音机制的探索与研究,建立了一个具有视觉和声学输出的几何调音模型。该调音模型的数据源自于声道的磁共振图像和X光录影,前者主要提供声道的立体形状,后者提供调音的动态过程。在这个模型中,声道被分解为七个调音部位进行研究:硬腭、喉腔后壁、下颌、双唇、软腭、舌头和喉管。其中创新性地,舌头又被分为舌体(相对简单)和舌尖(更为灵活)两部份,从而简化所需要的参数;另外,普通话四声中基频高低与喉管上下高度的关系也加入到模型中。该模型可以作为研究工具服务于语言学、语音学和言语工程,并可用于语音参数合成、虚拟说话人、普通话辅助教学等领域。

Keywords 关键词

Speech production 言语产生; Articulatory model 调音模型; Vocal tract 声道; MRI 磁共振成像; X-ray movie X光
