Determinative-Measure Compounds in Mandarin Chinese: Formation Rules and Parser Implementation
中文里的定量复合词:构成律以及剖析程序 
Ruo-ping Jean Mo 莫若萍; Yao-Jung Yang 杨曜荣; Keh-Jiann Chen 陈克健; Chu-Ren Huang 黄居仁

Abstract 摘要
This paper addresses the identification of determinative-measure compounds (DMs) in parsing Mandarin Chinese. The number of possible DMs is infinite, so they cannot be listed exhaustively in a lexicon. The set of DMs can, however, be described by regular expressions and recognized by a finite automaton. We propose to identify DMs by regular expression before parsing, as part of our morphological module. After investigating a large amount of linguistic data, we find that DMs are formed compositionally and hierarchically from simpler constituents. Based on this finding, we construct grammar rules that combine determinatives and measures, and we build a parser that implements these rules. With this approach, almost all unlisted DMs are recognized. However, if the DM recognition procedure is applied alone, many ambiguous results appear. With our word segmentation process, these ambiguities are greatly reduced.

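This paper presents how determinative-measure compounds are handled when parsing Chinese. Like derivational compounds, determinative-measure compounds can be coined without limit; they are too numerous and varied to be listed one by one in a dictionary, which gives rise to ambiguity during word segmentation or parsing. Compared with other compounds, however, their formation rules are easier to generalize, so they can be identified before parsing. We find that determinative-measure compounds are both compositional and hierarchical; based on these relations, we formulate combination rules and apply them in the parsing system we designed. As a result, most determinative-measure compounds can be recognized, and the ambiguity arising during word segmentation is greatly reduced.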
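
The idea that DMs can be described by a regular expression can be illustrated with a minimal sketch. It is not the authors' rule set: the character inventories, the single formation rule, and the names `DM_PATTERN` and `find_dms` are assumptions chosen for illustration, and the numeral pattern is deliberately loose.

```python
import re

# Toy inventories (assumptions for illustration only; the paper's
# actual determinative and measure lists are not reproduced here).
DEMONSTRATIVES = "这那哪每"
DIGITS = "零一二两三四五六七八九"
UNITS = "十百千万"
MEASURES = "个本张条只次天年"

# A numeral is any non-empty run of digit and unit characters.
# This over-generates (it does not check numeral well-formedness).
NUMERAL = f"[{DIGITS}{UNITS}]+"

# Hypothetical formation rule:
#   DM := (demonstrative numeral? | numeral) measure
DM_PATTERN = re.compile(
    f"(?:[{DEMONSTRATIVES}](?:{NUMERAL})?|{NUMERAL})[{MEASURES}]"
)


def find_dms(text):
    """Return (start, end, string) for each non-overlapping DM candidate."""
    return [(m.start(), m.end(), m.group()) for m in DM_PATTERN.finditer(text)]


if __name__ == "__main__":
    for sentence in ["我买了三本书", "这三百个人", "一本正经"]:
        print(sentence, find_dms(sentence))
```

Run on its own, the pattern also proposes 一本 inside the idiom 一本正经, which is not a DM in that context. This is the kind of ambiguous result that a pattern-only recognizer produces and that a separate word segmentation step would have to filter out.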
