Logic-Based Parsing of Chinese
以逻辑为本的中文剖析
Hsin-Hsi Chen 陈信希

Abstract 摘要
Mandarin Chinese is a highly flexible and context-sensitive language. Such a language is difficult to process computationally, and segmentation poses a further problem because lexical units in Chinese sentences are not clearly delimited. This paper treats segmentation as a part of parsing, using logic programming techniques. To handle the high degree of freedom of empty categories in Mandarin Chinese, the C-Command and Subjacency Conditions are embedded implicitly in the integrated segmentation-parsing model to decide which constituents are moved and/or deleted. A grammar formalism is proposed whose specific features are uniform treatment of movements, support for an arbitrary number of movements, automatic detection of grammar errors in advance, and clear declarative semantics. A parser generator translates the grammar rules and generates optimized code. Graph unification that supports multiple-valued, negated, and disjunctive features is adopted to express the co-occurrence restrictions and information transfer among constituents in this model. Many common linguistic phenomena that occur in Chinese sentences are represented in this environment, such as topic-comment structures, ba-constructions, bei-constructions, relative clause constructions, appositive clause constructions, and serial verb constructions. The parsing of long Chinese sentences is also addressed.

中文是一种使用非常弹性且前后文相关的语言,因此计算机很难处理中文语句。除此以外,由于中文句子语汇之间并没有明显的分割符号,断词为另一个困难的问题。这篇论文采用逻辑程序的技术,将断词视为剖析的一部分。为了处理中文空词高自由度的使用,论文将 c-command 和 subjacency 两项限制条件,放在整合的剖析-断词模型中,以决定哪些成分被移走且/或删除。论文也提出一种语法形式化语言,其具有均一处理移位现象及任意个数的移位,预先自动侦测语法错误,和清楚的叙述语意等特点。剖析器产生装置将语法规则转换成程序代码,并作最佳化。图形联并支持多值,反面,离接等结构,在这个模型中,被采用来表示成分间的共存限制和信息传递。许多常见的语言现象如主题-评论结构,把字句,被字句,关系子句,同位句,递续结构等,都在这个环境中表现出来。最后,本文也讨论中文长句的处理。
