Issues and Topics in Chinese Natural Language Processing
Overview: Important Topics in Chinese Natural Language Processing
Chu-Ren Huang 黃居仁; Keh-Jiann Chen 陈克健

Abstract
The ten articles collected in this volume are representative studies of important issues in Chinese natural language processing (NLP). Intra-disciplinary linguistic studies are dominated by the concern for cross-linguistic generalization (i.e., Universal Grammar). Computational linguistic studies, by contrast, necessarily focus on accounting for language-specific characteristics. This is because recent developments in linguistic formalisms and computational mechanisms have provided a strong basis for handling the general, universal facts of language, so the issues that remain are largely the idiosyncrasies of each individual language. Issues and topics in Chinese natural language processing therefore necessarily involve special consideration of the linguistic characteristics of Chinese as well as the idiosyncrasies of Chinese textual conventions. In other words, these issues and topics are best grasped from the viewpoint of the characteristics of Chinese grammar and texts. In what follows, we discuss important topics in Chinese language processing in terms of the linguistic characteristics of Chinese. We explicate the relevance of the chapters in this book and point to future research directions where appropriate.

In the first section, we introduce the concept of the ‘word’ as the basic unit of natural language processing and discuss the fundamental research topics of segmentation and morpho-lexical generation. The two relevant articles are Chiang, Chang, Lin and Su’s ‘Statistical Word Segmentation’ (Chapter 7) and Mo, Yang, Cheng and Huang’s ‘Determinative-measure Compounds in Mandarin Chinese: Formation Rules and Parsing Implementation’ (Chapter 6). In the second section, we discuss parsing as the foundation of NLP. Four crucial issues in parsing are treated in four subsections: 1) grammatical categories and the lexicon, 2) the assignment of grammatical roles, 3) the resolution of lexical ambiguity, and 4) the resolution of structural ambiguity. The two articles relevant to this section are Chen and Huang’s ‘Information-based Case Grammar: A Unification-based Formalism for Parsing Chinese’ (Chapter 2) and Chen’s ‘Logic-based Parsing of Chinese’ (Chapter 3). Section 3 discusses the process of mapping grammatical representations to meaning. The relevant articles are Guo and Hsu’s ‘A Cognitive Treatment of Aspect in Japanese to Chinese Machine Translation’ (Chapter 4) and Yeh and Lee’s ‘Ambiguity Resolution of Serial Noun Constructions in Chinese Sentences’ (Chapter 5). In Section 4, we introduce NLP applications and complete working systems. The three systems, reported in Chapters 8, 9 and 10, are Chien, Chen and Lee’s ‘A Mandarin Dictation Machine with Improved Chinese Language Modeling’, T’sou, Lin, Ho, and Lai’s ‘From Argumentative Discourse to Inference Trees: Using Syntactic Markers as Cues in Chinese Text Abstraction’, and Su, Chang, Wang, Chang and Wu’s ‘The Computational Models of the Behavior Tran English-Chinese Machine Translation System’. Lastly, the concluding section discusses recent developments and new research directions.

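Starting from two perspectives, computational linguistic theory and Chinese grammatical analysis, this paper discusses the most important topics in Chinese natural language processing and their theoretical background. Through this discussion, it introduces the contributions and scholarly standing of the nine papers collected in this volume. The topics discussed are: 1) the fundamental status of the ‘word’ in natural language processing and the special problems it raises in the analysis of Chinese; 2) the major components of Chinese parsing, including (a) lexical and part-of-speech analysis, (b) the determination of grammatical functions, (c) the resolution of lexical ambiguity, and (d) the resolution of structural ambiguity; 3) how meaning is derived from structure; and 4) how application systems are built. The paper concludes with a discussion of future directions for Chinese natural language processing.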
