Chinese CCGbank construction from Tsinghua Chinese Treebank
从清华中文短语结构树库到组合范畴语法树库
Chang-ning Huang 黄昌宁; Yan Song 宋彦
Abstract 摘要
For in-depth text processing in natural language processing applications, deep grammars need to be introduced into the syntactic annotation used in treebank construction. Among the deep grammars that support deep analysis of text, Combinatory Categorial Grammar (CCG) is an effective choice, offering a type-driven, lexicalized formalism and a transparent interface between syntax and semantics. In this paper, we propose an approach to CCGbank construction based on a translation from the Tsinghua Chinese Treebank (TCT). In this approach, we design a verb sub-categorization algorithm and pre-define several Chinese sentence patterns, which are incorporated into the standard translation procedure. The resulting CCGbank contains 32,737 sentences with more than 350,000 word tokens. Evaluation experiments on both macro statistics and manually annotated references demonstrate the robustness of our CCGbank and the efficiency of the proposed translation process.
Keywords 关键词
Combinatory categorial grammar 组合范畴语法; CCGbank CCG树库; TCTbank TCT树库; Category 范畴; Combinatory rules 组合规则