Lexicalized statistical pattern matching: Search engine-aided analysis for the Chinese language
词汇化模板定量匹配——借助于搜索引擎的中文分析
Maosong Sun 孙茂松; Ruying Sun 孙如颖

Abstract 摘要
This article presents an idea of search engine-aided analysis for the Chinese language. The core of the idea is the proposed concept of “lexicalized statistical pattern matching”. The basic methodology is to perform some degree of Chinese analysis at different linguistic levels by designing and exploiting a lexicalized statistical pattern system, together with the simple string matching technique used by search engines. The soundness of the idea is discussed through several typical case studies, and some related key issues are also addressed. The idea is preliminary and needs further validation by large-scale experiments.

本文阐述了一种借助现有搜索引擎对中文进行辅助研究的思路。主要考量是本文所提出的“词汇化模板定量匹配”方法。这个方法的要点是期望设计一个针对中文的“词汇化模板体系”,依靠简单的字符串匹配技术,在语言的不同层次上实现对中文某种程度的分析。本文通过若干典型案例说明了所提方法的合理性,并讨论了若干相关的重要问题。这个思路还有待于大规模实验的检验。

Keywords 关键词

Lexicalized statistical pattern matching 词汇化模板定量匹配; Search engine 搜索引擎; Web corpus 互联网语料库; Chinese analysis 中文分析; Natural language processing 自然语言处理
