A corpus-based approach to fingerprinting stylistic features of classical Chinese poetry: A case study of Liu Yong and Su Shi
基於語料庫的古詩詞文學風格辨識:柳永及蘇軾詩詞範例研究
Alex Chengyu Fang 方稱宇; Wan-yin Li 李昀燕; Jing Cao 曹競
Abstract 摘要
In this article, we describe an experiment that uses ontological knowledge to identify the stylistic features of classical Chinese poetry. In particular, the article addresses the task of automatic authorship attribution for classical Chinese poems. The work is motivated by the understanding that the language of different poets can be characterised through their creative use of imagery, which can in turn be captured through ontological annotation. We use a corpus of lyric songs written by Liu Yong and Su Shi in the Song Dynasty, which has been word-segmented and ontologically annotated. Different feature sets are constructed to represent all the possible combinations of word tokens and their ontological annotations, and a support vector machine (SVM) classifier is used to evaluate the performance of each feature set. Empirical results show that word tokens alone achieve an accuracy of 87% in the task of authorship attribution between Liu Yong and Su Shi. More interestingly, ontological knowledge is shown to produce significant performance gains when combined with word tokens, an observation reinforced by the fact that most of the feature sets with ontological annotation outperform bare word tokens as features. Specifically, word tokens combined with ontological annotations achieve an overall accuracy of 89%, expressed in F-value, for the same task.
Keywords 關鍵詞
Syntax 語法; Ontology 本體知識; Imagery 意象; Machine learning 機器學習; Poetic style 詩詞文學風格
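The feature-set construction summarised in the abstract can be illustrated with a minimal sketch. It assumes each segmented word carries a single ontological category and treats three views of a word (the token, its category, and the token-category pair) as the building blocks that are then combined. The view names, the category labels, and the helper functions `feature_sets` and `extract` are hypothetical and are not taken from the article; the resulting count vectors are the kind of input one might pass to an SVM classifier.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotated line: (word token, ontological category) pairs.
# The category labels are invented for illustration, not the paper's ontology.
line = [("明月", "Celestial"), ("幾時", "Time"), ("有", "Existence")]

# Three basic views of an annotated word (an assumption of this sketch).
VIEWS = {
    "token": lambda tok, onto: tok,
    "onto": lambda tok, onto: onto,
    "token+onto": lambda tok, onto: f"{tok}/{onto}",
}

def feature_sets(view_names):
    """Enumerate every non-empty combination of the given views."""
    names = sorted(view_names)
    return [combo
            for r in range(1, len(names) + 1)
            for combo in combinations(names, r)]

def extract(annotated, combo):
    """Count the features of an annotated text under one combination of views."""
    counts = Counter()
    for tok, onto in annotated:
        for name in combo:
            counts[f"{name}={VIEWS[name](tok, onto)}"] += 1
    return counts

all_sets = feature_sets(VIEWS)
token_only = extract(line, ("token",))
token_and_onto = extract(line, ("onto", "token"))
```

Each entry of `all_sets` names one candidate feature set; calling `extract` once per poem and per feature set would yield the bag-of-features vectors whose classification performance can then be compared.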