基于统计与词嵌入的近代汉语动量结构研究

蒋彦廷; 潘雨婷; 杨乐

doi:10.12189/j.issn.1672-8505.2020.02.004

A Research on Verbal Classifiers Collocation in Pre-modern Chinese Based on Statistics and Word Embedding

Abstract

Abstract: Using a diachronic corpus of 230 million Chinese characters as its platform, and combining statistical methods with a word embedding algorithm, this paper quantitatively examines how 13 verbal classifiers combine with verbs in pre-modern Chinese. Taking a macro-level view, it presents and explains the overall profile and characteristics of pre-modern Chinese verbal classifiers, in service of research on the history of the Chinese language and the teaching of classifiers. First, combining statistical and rule-based methods, it completes the preprocessing work: automatic recognition of verbal classifiers, automatic word segmentation, and automatic identification of the verbs that verbal classifiers collocate with. Second, it measures, period by period, the frequency of each verbal classifier construction and each verbal classifier, and finds that the frequency of verbal classifiers differs sharply between classical (文言) and vernacular (白话) registers. Finally, following the semantic class system of Tongyici Cilin (《同义词词林》), it examines which semantic classes of verbs are favored and disfavored among verbs modified by verbal classifiers, and finds a non-obligatory, probabilistic association between a verb's semantic class and whether the verb is modified by a verbal classifier.
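The final step in the abstract, relating a verb's semantic class to whether it is modified by a verbal classifier, can be illustrated with a small sketch. This is not the authors' implementation: the function name `class_preference`, the class labels, and the toy counts are all hypothetical. The sketch assumes each verb token has already been tagged with a semantic class and a flag for whether a verbal classifier modifies it.

```python
from collections import Counter


def class_preference(observations):
    """Compare each semantic class's modification rate with the overall rate.

    observations: iterable of (semantic_class, is_modified) pairs, one per
    verb token (hypothetical input format, not taken from the paper).
    Returns {semantic_class: (rate, ratio)}, where rate is the share of tokens
    in that class modified by a verbal classifier and ratio is
    rate / overall rate (above 1 = favored, below 1 = disfavored).
    """
    totals, modified = Counter(), Counter()
    for sem_class, is_modified in observations:
        totals[sem_class] += 1
        if is_modified:
            modified[sem_class] += 1

    n_all = sum(totals.values())
    n_mod = sum(modified.values())
    if n_all == 0 or n_mod == 0:
        return {}  # no data, or no modified tokens: ratios are undefined

    overall = n_mod / n_all
    result = {}
    for sem_class, total in totals.items():
        rate = modified[sem_class] / total
        result[sem_class] = (rate, rate / overall)
    return result


# Toy data invented purely for illustration (not from the paper's corpus).
toy = (
    [("motion", True)] * 3 + [("motion", False)] * 7
    + [("state", True)] * 1 + [("state", False)] * 9
)
prefs = class_preference(toy)
```

In this toy data the "motion" class has a ratio above 1 and the "state" class a ratio below 1. A ratio is a tendency rather than a rule, which matches the non-obligatory, probabilistic association the abstract reports.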
