Abstract:
Chinese word segmentation is a foundation of natural language processing, and cross ambiguity is one of the bottlenecks in improving segmentation accuracy. This paper proposes a method that combines the maximum matching algorithm with the passive-aggressive (PA) algorithm to eliminate cross ambiguity. First, a segmentation model was trained with the PA algorithm. Second, the positions of cross ambiguity were located using the forward maximum matching algorithm and the backward (reverse) maximum matching algorithm. Third, each position of cross ambiguity, together with its context, was submitted to the segmentation model and decoded. Finally, the final segmentation result was obtained. Experimental results on the Renmin Daily 2014 corpus show that the precision, recall, and F-score on cross ambiguity are 98.32%, 98.14%, and 98.23%, respectively.
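
The detection step can be illustrated with a minimal sketch. It assumes that a candidate position of cross ambiguity is any stretch of text where forward and backward maximum matching disagree; the abstract does not state the exact rule, so this is an assumption. The toy lexicon, the function names, and the `max_len` parameter are hypothetical, and the PA-trained model that would decode each span is not shown.

```python
# Toy lexicon for illustration only (hypothetical, not from the paper).
LEXICON = {"研究", "研究生", "生命", "的", "起源"}


def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation, trying the longest candidate first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or cand in lexicon:
                words.append(cand)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len=4):
    """Greedy right-to-left segmentation, trying the longest candidate first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            cand = text[j - size:j]
            if size == 1 or cand in lexicon:
                words.append(cand)
                j -= size
                break
    words.reverse()
    return words


def boundaries(words):
    """Return the set of character offsets at which a word ends."""
    cuts, pos = set(), 0
    for w in words:
        pos += len(w)
        cuts.add(pos)
    return cuts


def ambiguous_spans(text, lexicon, max_len=4):
    """Return (start, end) spans where the two segmentations disagree.

    Assumption: disagreement between the two matchers marks a candidate
    position of cross ambiguity to be passed on to a segmentation model.
    """
    f_cuts = boundaries(forward_max_match(text, lexicon, max_len))
    b_cuts = boundaries(backward_max_match(text, lexicon, max_len))
    spans, start = [], 0
    for end in sorted(f_cuts & b_cuts):
        inner_f = {c for c in f_cuts if start < c < end}
        inner_b = {c for c in b_cuts if start < c < end}
        if inner_f != inner_b:
            spans.append((start, end))
        start = end
    return spans


sentence = "研究生命的起源"
spans = ambiguous_spans(sentence, LEXICON)
```

For the sample sentence, forward matching yields 研究生 / 命 / 的 / 起源 and backward matching yields 研究 / 生命 / 的 / 起源, so the span covering 研究生命 is flagged. In the pipeline described above, that span and its context would then be decoded by the PA-trained model.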