1673-159X

CN 51-1686/N

一种基于网格查询的改进DBSCAN算法

Improvements of DBSCAN Algorithm Based on Grid

  • 摘要: 针对DBSCAN算法聚类时时间复杂度较高、当边界点同时属于多个类时其聚类准确率较低的问题,在网格查询思想和OPTICS算法的基础上,提出一种改进的DBSCAN算法(GO-DBSCAN算法)。进行聚类操作前,为降低聚类的时间复杂度,先基于网格查询的思想将数据集划分成不同的网格,在进行项目邻域查询时,只须遍历项目附近网格数据而不必遍历整个数据集; 在进行项目聚类时,主要考虑该项目与其附近核心项目的最小可达距离,因此,将OPTICS算法中的最小可达距离引入到DBSCAN算法中,以提高算法对边界点处理的准确度。仿真实验结果表明,GO-DBSCAN在边界点处理的准确率和运行效率方面较DBSCAN都有所提高。

     

    Abstract: In view of the low accuracy of boundary point clustered and the excessively high time complexity of DBSCAN algorithm, a DBSCAN algorithm is proposed based on GO-DBSCAN algorithm and OPTICS algorithm. The algorithm introduced the minimum reachable distance of OPTICS algorithm into the DBSCAN algorithm. Generally, the minimum distance between object and the near core object is a mainly considered factor for the clustering based on DBSCAN algorithm. Therefore, the introduce improves the clustering accuracy. The idea of grid query is proposed to reduce the size of traversal data of objects and this results in the decrease of DBSCAN algorithm'stime complexity. Before clustering, data set is divided into different grid based on grid query rules to reduce clustering time complexity. When project neighborhood is to be queried, it is necessary to traverse only the project near grid data other than the entire data set. The theoretical analysis and simulation results show that GO-DBSCAN can effectively improve the accuracy of boundary point clustered and reduce the time complexity.

     

/

返回文章
返回