Library and Information Service, 2017, Vol. 61, Issue 12: 113-121. DOI: 10.13266/j.issn.0252-3116.2017.12.015

• Intelligence Research •

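Research on Hot Topic Detection Based on DBSCAN Algorithm and Inter-Sentence Relationship

Sun Mingxi1, Liu Chunqi2

  1. Changchun University of Science and Technology Library, Changchun 130022;
  2. Changchun Agriculture Information Center, Changchun 130111
  • Received: 2017-04-13 Revised: 2017-05-30 Online: 2017-06-20 Published: 2017-06-20
  • About the authors: Sun Mingxi (ORCID: 0000-0001-6971-2143), assistant librarian, E-mail: 1192160894@qq.com; Liu Chunqi (ORCID: 0000-0002-8059-7585), assistant agronomist.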

Abstract: [Purpose/significance] In the age of big data, users faced with massive volumes of data are often at a loss. A growing number of scholars have therefore turned to algorithms that detect hot topics on the Internet, helping information users find those topics quickly. [Method/process] Building on DBSCAN, the improved algorithm detects hot topics by adjusting its parameters dynamically. A hot topic filtering model (SDIRFM), constructed from analyses of syntactic structure and inter-sentence relationships, then filters out general topics that merely contain hot terms, improving the accuracy of the results. [Result/conclusion] Experiments on a news dataset collected from mainstream websites, evaluated with metrics including the false detection rate and the miss rate, show that the improved algorithm performs better than the original DBSCAN. Together, the improved DBSCAN algorithm and the SDIRFM offer information users an efficient way to study web data.
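The abstract describes the method only at a high level: DBSCAN with dynamically adjusted parameters, evaluated by miss and false detection rates. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each news item is already a term-weight vector and uses cosine distance. The paper's parameter-adjustment rule is not given here, so the hypothetical `estimate_eps` stands in with a common heuristic (the median k-nearest-neighbour distance, times a hypothetical `scale` factor). The hypothetical `miss_and_false_alarm` helper assumes the usual topic detection and tracking definitions of the two rates; the SDIRFM filtering step is not modelled.

```python
import math
from statistics import median

NOISE = -1  # label for points that belong to no cluster


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat empty documents as maximally distant
    return 1.0 - dot / (na * nb)


def estimate_eps(points, min_pts, scale=1.5, dist=cosine_distance):
    """Hypothetical data-driven eps: scale * median k-distance, k = min_pts - 1
    (the point itself counts toward min_pts in dbscan below)."""
    if min_pts < 2 or len(points) < min_pts:
        raise ValueError("need min_pts >= 2 and at least min_pts points")
    k = min_pts - 1
    kdists = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        kdists.append(ds[k - 1])  # distance to the k-th nearest other point
    return scale * median(kdists)


def region_query(points, i, eps, dist):
    """Indices within eps of point i, including i itself."""
    return [j for j in range(len(points))
            if j == i or dist(points[i], points[j]) <= eps]


def dbscan(points, eps, min_pts, dist=cosine_distance):
    """Plain DBSCAN: returns one label per point (0, 1, ... or NOISE)."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps, dist)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps, dist)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also core: keep growing
        cluster += 1
    return labels


def miss_and_false_alarm(detected, relevant, universe):
    """Assumed rates for one topic:
    miss = relevant docs not detected / relevant docs,
    false alarm = non-relevant docs detected / non-relevant docs."""
    detected, relevant, universe = set(detected), set(relevant), set(universe)
    non_relevant = universe - relevant
    p_miss = len(relevant - detected) / len(relevant) if relevant else 0.0
    p_fa = len(detected & non_relevant) / len(non_relevant) if non_relevant else 0.0
    return p_miss, p_fa


if __name__ == "__main__":
    # Toy term-weight vectors: two tight "topics" and one stray document.
    docs = [(1, 0, 0), (1, 0.1, 0), (1, 0, 0.1),
            (0, 1, 0), (0.1, 1, 0), (0, 1, 0.1),
            (0, 0, 1)]
    eps = estimate_eps(docs, min_pts=3)
    labels = dbscan(docs, eps, min_pts=3)
    print(labels)  # [0, 0, 0, 1, 1, 1, -1]
    topic_a = [i for i, lab in enumerate(labels) if lab == 0]
    print(miss_and_false_alarm(topic_a, {0, 1, 2}, range(len(docs))))  # (0.0, 0.0)
```

Deriving eps from the data rather than fixing it by hand is one way to let the clustering adapt as the news collection changes; whether this matches the paper's dynamic adjustment is an assumption. In a real pipeline, the resulting clusters would then pass through a filtering step such as the SDIRFM before being reported as hot topics.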

Key words: information user, hot topic detection, density clustering, syntactic structure, relationship between sentences

中图分类号: