Research on Hot Topic Detection Based on DBSCAN Algorithm and Inter Sentence Relationship

  • Sun Mingxi
  • Liu Chunqi
  • 1. Changchun University of Science and Technology Library, Changchun 130022;
    2. Changchun Agriculture Information Center, Changchun 130111

Received date: 2017-04-13
Revised date: 2017-05-30
Online published: 2017-06-20

Abstract

[Purpose/significance] In the era of big data, information users are easily overwhelmed by massive volumes of data. A growing number of scholars are studying algorithms that detect hot topics on the Internet, so that users can identify those topics quickly. [Method/process] Building on DBSCAN, the improved algorithm detects Internet hot topics by adjusting its parameters dynamically. A hot topic filtering model then filters out general-vocabulary topics to improve the accuracy of the results. [Result/conclusion] Error rate and detection rate are used to evaluate the effectiveness of the algorithm. The experimental results show that the improved algorithm is more efficient than the original DBSCAN algorithm. In conclusion, the improved DBSCAN algorithm and the SDIRFM offer information users an efficient way to study network data.
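The abstract says only that the improved DBSCAN "adjusts its parameters dynamically"; it does not say how. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it runs classic DBSCAN and derives the neighbourhood radius Eps from the data as the mean distance from each point to its k-th nearest neighbour, a common k-distance heuristic. The function names `adaptive_eps` and `dbscan`, the Euclidean metric, and the toy 2-D "document vectors" are all hypothetical; real input would be text feature vectors (e.g. TF-IDF), and the hot topic filtering model is not shown.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adaptive_eps(points: List[Point], k: int) -> float:
    """Hypothetical heuristic: Eps = mean distance to each point's k-th nearest neighbour."""
    kth = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        kth.append(dists[min(k, len(dists)) - 1])
    return sum(kth) / len(kth)


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Classic DBSCAN: returns one cluster id per point, with -1 marking noise."""
    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i: int) -> List[int]:
        # The Eps-neighbourhood includes the point itself, as in the original DBSCAN.
        return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is also core, so expand through it
                queue.extend(more)
        cluster += 1
    return labels


if __name__ == "__main__":
    docs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (10.0, 0.0)]
    eps = adaptive_eps(docs, k=2)
    print(dbscan(docs, eps, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Because Eps is recomputed from each new batch of data, the clustering radius tracks changes in data density instead of being fixed by hand. The mean is sensitive to outliers (the isolated point above inflates Eps to about 1.12), so a median or a scaled k-distance would be a reasonable alternative under the same assumptions.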

Cite this article

Sun Mingxi, Liu Chunqi. Research on Hot Topic Detection Based on DBSCAN Algorithm and Inter Sentence Relationship[J]. Library and Information Service, 2017, 61(12): 113-121. DOI: 10.13266/j.issn.0252-3116.2017.12.015
