[Purpose/significance] To address the problem of topic selection for professional fields in the publishing industry, this paper integrates multi-source dynamic information from the Internet and detects hotspots in professional fields through multi-dimensional intelligence analysis. The resulting data-driven topic selection lays a foundation for the digital transformation and development of the publishing industry. [Method/process] An intelligence analysis model for topic selection is proposed to detect hotspots in professional fields. The model consists of two steps: hotspot discovery and hotness evaluation. Hotspot discovery identifies hotspots in professional fields through word frequency statistics and a word growth rate algorithm. Hotness evaluation then applies a series of indices along the content and spread dimensions to calculate and evaluate the hotness of the hotspots identified in the first step. [Result/conclusion] A hotspot detection experiment was conducted on 36,550 pieces of Chinese multi-source dynamic information in the ICT field, collected from January to April 2018, and verified the effectiveness of the proposed model. The model can be used in the publishing industry to carry out topic selection scientifically.
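The hotspot discovery step can be illustrated with a minimal sketch. The abstract does not give the exact growth rate formula or any thresholds, so everything below is an assumption for illustration: the function names, the additive smoothing term, the `min_freq` and `min_growth` cut-offs, and the toy token lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def word_frequencies(docs):
    """Count token occurrences across a list of already-tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def growth_rate(prev_count, curr_count, smoothing=1.0):
    """Relative growth between two periods.

    The additive smoothing term is an assumption; it avoids division by
    zero for words that did not appear in the previous period.
    """
    return (curr_count - prev_count) / (prev_count + smoothing)


def discover_hotspots(prev_docs, curr_docs, min_freq=3, min_growth=0.5):
    """Return (word, frequency, growth) for words that are both frequent
    in the current period and growing quickly relative to the previous one.

    Both thresholds are hypothetical defaults, not values from the paper.
    """
    prev = word_frequencies(prev_docs)
    curr = word_frequencies(curr_docs)
    hotspots = []
    for word, freq in curr.items():
        if freq < min_freq:
            continue
        rate = growth_rate(prev[word], freq)  # Counter yields 0 for unseen words
        if rate >= min_growth:
            hotspots.append((word, freq, rate))
    # Fastest-growing words first; ties broken alphabetically.
    return sorted(hotspots, key=lambda item: (-item[2], item[0]))


# Toy data standing in for two consecutive collection periods.
prev_docs = [["5G", "chip"], ["cloud", "chip"], ["cloud"]]
curr_docs = [
    ["5G", "chip", "5G"],
    ["5G", "cloud"],
    ["5G", "blockchain", "blockchain", "blockchain"],
]
hotspots = discover_hotspots(prev_docs, curr_docs)
```

In this sketch a word must pass both filters, so a term that is frequent but stable (or rare but new) is not reported. The hotness evaluation step, with its content and spread indices, is not sketched here because the abstract does not specify those indices.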
Wang Xiaoguang, Wang Hongyu, Huang Han. Towards Professional Publishing: Research on Hotspot Detection Model Based on Multi-source Data[J]. Library and Information Service, 2019, 63(14): 52-61.
DOI: 10.13266/j.issn.0252-3116.2019.14.007