Library and Information Service (图书情报工作) ›› 2017, Vol. 61 ›› Issue (7): 111-117. DOI: 10.13266/j.issn.0252-3116.2017.07.016

• Knowledge Organization •

Data Cleaning in the Process of Identifying Research Hotspots Based on Keyword Co-occurrence

Pan Wei (潘玮)1, Mu Dongmei (牟冬梅)2, Li Yin (李茵)2, Liu Peng (刘鹏)1

  1. Department of Health Management, Bengbu Medical College, Bengbu 233030;
  2. School of Public Health, Jilin University, Changchun 130021
  • Received: 2017-01-03 Revised: 2017-03-12 Online: 2017-04-05 Published: 2017-04-05
  • About the authors: Pan Wei (ORCID: 0000-0003-0444-973X), lecturer, PhD; Mu Dongmei (ORCID: 0000-0003-0237-034X), professor, doctoral supervisor, corresponding author, E-mail: moudm@jlu.edu.cn; Li Yin (ORCID: 0000-0003-2125-7270), PhD candidate; Liu Peng (ORCID: 0000-0003-1677-7044), teaching assistant, master's degree.
  • Funding:
    This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Construction of a Domain Multidimensional Knowledge Base Driven by Embedded Knowledge Services" (Grant No. 71573102) and the Bengbu Medical College Humanities and Social Sciences Fund key project "Knowledge Mapping and Analysis of the Pharmaceutical Patent Research Field" (Grant No. BYKY16110skZD).

Abstract: [Purpose/significance] This paper examines, from a theoretical standpoint, data cleaning in the process of identifying a field's research hotspots with the keyword co-occurrence method, in order to help researchers identify those hotspots accurately. [Method/process] On the basis of a literature survey, it defines data cleaning and its objects, analyzes the causes and effects of dirty data, then formulates the steps and a scheme for data cleaning, and verifies the cleaning effect and the scheme's feasibility through an empirical study. [Result/conclusion] The results indicate that the data cleaning scheme improves the accuracy of research hotspot identification, which demonstrates its feasibility.

Key words: keyword co-occurrence, research hotspot, research area analysis, data cleaning, data mining

CLC number: