Library and Information Service ›› 2017, Vol. 61 ›› Issue (23): 6-14. DOI: 10.13266/j.issn.0252-3116.2017.23.001

• Special topic: Extraction, Analysis, and Application of Citation Information in the Full Text of Academic Papers •

Analysis of the Academic Influence of Algorithms Based on Full-Text Content

Wang Yuzhuo1,2, Zhang Chengzhi1,2,3

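  1. Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094;
  2. Jiangsu Collaborative Innovation Center of Social Safety Science and Technology, Nanjing 210094;
  3. Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing 210093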
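  • Received: 2017-06-16 Revised: 2017-08-28 Publication date: 2017-12-05 Release date: 2017-12-05
  • Corresponding author: Zhang Chengzhi (ORCID: 0000-0001-9522-2914), professor, PhD, doctoral supervisor, E-mail: zhangcz@njust.edu.cn.
  • About the author: Wang Yuzhuo (ORCID: 0000-0002-2891-7238), master's student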

Using Full-text to Analyse Academic Impact of Algorithms

Wang Yuzhuo1,2, Zhang Chengzhi1,2,3   

  1. Department of Information Management, Nanjing University of Science & Technology, Nanjing 210094;
  2. Jiangsu Collaborative Innovation Center of Social Safety Science and Technology, Nanjing 210094;
  3. Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing 210093
  • Received: 2017-06-16 Revised: 2017-08-28 Online: 2017-12-05 Published: 2017-12-05

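Abstract: [Purpose/significance] This paper analyzes the academic influence of algorithms from the perspective of full-text content analysis. [Method/process] Taking the use of the top 10 data mining algorithms in natural language processing (NLP) as an example, it analyzes the influence of different algorithms in a specific field. From the full text of 10,922 NLP academic papers published between 1965 and 2006, 6,001 sentences containing the top 10 data mining algorithms ("algorithm sentences" for short) were extracted. Based on these algorithm sentences, the influence of the algorithms was compared in three respects: the number of papers mentioning an algorithm, the total number of mentions, and the location of mentions. [Result/conclusion] When different features are used as the measure of influence, the influence of the top 10 data mining algorithms in NLP academic papers differs markedly. Under the criteria based on the number of papers, the number of mentions, and the location of mentions, the SVM algorithm shows relatively high influence, while the influence of the Apriori algorithm is clearly lower than that of the other algorithms. This study offers a new approach to the quantitative evaluation of algorithm influence.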

Key words: algorithm influence, influence evaluation, full-text content, text content analysis

Abstract: [Purpose/significance] This paper analyzes the influence of different algorithms in a specific field based on full-text content analysis. [Method/process] Taking the top 10 data mining algorithms in the domain of natural language processing (NLP) as an example, we use 10,922 academic papers published in NLP from 1965 to 2006 and extract from their full text 6,001 sentences that contain any of the top 10 data mining algorithms. We evaluate the impact of the top 10 algorithms in three respects (the number of papers mentioning an algorithm, the number of mentions, and the location of mentions) and compare the results across these evaluation criteria. [Result/conclusion] Under different assessment criteria, the influence of the top 10 data mining algorithms in NLP conference papers differs markedly. The SVM algorithm shows relatively high influence under the criteria based on the number of papers and the number of mentions, while the impact of the Apriori algorithm is significantly lower than that of the other algorithms. These results offer a new way to quantify the influence of algorithms.
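The three measures named in the abstract (papers mentioning an algorithm, total mentions, and mention location) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The alias patterns, the naive sentence splitter, the use of section names as "location", the counting of every occurrence as one mention, and the toy papers are all assumptions made for illustration; only the two algorithms named in the abstract are included.

```python
import re
from collections import Counter, defaultdict

# Hypothetical alias patterns: the source does not say which surface forms were matched.
ALGORITHM_PATTERNS = {
    "SVM": re.compile(r"\b(?:SVMs?|support vector machines?)\b", re.IGNORECASE),
    "Apriori": re.compile(r"\bApriori\b"),
}


def split_sentences(text):
    """Naive splitter on '.', '!' or '?' followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_mentions(papers):
    """papers maps paper_id -> {section_name: section_text}.

    Returns (papers_mentioning, total_mentions, mentions_by_section).
    Each pattern occurrence counts as one mention (an assumption).
    """
    paper_ids = defaultdict(set)
    total_mentions = Counter()
    by_section = defaultdict(Counter)
    for paper_id, sections in papers.items():
        for section, text in sections.items():
            for sentence in split_sentences(text):
                for name, pattern in ALGORITHM_PATTERNS.items():
                    hits = len(pattern.findall(sentence))
                    if hits:
                        paper_ids[name].add(paper_id)
                        total_mentions[name] += hits
                        by_section[name][section] += hits
    papers_mentioning = {name: len(ids) for name, ids in paper_ids.items()}
    return papers_mentioning, total_mentions, by_section


# Toy input, invented for illustration only.
toy_papers = {
    "P1": {
        "introduction": "We train an SVM. Support vector machines work well.",
        "experiments": "The SVM beats the baseline.",
    },
    "P2": {"method": "Frequent patterns are mined with Apriori."},
}
papers_mentioning, total_mentions, by_section = count_mentions(toy_papers)
```

Keeping the three counters separate mirrors the abstract's point that an algorithm's ranking can change depending on which measure is used as the criterion.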

Key words: impact of algorithms, impact assessment, full-text, text content analysis

CLC number: