Library and Information Service (图书情报工作) ›› 2014, Vol. 58 ›› Issue (01): 101-108, 116. DOI: 10.13266/j.issn.0252-3116.2014.01.015

• Information Research •

Keyword Extraction Using Citation Information

陈翀, 罗鹏程, 汪十红   

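  1. Information Management, Beijing Normal University
  • Received: 2013-11-05 Revised: 2013-12-05 Online: 2014-01-05 Published: 2014-01-05
  • About the authors: Chen Chong, associate professor, Department of Information Management, Beijing Normal University, E-mail: chenchong@bnu.edu.cn; Luo Pengcheng, master's student, Department of Information Management, Beijing Normal University; Wang Shihong, master's student, Department of Information Management, Beijing Normal University.
  • Funding:

    This paper is one of the research outcomes of a project funded under the special research program "Rapid Sharing of Scientific Papers in the Internet Age" of the Science and Technology Development Center, Ministry of Education (project No. 20120240001).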

Extraction of Keywords with Citation Information

Chen Chong, Luo Pengcheng, Wang Shihong   

  1. Department of Information Management, Beijing Normal University, Beijing 100875
  • Received: 2013-11-05 Revised: 2013-12-05 Online: 2014-01-05 Published: 2014-01-05

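Abstract:

This paper proposes a new method that uses citation information to extract keywords. The relationship between candidate terms and citing documents is abstracted as a bipartite graph, the Co-HITS method is used to iteratively compute term importance scores until convergence, and the highest-scoring terms are selected as keywords. The method is evaluated on a dataset of abstracts of papers in the ACM database whose primary classification is "Information Systems". The results show that the proposed method outperforms comparable methods that compute term importance with graph models, and that it is applicable to scientific literature and to other text collections with link relationships. When citation information is taken into account, the extracted keywords not only summarize the original document but also reflect the points of its content that have drawn outside attention.

Keywords: keyword extraction, citation text, Co-HITS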

Abstract:

This paper proposes a new method for keyword extraction that uses citation information. The relationship between candidate terms and citing papers is abstracted as a bipartite graph, term importance scores are computed with the generalized Co-HITS method until convergence, and the top-scoring terms are selected as the extracted keywords. The method is evaluated on a dataset of 2002-2011 paper abstracts classified under "Information Systems", crawled from the ACM Digital Library. The results show that the method performs better than the state-of-the-art graph-based method. The method is suited to scientific literature and to other types of text collections containing rich links. The keywords it extracts reflect both the main topics of the original document and the aspects of it that attract outside attention.

Key words: keyword extraction, citation text, Co-HITS
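
The scoring procedure summarized in the abstract can be illustrated with a minimal sketch of a Co-HITS-style iteration on a term/citing-paper bipartite graph. Everything specific here is an assumption made for illustration rather than a detail taken from the paper: the function names `co_hits_term_scores` and `top_keywords`, the use of raw association weights (for example, how often a term occurs in a paper's citing text), the uniform initial scores, and the damping values `lam_term` and `lam_doc`.

```python
def co_hits_term_scores(weights, lam_term=0.8, lam_doc=0.8, tol=1e-10, max_iter=1000):
    """Co-HITS-style scores for terms on a term/citing-paper bipartite graph.

    weights[term][doc] is a positive association weight (hypothetical choice:
    the number of times the term occurs in the citing text of that paper).
    """
    terms = sorted(weights)
    docs = sorted({d for row in weights.values() for d in row})
    if not terms or not docs:
        return {}

    # Total edge weight per node, used to turn weights into transition probabilities.
    term_out = {t: sum(weights[t].values()) for t in terms}
    doc_out = {d: sum(weights[t].get(d, 0.0) for t in terms) for d in docs}

    # Assumed uniform initial scores on both sides of the graph.
    x0 = {t: 1.0 / len(terms) for t in terms}
    y0 = {d: 1.0 / len(docs) for d in docs}
    x, y = dict(x0), dict(y0)

    for _ in range(max_iter):
        # Term score: initial score blended with mass propagated from citing papers.
        new_x = {
            t: (1 - lam_term) * x0[t] + lam_term * sum(
                w / doc_out[d] * y[d] for d, w in weights[t].items() if doc_out[d] > 0
            )
            for t in terms
        }
        # Citing-paper score: initial score blended with mass propagated from terms.
        new_y = {
            d: (1 - lam_doc) * y0[d] + lam_doc * sum(
                weights[t].get(d, 0.0) / term_out[t] * x[t] for t in terms if term_out[t] > 0
            )
            for d in docs
        }
        change = max(abs(new_x[t] - x[t]) for t in terms) + max(abs(new_y[d] - y[d]) for d in docs)
        x, y = new_x, new_y
        if change < tol:  # stop once the scores have converged
            break
    return x


def top_keywords(weights, k=5, **kwargs):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = co_hits_term_scores(weights, **kwargs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data, invented for illustration: three candidate terms, three citing papers.
example_weights = {
    "keyword extraction": {"c1": 3, "c2": 2, "c3": 2},
    "bipartite graph": {"c1": 1, "c2": 1},
    "dataset": {"c3": 1},
}
print(top_keywords(example_weights, k=2))
```

With damping values below 1, each update mixes a fixed initial score with propagated scores, so the iteration settles to a fixed point; a term that many citing papers mention heavily ends up ranked above terms mentioned by few.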

CLC number: