Information Research

Extraction of Keywords with Citation Information

  • Chen Chong,
  • Luo Pengcheng,
  • Wang Shihong
  • Department of Information Management, Beijing Normal University, Beijing 100875
Chen Chong is an associate professor in the Department of Information Management, Beijing Normal University (E-mail: chenchong@bnu.edu.cn); Luo Pengcheng and Wang Shihong are master's students in the same department.

Received date: 2013-11-05

  Revised date: 2013-12-05

  Online published: 2014-01-05

Funding

This paper is one of the research outcomes of the project "Rapid Sharing of Scientific Papers in the Network Era" (project No. 20120240001), a special research program funded by the Center for Science and Technology Development, Ministry of Education.

Cite this article

Chen Chong, Luo Pengcheng, Wang Shihong. Extraction of Keywords with Citation Information[J]. Library and Information Service (图书情报工作), 2014, 58(01): 101-108, 116. DOI: 10.13266/j.issn.0252-3116.2014.01.015

Abstract

This paper proposes a new method for extracting keywords using citation information. The relationship between candidate terms and citing papers is abstracted as a bipartite graph, term importance scores are computed iteratively with the generalized Co-HITS algorithm until convergence, and the top-scoring terms are selected as keywords. The method is evaluated on a dataset of abstracts of papers from 2002-2011 whose primary classification is "Information Systems", crawled from the ACM Digital Library. The results show that it outperforms comparable state-of-the-art graph-based methods for computing term importance. The method is suitable for scientific literature and other text collections with rich link relationships. Because citation information is taken into account, the extracted keywords not only summarize the original document but also reflect the points in it that have attracted outside attention.
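
The iterative scoring step described in the abstract can be illustrated with a minimal sketch of a Co-HITS-style update on a bipartite graph of candidate terms and citing papers. It is not the authors' implementation: the function name `co_hits`, the damping parameters `lam_term` and `lam_doc`, the uniform initial scores, and the toy edge weights are all assumptions made for illustration.

```python
def co_hits(weights, term_prior, doc_prior, lam_term=0.8, lam_doc=0.8,
            tol=1e-10, max_iter=1000):
    """Co-HITS-style score propagation on a term / citing-paper bipartite graph.

    weights[i][k] is a non-negative edge weight between term i and citing
    paper k (hypothetical here: how often the term occurs in that paper).
    """
    n_terms, n_docs = len(weights), len(weights[0])

    # Turn raw edge weights into transition probabilities in both directions.
    row_sums = [sum(row) for row in weights]
    col_sums = [sum(weights[i][k] for i in range(n_terms)) for k in range(n_docs)]
    term_to_doc = [[weights[i][k] / row_sums[i] if row_sums[i] else 0.0
                    for k in range(n_docs)] for i in range(n_terms)]
    doc_to_term = [[weights[i][k] / col_sums[k] if col_sums[k] else 0.0
                    for i in range(n_terms)] for k in range(n_docs)]

    x, y = list(term_prior), list(doc_prior)
    for _ in range(max_iter):
        # Each side mixes its own prior with scores flowing in from the other side.
        new_x = [(1 - lam_term) * term_prior[i]
                 + lam_term * sum(doc_to_term[k][i] * y[k] for k in range(n_docs))
                 for i in range(n_terms)]
        new_y = [(1 - lam_doc) * doc_prior[k]
                 + lam_doc * sum(term_to_doc[i][k] * x[i] for i in range(n_terms))
                 for k in range(n_docs)]
        delta = max(abs(a - b) for a, b in zip(new_x + new_y, x + y))
        x, y = new_x, new_y
        if delta < tol:  # stop once the scores no longer change
            break
    return x, y


# Toy data (assumed): four candidate terms and three citing papers.
terms = ["keyword", "citation", "graph", "method"]
weights = [
    [3, 2, 2],  # "keyword"
    [1, 2, 0],  # "citation"
    [0, 1, 1],  # "graph"
    [1, 0, 0],  # "method"
]
term_scores, doc_scores = co_hits(weights, [0.25] * 4, [1 / 3] * 3)

# Rank terms by converged score and keep the top two as extracted keywords.
ranked = sorted(zip(terms, term_scores), key=lambda pair: pair[1], reverse=True)
top_keywords = [term for term, _ in ranked[:2]]
print(top_keywords)
```

In this sketch, damping values below 1 keep each score anchored to its initial value, while setting both to 1 leaves only the mutual reinforcement between terms and citing papers. In practice the initial term scores would more plausibly come from the document itself (for example, term frequency) than from a uniform prior.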

