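Received date: 2013-11-05
Revised date: 2013-12-05
Online published: 2014-01-05
Funding
This paper is one of the research outcomes of the special research project "Rapid Sharing of Scientific Papers in the Network Era", funded by the Science and Technology Development Center of the Ministry of Education (Project No. 20120240001).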
Extraction of Keywords with Citation Information
Chen Chong, Luo Pengcheng, Wang Shihong. Extraction of keywords with citation information[J]. Library and Information Service, 2014, 58(01): 101-108, 116. DOI: 10.13266/j.issn.0252-3116.2014.01.015
This paper proposes a new keyword extraction method that uses citation information. The relationships between candidate terms and citing papers are modeled as a bipartite graph, an importance score for each term is computed iteratively with the generalized Co-HITS algorithm until convergence, and the top-scoring terms are selected as the extracted keywords. The method is evaluated on a dataset of paper abstracts published during 2002-2011 and classified under "information system", crawled from the ACM Digital Library. The results show that it outperforms a state-of-the-art graph-based method. The method is suited to scientific literature and to other text collections that are rich in links. The keywords it extracts can reflect both the main topics of the original document and the aspects of it that receive attention from outside, that is, from the papers that cite it.
Key words: keyword extraction; citation text; Co-HITS
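The abstract compresses the scoring step into a single sentence. The Python sketch below shows one way the iterative form of generalized Co-HITS (Deng, Lyu and King, KDD 2009) could score candidate terms on a term/citing-paper bipartite graph. It is not the authors' implementation. The edge weights (imagined here as how often a term occurs in a citing paper's citation text), the uniform initial scores, the λ values, and the names `co_hits` and `top_keywords` are all hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (candidate term, citing paper id); hypothetical encoding


def co_hits(
    edges: Dict[Edge, float],
    lambda_u: float = 0.8,
    lambda_v: float = 0.8,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Score terms (side U) against citing papers (side V) with the iterative
    generalized Co-HITS update. Edge weights must be positive; with
    0 <= lambda < 1 the update is a contraction, so it converges."""
    terms = sorted({t for t, _ in edges})
    papers = sorted({p for _, p in edges})
    deg_t = dict.fromkeys(terms, 0.0)
    deg_p = dict.fromkeys(papers, 0.0)
    for (t, p), w in edges.items():
        deg_t[t] += w
        deg_p[p] += w
    # Transition probabilities: term -> paper (w_uv) and paper -> term (w_vu).
    w_uv = {e: w / deg_t[e[0]] for e, w in edges.items()}
    w_vu = {e: w / deg_p[e[1]] for e, w in edges.items()}

    # Assumed prior: uniform initial scores on both sides.
    x0 = {t: 1.0 / len(terms) for t in terms}
    y0 = {p: 1.0 / len(papers) for p in papers}
    x, y = dict(x0), dict(y0)
    for _ in range(max_iter):
        # x_i = (1 - lambda_u) * x0_i + lambda_u * sum_j w_vu[j -> i] * y_j
        new_x = {t: (1 - lambda_u) * x0[t] for t in terms}
        for (t, p), w in w_vu.items():
            new_x[t] += lambda_u * w * y[p]
        # y_k = (1 - lambda_v) * y0_k + lambda_v * sum_i w_uv[i -> k] * x_i
        new_y = {p: (1 - lambda_v) * y0[p] for p in papers}
        for (t, p), w in w_uv.items():
            new_y[p] += lambda_v * w * x[t]
        # L1 change across both sides decides convergence.
        delta = sum(abs(new_x[t] - x[t]) for t in terms) + sum(
            abs(new_y[p] - y[p]) for p in papers
        )
        x, y = new_x, new_y
        if delta < tol:
            break
    return x


def top_keywords(edges: Dict[Edge, float], k: int) -> List[str]:
    """Return the k highest-scoring candidate terms (ties broken by name)."""
    scores = co_hits(edges)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy graph: three citing papers, each linked to the terms in its citation text.
    toy = {
        ("co-hits", "P1"): 2.0, ("bipartite graph", "P1"): 1.0,
        ("co-hits", "P2"): 1.0, ("keyword", "P2"): 1.0,
        ("co-hits", "P3"): 1.0, ("survey", "P3"): 1.0,
    }
    print(top_keywords(toy, 2))
```

In the toy graph, "co-hits" ranks first because every citing paper links to it. Setting λ_u = λ_v = 0 leaves the initial scores unchanged, and values closer to 1 let the graph structure dominate; because each transition matrix is row-normalized, the scores on each side keep summing to 1 and no extra normalization step is needed.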