Issues in Automatic Summarization Research

  • Wang Lianxi
  • 1. Library, Guangdong University of Foreign Studies, Guangzhou 510420;
    2. Social Science Key Laboratory of Language Engineering and Computation of Guangdong Province, Guangzhou 510006

Received date: 2014-08-06

Revised date: 2014-09-28

Online published: 2014-10-20

Abstract

This paper analyzes the procedure of automatic summarization and briefly surveys related domestic and international work. It then summarizes open issues in current work on word segmentation, redundancy control, quality evaluation, short-text summarization, multilingual summarization, and cross-language summarization, and discusses future directions in detail. The paper is intended as a reference for further research on automatic summarization and natural language processing.

Cite this article

Wang Lianxi. Issues in Automatic Summarization Research[J]. Library and Information Service, 2014, 58(20): 13-22. DOI: 10.13266/j.issn.0252-3116.2014.20.002
