[Purpose/Significance] With the rapid growth of social networks, automatic summarization is increasingly needed to cope with the overload of product review information. To address the incomplete information and poor accuracy of review summaries extracted by existing graph-model methods, this paper proposes a multi-text summarization method based on topic clustering and a semantic graph model. [Method/Process] First, the fuzzy C-means (FCM) clustering algorithm is used to divide review texts into topics. Second, the Word2vec model is used to obtain vector representations of the clustered review sentences, and a graph model is constructed from the semantic similarity between sentences. Finally, a weighted graph ranking algorithm automatically extracts the most important sentences to form the text summary. [Result/Conclusion] The results show that the method effectively identifies the core content of product reviews. Compared with traditional methods, the method combining topic clustering with a semantic graph model achieves higher scores on the information coverage and information diversity indicators, improving the quality and efficiency of summary extraction.
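The three steps of the method (topic clustering, semantic graph construction, weighted ranking) can be illustrated with a minimal sketch. It is not the paper's implementation, and every concrete choice in it is an assumption: the toy 2-D vectors stand in for Word2vec sentence embeddings, each sentence is assigned to the topic with its highest FCM membership, edge weights are cosine similarities, and the ranking is a TextRank-style iteration with a damping factor of 0.85. The function names (`fuzzy_c_means`, `weighted_rank`, `summarize`) are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Return (memberships, centers); memberships[i][k] is point i's degree in cluster k."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update from the distances to every center.
        for i in range(n):
            dist = [max(math.dist(points[i], centers[k]), 1e-12) for k in range(c)]
            for k in range(c):
                u[i][k] = 1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return u, centers

def weighted_rank(vectors, d=0.85, iters=50):
    """TextRank-style scores on a graph whose edge weights are cosine similarities."""
    n = len(vectors)
    # Negative similarities are clipped to zero; self-loops are excluded.
    w = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(w[j][i] / out[j] * scores[j]
                                    for j in range(n) if out[j] > 0)
                  for i in range(n)]
    return scores

def summarize(sentences, vectors, n_topics=2, per_topic=1):
    """Cluster sentences into topics, then keep the top-ranked sentence(s) per topic."""
    u, _ = fuzzy_c_means(vectors, n_topics)
    summary = []
    for k in range(n_topics):
        # Assumption: hard assignment to the topic with the highest membership.
        members = [i for i in range(len(sentences))
                   if max(range(n_topics), key=lambda j: u[i][j]) == k]
        if not members:
            continue
        scores = weighted_rank([vectors[i] for i in members])
        ranked = sorted(zip(scores, members), reverse=True)
        summary.extend(sentences[i] for _, i in ranked[:per_topic])
    return summary

# Toy data: hand-made vectors standing in for Word2vec sentence embeddings.
sentences = ["Battery lasts two days.", "Battery life is great.",
             "Battery drains slowly.", "Screen is bright.",
             "The display is sharp and bright.", "Screen colors look vivid."]
vectors = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2],
           [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]]
summary = summarize(sentences, vectors)
print(summary)
```

With real data, the toy vectors would be replaced by sentence vectors derived from a trained Word2vec model, for example by averaging the word vectors of each sentence. Ranking within each topic rather than over all sentences is one way to let every topic contribute to the summary, which is the intuition behind the coverage and diversity gains the abstract reports.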
Author contributions: Gu Ying (谷莹): data collection and processing, experimental design, and paper writing; Li He (李贺): guidance on and finalization of the paper framework; Zhu Linlin (祝琳琳): paper review.