Library and Information Service ›› 2022, Vol. 66 ›› Issue (13): 118-126. DOI: 10.13266/j.issn.0252-3116.2022.13.011

• Knowledge Organization •

Research on an Automatic Summarization Method for Product Reviews Based on Topic Clustering and a Semantic Graph Model

Gu Ying, Li He, Zhu Linlin

  1. School of Business and Management, Jilin University, Changchun 130022
  • Received: 2022-01-13 Revised: 2022-05-08 Online: 2022-07-05 Published: 2022-07-06
  • Corresponding author: Li He, professor, doctoral supervisor, E-mail: lihe200303@163.com.
  • About the authors: Gu Ying, doctoral candidate; Zhu Linlin, PhD, postdoctoral researcher.
  • Funding:
    This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on Multi-source Heterogeneous Online Product Review Data Fusion and Knowledge Discovery Based on Graph Models" (Grant No. 71974075) and a China Postdoctoral Science Foundation funded project (Grant No. 2021M701397).

Abstract: [Purpose/Significance] With the rapid development of social networks, automatic summarization is increasingly needed to cope with the overload of product review information. To address the insufficient information and poor accuracy of existing graph-model methods for extracting review summaries, this paper proposes a multi-document summarization method that integrates topic clustering with a semantic graph model. [Method/Process] First, the Fuzzy C-means (FCM) clustering algorithm divides the review texts into topics. Next, the Word2vec model produces vector representations of the topic-grouped review sentences, and a graph model is built from the semantic similarity between sentences. Finally, a weighted graph ranking algorithm automatically extracts the most important sentences to form the text summary. [Result/Conclusion] Experimental results show that the method effectively identifies the key content of product reviews. Compared with traditional methods, it achieves higher scores on the information coverage and information diversity metrics, improving the quality and efficiency of summary extraction.

Key words: product reviews, automatic summarization, topic clustering, semantic graph model
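
The three steps named in the abstract (FCM topic division, a sentence similarity graph, weighted graph ranking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. The toy 2-D vectors stand in for Word2vec-derived sentence vectors. Each sentence is assigned to the topic with its highest FCM membership, edge weights are cosine similarities, and the ranking step is a TextRank-style weighted PageRank. The values `m=2.0` and `d=0.85` are conventional defaults rather than settings reported in the paper.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy C-means. Returns (centers, U) with U[i, k] = membership of point i in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def similarity_graph(V):
    """Cosine-similarity adjacency matrix; negative weights and self-loops are dropped."""
    unit = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    S = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(S, 0.0)
    return S

def weighted_rank(S, d=0.85, n_iter=100, tol=1e-8):
    """TextRank-style weighted PageRank by power iteration on a symmetric weight matrix."""
    n = S.shape[0]
    col = S.sum(axis=0)
    col[col == 0] = 1.0                          # isolated nodes: avoid division by zero
    P = S / col                                  # P[i, j] = w_ji / sum_k w_jk
    r = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        r_new = (1.0 - d) / n + d * (P @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

def summarize(sentences, vectors, n_topics=2, per_topic=1):
    """Cluster sentences into topics, rank within each topic, keep the top sentences."""
    _, U = fuzzy_c_means(vectors, n_topics)
    labels = U.argmax(axis=1)                    # hard assignment (an assumption of this sketch)
    summary = []
    for t in range(n_topics):
        idx = np.flatnonzero(labels == t)
        if idx.size == 0:
            continue
        scores = weighted_rank(similarity_graph(vectors[idx]))
        top = idx[np.argsort(-scores)[:per_topic]]
        summary.extend(sentences[i] for i in sorted(top))
    return summary

# Toy data: invented review sentences with hand-made 2-D vectors (not real embeddings).
sentences = [
    "The battery lasts two days.",
    "Battery life is excellent.",
    "Charging is fast and the battery holds up.",
    "The screen is bright and sharp.",
    "Display colors look great.",
    "The screen is easy to read outdoors.",
]
vectors = np.array([
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],
])
summary = summarize(sentences, vectors, n_topics=2, per_topic=1)
print(summary)
```

Ranking within each topic, rather than over all sentences at once, is what lets the summary draw one representative sentence per topic; this is one plausible reading of how topic division could support the coverage and diversity gains the abstract reports.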

CLC Number: