Library and Information Service, 2019, Vol. 63, Issue (14): 101-110. DOI: 10.13266/j.issn.0252-3116.2019.14.012

• Knowledge Organization •

Research on Extraction Methods of Topic Knowledge Tuples in Professional Social Media

Lin Jie, Miao Runsheng, Zhang Zhenyu

  1. School of Economics and Management, Tongji University, Shanghai 200092
  • Received: 2018-08-12 Revised: 2019-02-24 Online: 2019-07-20 Published: 2019-07-20
  • Corresponding author: Miao Runsheng (ORCID: 0000-0002-4784-1654), PhD candidate, E-mail: rmiao@tongji.edu.cn
  • About the authors: Lin Jie (ORCID: 0000-0002-5421-603X), Professor, PhD, doctoral supervisor; Zhang Zhenyu (ORCID: 0000-0002-4888-4023), PhD candidate.
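  • Funding:
    This paper is one of the outcomes of the National Natural Science Foundation of China General Program project "Research on the Measurement Model of User Innovation Value and Interactive Innovation Management Methods in Social Media" (Grant No. 71672128) and the Tongji University Fundamental Research Funds project "Research on Propagation Mechanisms and Models of Social Networks Based on Big Data" (Grant No. 1200219368).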

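Abstract: [Purpose/significance] Taking automobile forums as an example, this paper proposes a method for extracting topic knowledge tuples from professional social media text. [Method/process] First, topics are extracted from automobile forum texts with the LDA model and deduplicated to form a topic list. Second, a sentiment analysis model suited to automobile forum texts is built on T-LSTM, a deep learning model that incorporates topic features. Then, by computing each word's importance in the TextRank graph model and each word's Word2Vec similarity to the topic, sentiment keywords and key sentences are extracted to explain and supplement the topics and sentiment orientation of the text. Finally, these methods are integrated to output structured topic knowledge tuples. [Result/conclusion] In the experiments, the qualification rate of the extracted topic knowledge tuples reaches 69.1%, indicating that the proposed method can extract knowledge tuples around knowledge topics fairly accurately and convert knowledge into structured form.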

Keywords: topic knowledge tuple, topic extraction, long short-term memory neural network, sentiment analysis
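The abstract states that the LDA topics are deduplicated into a topic list but does not give the criterion. A minimal sketch, assuming each topic is represented by its ranked top words and that two topics count as duplicates when the Jaccard overlap of their top-N word sets reaches a threshold; `deduplicate_topics`, `top_n=10` and `threshold=0.5` are hypothetical choices, not the authors' settings.

```python
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words two topics have in common (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate_topics(topics: Sequence[List[str]], top_n: int = 10,
                       threshold: float = 0.5) -> List[List[str]]:
    """Greedily keep each topic unless its top-N words overlap an already-kept
    topic with Jaccard similarity >= threshold (hypothetical criterion)."""
    kept: List[List[str]] = []
    for words in topics:
        candidate = set(words[:top_n])
        if all(jaccard(candidate, set(other[:top_n])) < threshold for other in kept):
            kept.append(words)
    return kept


if __name__ == "__main__":
    # Illustrative top words per topic, as an LDA model might return them.
    lda_topics = [
        ["engine", "power", "torque", "acceleration"],
        ["engine", "power", "torque", "fuel"],
        ["seat", "interior", "comfort", "space"],
    ]
    for topic in deduplicate_topics(lda_topics):
        print(topic)
```

Here the second topic shares three of five distinct words with the first (Jaccard 0.6) and is dropped. Because the filter is greedy, topic order matters; sorting topics by prevalence first would keep the more frequent of two near-duplicates.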

Abstract: [Purpose/significance] A topic knowledge tuple is a knowledge unit for operating on and managing knowledge organized around knowledge topics. Accurately extracting topic knowledge tuples facilitates the storage, expression and retrieval of knowledge, and supports knowledge creation and evaluation as knowledge is used. This article therefore reviews existing extraction methods and, taking car products as an example, proposes a method for extracting topic knowledge tuples from professional social media. [Method/process] First, a topic list was extracted from users' comments in car forums with the LDA model. Second, a sentiment analysis model suited to the car forum corpus was built on T-LSTM, a deep learning model that incorporates topic features. Then, by calculating the importance of each word in the TextRank graph model and the Word2Vec similarity between each word and the topic, keywords and key sentences were extracted to interpret the extracted topics and sentiment orientation. Finally, these methods were integrated into a single topic knowledge tuple extraction method. [Result/conclusion] In the experiments, the qualification rate of the extracted topic knowledge tuples reaches 69.1%. The results show that the proposed method can refine and extract each element of a knowledge tuple around its topic, and can transform unstructured information into structured knowledge.

Key words: topic knowledge tuple, topic model, long short-term memory (LSTM), sentiment analysis
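The keyword step ranks words by their TextRank importance together with their Word2Vec similarity to a topic, but the abstract does not say how the two scores are combined. The sketch below assumes a linear blend with a hypothetical weight `alpha`, TextRank on an adjacent-word co-occurrence graph with the common damping factor 0.85, and toy 2-D vectors standing in for trained Word2Vec embeddings; `textrank`, `topic_keywords` and the topic vector are illustrative, not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def textrank(tokens: Sequence[str], window: int = 2, damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """TextRank word scores on an undirected co-occurrence graph.
    Words are linked when they appear within `window` tokens of each other
    (window=2 links adjacent words only)."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Each node shares its current score equally among its neighbours.
        scores = {
            w: (1 - damping) + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return scores


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_keywords(tokens: Sequence[str], vectors: Dict[str, Sequence[float]],
                   topic_vector: Sequence[float], alpha: float = 0.5,
                   k: int = 3) -> List[str]:
    """Top-k words by alpha * normalised TextRank + (1 - alpha) * topic cosine.
    The linear blend and alpha=0.5 are assumptions for illustration."""
    tr = textrank(tokens)
    if not tr:
        return []
    top = max(tr.values())
    combined = {
        w: alpha * s / top + (1 - alpha) * cosine(vectors[w], topic_vector)
        for w, s in tr.items() if w in vectors  # skip out-of-vocabulary words
    }
    return sorted(combined, key=combined.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for trained Word2Vec embeddings.
    vecs = {
        "engine": [0.9, 0.1], "power": [0.8, 0.2], "strong": [0.5, 0.5],
        "noise": [0.6, 0.4], "seat": [0.1, 0.9],
    }
    doc = ["engine", "power", "strong", "engine", "noise", "seat", "engine", "power"]
    print(topic_keywords(doc, vecs, topic_vector=[1.0, 0.0]))  # ['engine', 'power', 'noise']
```

In the toy document the four non-central words receive equal TextRank scores, so the topic similarity term decides their order; this is the kind of tie-breaking a topic-aware score is meant to provide. A key-sentence variant could apply the same blend to sentence nodes linked by similarity.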
