Library and Information Service (图书情报工作) ›› 2018, Vol. 62 ›› Issue (13): 54-63. DOI: 10.13266/j.issn.0252-3116.2018.13.008

• Information Research •

User Profiling Based on the Behaviour and Content Combined Model

Yu Chuanming1, Tian Xin1, Guo Yajing1, An Lu2

  1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073;
  2. School of Information Management, Wuhan University, Wuhan 430072
  • Received: 2018-01-04 Revised: 2018-04-02 Online: 2018-07-05 Published: 2018-07-05
  • Corresponding author: An Lu (ORCID: 0000-0002-5408-7135), professor, doctoral supervisor, E-mail: anlu97@163.com
  • About the authors: Yu Chuanming (ORCID: 0000-0001-7099-0853), associate professor; Tian Xin (ORCID: 0000-0001-8929-7151), master's student; Guo Yajing (ORCID: 0000-0003-1443-8399), master's student.
  • Funding:
    This work is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Opinion Retrieval Based on Domain Knowledge Acquisition and Alignment in the Big Data Environment" (grant no. 71373286) and the Ministry of Education Major Project of Philosophy and Social Sciences Research "Research on Countermeasures for Improving Counter-Terrorism Intelligence and Information Work Capability" (grant no. 17JZD034).

Abstract: [Purpose/significance] To identify and remove online comments posted by irrational investors, raise the professionalism and quality of comments, and promote rational investment, this study builds user profiles for the task of identifying whether users of the Guba stock forum are noise investors. [Method/process] Deep user representation learning is applied to the content of users' posts and combined with behavioural features of Guba users, including number of followers, influence, number of accounts followed, watchlist stocks, account age, number of posts, number of comments, and number of visits, to propose a behaviour and content combined model (BCCM). An empirical and comparative study is carried out on an annotated dataset. [Result/conclusion] The model achieves an F1 score of 79.47% for noise investor identification, outperforming the decision tree (69.90%), SVM (75.61%), KNN (73.21%), and ANN (74.83%) methods. In this specific user profiling task, introducing text content features through deep user representation learning significantly improves the evaluation metrics of user profiling.

Key words: user profiling, sentiment analysis, user representation learning, feature fusion
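The abstract describes combining behavioural features with a learned content representation and comparing models by F1 score. The following is only a minimal sketch of that idea, not the authors' BCCM implementation: it assumes fusion by simple concatenation and min-max scaling of the behavioural features, neither of which is stated in the abstract. The function names `min_max_scale`, `fuse_features`, and `f1_score` are hypothetical helpers introduced for illustration.

```python
from typing import List, Sequence


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each behaviour feature column to [0, 1] (assumed preprocessing)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def fuse_features(
    behaviour: Sequence[float], content_embedding: Sequence[float]
) -> List[float]:
    """Concatenate behaviour features with a content embedding.

    Concatenation is an assumption; the abstract does not say how
    the two feature groups are actually combined.
    """
    return list(behaviour) + list(content_embedding)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (here: 1 = noise investor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration: [followers, posts] per user.
behaviour_rows = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]
scaled_rows = min_max_scale(behaviour_rows)

# A made-up 2-dimensional content embedding for the first user.
fused = fuse_features(scaled_rows[0], [0.3, -0.1])
```

The fused vector would then be passed to whatever classifier is being evaluated, and `f1_score` applied to its predictions on held-out labelled users.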

CLC number: