Library and Information Service ›› 2019, Vol. 63 ›› Issue (22): 21-30. DOI: 10.13266/j.issn.0252-3116.2019.22.003

• Special Topic: User Profile Research •

Research on a Group Profile Construction Method Based on Network Structure and Text Content

Qiu Yunfei, Zhang Weizhu

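  1. School of Software, Liaoning Technical University, Huludao 125105
  • Received: 2019-03-31 Revised: 2019-06-14 Online: 2019-11-20 Published: 2019-11-20
  • Corresponding author: Zhang Weizhu (ORCID: 0000-0001-5450-8342), master's student, E-mail: 1426483346@qq.com.
  • About the author: Qiu Yunfei (ORCID: 0000-0002-2061-6617), associate dean, professor, PhD.
  • Funding:
    This paper is one of the research results of the National Natural Science Foundation of China Young Scientists Fund project "Research on a Prior-Knowledge-Coupled Fusion Method for the Bidirectional Reflectance Distribution Function" (Grant No. 61401185).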

Study for the Construction Method of Group Profile Based on Network Structure and Text Content

Qiu Yunfei, Zhang Weizhu   

  1. Liaoning Technical University, Huludao 125105
  • Received: 2019-03-31 Revised: 2019-06-14 Online: 2019-11-20 Published: 2019-11-20

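Abstract: [Purpose/Significance] In user profiling research based on social networks, traditional user modeling struggles to handle complex network relationships, group construction relies mostly on content, and the resulting groups show low similarity or poor cohesion. To address these problems, this paper proposes a method for constructing group profiles based on network structure and text content. [Method/Process] First, a convolutional neural network fuses network-structure and text-content features to represent each network user as a vector. Second, the vectors are clustered by combining the k-means algorithm with a modularity calculation. Third, comparative experiments are run on crawled Chinese and English datasets. Finally, 1,000 important users are selected from the Chinese dataset for a case study. [Result/Conclusion] The experiments show that the density of the proposed method is on average 0.105 higher than that of content-based methods, and its entropy is on average 0.955 lower than that of structure-based methods (including methods based on both structure and content). The case study further demonstrates the feasibility of the method.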

Keywords: social network, network relationship, text content, deep learning, clustering algorithm, user profile

Abstract: [Purpose/significance] In social-network-based user profiling, traditional user modeling has difficulty handling complex network relationships, group construction is mostly content-based, and the resulting groups have low similarity or poor tightness. To address these problems, a method for constructing group profiles based on network structure and text content is proposed. [Method/process] First, a convolutional neural network combines network structure and text content to represent each network user as a vector. Second, the modularity calculation is combined with the k-means algorithm to cluster the vectors. Third, a comparative study is conducted on crawled Chinese and English datasets. Finally, 1,000 important users are selected from the Chinese dataset for instance analysis. [Result/conclusion] The experimental results show that the density value of this method is on average 0.105 higher than that of the content-based method, and the entropy value is on average 0.955 lower than that of the structure-based methods (including those based on both structure and content). The instance analysis further illustrates the feasibility of the proposed method.
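The abstract does not spell out how modularity enters the k-means step, so the following is only a minimal sketch under one plausible reading: run k-means on the user vectors for several candidate values of k and keep the partition whose Newman modularity on the social graph is highest. Everything here is an assumption made for illustration: the function names (`kmeans`, `modularity`, `cluster_with_modularity`), the deterministic farthest-point seeding, the toy 2-D vectors standing in for the CNN embeddings, and the toy edge list standing in for the crawled network.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns one cluster label per point.

    Seeding is deterministic farthest-point selection (an assumption made
    here for reproducibility, not something stated in the abstract).
    """
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def modularity(edges, labels):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the same group
    degree = defaultdict(int)  # summed node degree per group
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def cluster_with_modularity(vectors, edges, k_values):
    """Try each candidate k and keep the partition with the highest modularity."""
    best = None
    for k in k_values:
        labels = kmeans(vectors, k)
        q = modularity(edges, labels)
        if best is None or q > best[0]:
            best = (q, k, labels)
    return best


# Toy input: six users, two tight groups in vector space, and a graph made of
# two triangles joined by a single bridge edge (2, 3).
vectors = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q, k, labels = cluster_with_modularity(vectors, edges, [1, 2, 3])
print(k, labels, round(q, 3))
```

On this toy input the two-group partition scores highest, with modularity 5/14. Using modularity only to choose k keeps the vector-space clustering and the graph-based quality check separate, which makes each part easy to swap out; a method that folds modularity into the k-means objective itself would look different.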

Key words: social network, network relationship, text content, deep learning, clustering algorithm, user profile

CLC Number: