Special topic: User profile research

Study for the Construction Method of Group Profile Based on Network Structure and Text Content

  • Qiu Yunfei,
  • Zhang Weizhu
  • School of Software, Liaoning Technical University, Huludao 125105
Qiu Yunfei (ORCID: 0000-0002-2061-6617), vice dean, professor, PhD

Received date: 2019-03-31

  Revised date: 2019-06-14

  Online published: 2019-11-20

Funding

This work is one of the research outcomes of the National Natural Science Foundation of China Young Scientists Fund project "Research on a Prior-Knowledge Coupled Fusion Method for the Bidirectional Reflectance Distribution Function" (Grant No. 61401185).

Cite this article

Qiu Yunfei, Zhang Weizhu. Study for the Construction Method of Group Profile Based on Network Structure and Text Content[J]. Library and Information Service, 2019, 63(22): 21-30. DOI: 10.13266/j.issn.0252-3116.2019.22.003

Abstract

[Purpose/significance] In social-network-based user profiling, traditional user modeling struggles with complex network relationships, group construction relies mostly on content, and the resulting groups show low similarity or poor cohesion. To address these problems, this paper proposes a group profile construction method based on network structure and text content. [Method/process] First, a convolutional neural network fuses network structure features and text content features to represent each network user as a vector. Second, the vectors are clustered with the k-means algorithm combined with a modularity calculation. Third, comparative experiments are run separately on crawled Chinese and English datasets. Finally, 1,000 important users are selected from the Chinese dataset for a case study. [Result/conclusion] The experimental results show that the density of the proposed method is on average 0.105 higher than that of content-based methods, and its entropy is on average 0.955 lower than that of structure-based methods (including methods based on both structure and content). The case study further illustrates the feasibility of the proposed method.
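The clustering step can be illustrated with a small sketch. The abstract does not say how modularity is combined with k-means, so the code below makes one plausible assumption: k-means is run for several candidate values of k, and the partition with the highest Newman modularity on the user graph is kept. The toy vectors stand in for the CNN-derived user representations, and every function and variable name here is hypothetical rather than taken from the paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def modularity(edges, labels):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        degree[labels[u]] = degree.get(labels[u], 0) + 1
        degree[labels[v]] = degree.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

def cluster_users(vectors, edges, k_candidates):
    """Assumed combination: keep the k-means partition with the best modularity."""
    best = None
    for k in k_candidates:
        labels = kmeans(vectors, k)
        q = modularity(edges, labels)
        if best is None or q > best[0]:
            best = (q, k, labels)
    return best

# Toy data: six users forming two triangles joined by a single edge.
vectors = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q, k, labels = cluster_users(vectors, edges, k_candidates=[2, 3])
print(k, round(q, 3), labels)
```

On this toy graph the two triangles form the groups, because splitting either triangle lowers modularity. Using modularity only as a selection criterion keeps the k-means step unchanged; the paper's actual coupling of the two may differ.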
