[Purpose/significance] In social-network-based user profiling, traditional user modeling struggles to handle complex network relationships, group construction relies mostly on content, and the resulting groups show low similarity or poor tightness. To address these problems, this paper proposes a method for constructing group profiles based on both network structure and text content. [Method/process] First, a convolutional neural network is used to combine network structure and text content, representing each network user as a vector in a vector space. Second, the user vectors are clustered with the k-means algorithm combined with a modularity calculation. A comparative study is conducted on crawled Chinese and English datasets. Finally, 1000 important users are selected from the Chinese dataset for a case analysis. [Result/conclusion] The experimental results show that the density of the proposed method is 0.105 higher than that of the content-based method, and its entropy is on average 0.955 lower than that of the structure-based methods (including methods based on both structure and content). The case analysis further illustrates the feasibility of the proposed method.
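The clustering step can be illustrated with a minimal sketch. The abstract does not state how the modularity calculation is combined with k-means, so the sketch assumes one plausible reading: k-means is run on the user vectors for several candidate values of k, and the partition with the highest Newman modularity on the user network is kept. The function names (`kmeans`, `modularity`, `cluster_by_modularity`), the farthest-point initialization, and the toy data are illustrative assumptions, not the paper's implementation; the user vectors here are hand-written stand-ins for the CNN embeddings.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=100):
    """Lloyd's k-means; returns one cluster label per vector.

    Assumption: deterministic farthest-point initialization, chosen here
    only so that the example is reproducible.
    """
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def modularity(edges, labels):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        degree[labels[u]] = degree.get(labels[u], 0) + 1
        degree[labels[v]] = degree.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def cluster_by_modularity(vectors, edges, k_values):
    """Assumed combination rule: keep the k-means partition with the highest Q."""
    best = None
    for k in k_values:
        labels = kmeans(vectors, k)
        q = modularity(edges, labels)
        if best is None or q > best[0]:
            best = (q, k, labels)
    return best


# Toy data: six users in two tight groups, joined by a single edge (2, 3).
vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q, k, labels = cluster_by_modularity(vectors, edges, range(1, 4))
print(k, labels, round(q, 3))
```

Under this reading, modularity acts only as a model-selection criterion: the vector space decides which users sit together, while the network structure decides how many groups to keep. Other combinations, such as adding a modularity term to the k-means objective, are equally consistent with the abstract.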