图书情报工作 (Library and Information Service), 2022, Vol. 66, Issue (4): 129-141. DOI: 10.13266/j.issn.0252-3116.2022.04.013

• Information Research •

Research on User Portrait Model of the Full Data of Search Engines: Design and Empirical Study

Wu Wenhan

  1. Department of Library, Information and Archives, Shanghai University, Shanghai 200444
  • Received: 2021-07-02 Revised: 2021-09-09 Online: 2022-02-20 Published: 2022-03-01
  • About the author: Wu Wenhan, doctoral candidate, E-mail: wuwenhan000@163.com.

Abstract: [Purpose/significance] Drawing on the 500-million-scale full data of a search engine, this study designs an overall model and a detailed research process for big-data portrait analysis of young users, in order to establish a basic methodology for user portraits. [Method/process] Combining data analysis with data verification, the study selects representative calculation samples and label samples using KL divergence and the AIO sociological model, uses CH-Score and SH-Score to determine the algorithm and its parameters, applies the K-Means clustering algorithm, and interprets the resulting clusters with TGI. Finally, association rules are used to discover young users' car needs. [Result/conclusion] The study divides young users aged 18-24 into 5 categories and those aged 25-34 into 4 categories, and uses these 9 groups to verify the effectiveness of the model and process. It completes the establishment of a big-data user portrait methodology from 0 to 1 and integrates survey research methods with big-data methods.

Key words: big data, user portrait, research model
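
The abstract names KL divergence for choosing representative samples and TGI for reading cluster results, but does not state the formulas. The following is a minimal sketch under the standard textbook definitions, which may differ from the exact variants used in the paper. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same categories."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # the term 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


def tgi(cluster_share, overall_share):
    """Target Group Index: 100 means the cluster matches the overall population."""
    if overall_share <= 0:
        raise ValueError("overall_share must be positive")
    return 100.0 * cluster_share / overall_share


# Hypothetical toy numbers, not taken from the paper.
population_age_mix = [0.40, 0.60]  # assumed shares of 18-24 and 25-34 users
sample_age_mix = [0.38, 0.62]      # assumed shares in a candidate sample
divergence = kl_divergence(sample_age_mix, population_age_mix)

# A tag held by 30% of a cluster but 20% of all users is over-represented.
tag_index = tgi(0.30, 0.20)
```

Under these definitions, a divergence close to zero suggests that the sample mirrors the population on that dimension, and a TGI above 100 marks a tag as over-represented in the cluster relative to all users.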
