Library and Information Service, 2019, Vol. 63, Issue (14): 83-93. DOI: 10.13266/j.issn.0252-3116.2019.14.010

• Information Research •

Chinese College Students' Internet Use: A New Method of Behavior Pattern Recognition with Massive Log Analysis

Yan Chengxi, Wang Jun, Wang Ke

  1. Department of Information Management, Peking University, Beijing 100871
  • Received: 2018-12-06 Revised: 2019-03-06 Online: 2019-07-20 Published: 2019-07-20
  • Corresponding author: Wang Jun (ORCID: 0000-0003-2850-0624), professor, doctoral supervisor, E-mail: junwang@pku.edu.cn
  • About the authors: Yan Chengxi (ORCID: 0000-0003-1128-550X), doctoral candidate; Wang Ke (ORCID: 0000-0002-9941-1664), master's student.

Abstract: [Purpose/significance] Mining and accurately understanding the daily Web behavior patterns of Chinese college students is of theoretical significance for research on user behavior and information retrieval. It also has potential social and practical value for enterprises that want to improve personalized services and information recommendation for college-student users. [Method/process] This paper proposes a new method for recognizing college students' Web behavior patterns based on large-scale log analysis. The method comprises a semi-supervised learning algorithm, "MaxMatching", built on deep learning and text analysis techniques, and a clustering model that combines two feature entropies (Shannon Entropy and Real Entropy). [Result/conclusion] The empirical results show that the method has certain advantages in both the algorithm and the interpretability of its results. It can also summarize and present Chinese college students' Web behavior patterns from three aspects: Internet use ability, temporality of access, and topical preference. The method and conclusions extend the methods available for the semantic understanding of queries in information retrieval, and offer reference and feasible suggestions to enterprises that want to improve personalized information recommendation services for college-student users.

Key words: Chinese college students, online behavior, pattern recognition, massive log analysis
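
The abstract names two feature entropies but does not define them. As a rough illustration only, the sketch below assumes that Shannon Entropy is computed over the empirical distribution of visited categories (order ignored) and that Real Entropy is approximated by a Lempel-Ziv style estimator that is sensitive to visit order. The function names, the single-letter category codes, and the choice of estimator are all assumptions made here for illustration, not the paper's implementation.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Entropy (bits) of the empirical symbol distribution; visit order is ignored."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def _contains(history, pattern):
    """True if `pattern` occurs as a contiguous run inside `history`."""
    k = len(pattern)
    return any(history[j:j + k] == pattern for j in range(len(history) - k + 1))


def lz_entropy(seq):
    """Lempel-Ziv style entropy estimate (bits); an assumed stand-in for Real Entropy.

    For each position i, find the length of the shortest run starting at i
    that has not appeared in seq[:i]; longer repeated runs give a lower estimate.
    """
    n = len(seq)
    if n < 2:
        return 0.0
    total = 0
    for i in range(n):
        k = 1
        while i + k <= n and _contains(seq[:i], seq[i:i + k]):
            k += 1
        total += k
    return n * math.log2(n) / total


# Hypothetical visit logs: "A" and "B" stand for two site categories.
regular = list("ABABABABABABABAB")    # strictly alternating visits
irregular = list("AABABBBAABBABABA")  # same category counts, irregular order

print(shannon_entropy(regular), shannon_entropy(irregular))
print(lz_entropy(regular), lz_entropy(irregular))
```

Both hypothetical users visit the two categories equally often, so their Shannon entropies coincide, while the order-sensitive estimate is lower for the strictly alternating user. Under these assumptions, combining the two values gives a clustering model a way to separate users with the same topical mix but different regularity.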

CLC number: