Chinese College Students' Internet Use: A New Method of Behavior Pattern Recognition with Massive Log Analysis

  • Yan Chengxi
  • Wang Jun
  • Wang Ke
  • Department of Information Management, Peking University, Beijing 100871

Received date: 2018-12-06

Revised date: 2019-03-06

  Online published: 2019-07-20

Abstract

[Purpose/significance] Analyzing and understanding users' daily Web behavior patterns advances the theory of user behavior analysis and information retrieval, and has social and practical value for undergraduate-oriented enterprises, which can use such patterns to improve personalized services and information recommendation. [Method/process] This paper proposes a new method for recognizing college students' Web behavior patterns based on large-scale log analysis. The method comprises a semi-supervised learning algorithm, "MaxMatching", built on deep learning and text analysis, and a hybrid model that combines two characteristic entropies (Shannon entropy and real entropy). [Result/conclusion] The empirical results show that the method performs well both algorithmically and in the interpretability of its results. It also gives a comprehensive picture of Chinese college students' Web behavior patterns along three dimensions: network ability, temporality and topicality. The method and conclusions extend existing approaches to the semantic understanding of queries in information retrieval, and offer practical suggestions on personalized recommendation services to undergraduate-oriented enterprises.
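
As a rough illustration of one of the two entropies named in the abstract, Shannon entropy can be computed over the empirical distribution of a user's logged activity. The sketch below is not the paper's hybrid model: the function name, the topic labels and the two example users are hypothetical, and real entropy, which additionally depends on the order of events, is not shown.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Shannon entropy, in bits, of the empirical distribution of `events`."""
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over categories of -p * log2(p), with p the relative frequency.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical topic labels for two users' logged requests (illustrative only).
focused_user = ["study"] * 8
varied_user = ["study", "video", "shopping", "news"] * 2

if __name__ == "__main__":
    print(shannon_entropy(focused_user))  # single topic: lowest possible entropy
    print(shannon_entropy(varied_user))   # four equally frequent topics
```

Under this measure, a user whose requests all fall in one topic scores lower than a user whose requests are spread evenly across several topics.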

Cite this article

Yan Chengxi, Wang Jun, Wang Ke. Chinese College Students' Internet Use: A New Method of Behavior Pattern Recognition with Massive Log Analysis[J]. Library and Information Service, 2019, 63(14): 83-93. DOI: 10.13266/j.issn.0252-3116.2019.14.010
