INFORMATION RESEARCH

Research on Difficulty Measurement Method in Academic Search Based on Log Mining

  • Chen Chong
  • Wang Siwei
  • Liang Bing
  • 1 School of Government Management, Beijing Normal University, Beijing 100875;
    2 Institute of Scientific and Technical Information of China, Beijing 100038

Received date: 2020-11-09
Revised date: 2021-01-25
Online published: 2021-06-02

Abstract

[Purpose/significance] Users face varying levels of difficulty when searching for information. To better understand user needs and improve retrieval systems, a concise and effective method for measuring information search difficulty is needed. [Method/process] This study treats the effort users spend on a query, in both time and behavior, as the manifestation of their information search difficulty. Sessions are classified into types according to users' behavior patterns within them. The session type with the lowest cost in which the query need was satisfied serves as the baseline, and the difficulty of every other session type is measured relative to the baseline cost. To refine the cost model, correlation tests were run on the behavioral indicators of search cost, and factor analysis was used to select behavioral features with good independence and discriminating power for modeling. Logs from the National Science and Technology Library (NSTL) and from Sogou were used as datasets to compare the difficulty users face in academic search and in general search, as well as in the academic exploration process as represented by different session types. [Result/conclusion] In the two search systems measured, the information search difficulty faced by users was 2.30 for academic search (NSTL) and 1.57 for general search (Sogou), so difficulty is higher in academic search. In the two session types that embody the academic exploration process, the difficulty levels were 2.35 and 4.13 respectively. The proposed method summarizes search difficulty, which is shaped by multiple influencing factors, as a single simple numerical value. It can be applied across different session types and search environments, and it enriches the methods available for evaluating retrieval systems.
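The baseline-ratio idea described under [Method/process] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the session-type labels, the three indicators (dwell time, query count, click count) and their weights are all hypothetical, whereas the paper selects and weights its indicators through correlation tests and factor analysis. Each session's cost is a weighted sum of its indicators; the baseline is the cheapest session type among those marked as satisfying the query need; each type's difficulty is its mean cost divided by the baseline's mean cost, so the baseline itself scores 1.0.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    session_type: str     # hypothetical label from a behavior-pattern classifier
    dwell_seconds: float  # total time spent in the session
    queries: int          # number of queries issued
    clicks: int           # number of result clicks


# Hypothetical weights: the paper derives its cost model via factor analysis.
WEIGHTS = {"dwell_seconds": 0.01, "queries": 1.0, "clicks": 0.5}


def session_cost(s: Session) -> float:
    """Effort cost of one session as a weighted sum of behavioral indicators."""
    return (WEIGHTS["dwell_seconds"] * s.dwell_seconds
            + WEIGHTS["queries"] * s.queries
            + WEIGHTS["clicks"] * s.clicks)


def difficulty_by_type(sessions, satisfied_types):
    """Mean cost of each session type divided by the baseline's mean cost.

    The baseline is the cheapest type among `satisfied_types`, i.e. the
    types in which the query need was met, so the baseline scores 1.0.
    """
    costs = defaultdict(list)
    for s in sessions:
        costs[s.session_type].append(session_cost(s))
    mean_cost = {t: mean(c) for t, c in costs.items()}
    candidates = [t for t in mean_cost if t in satisfied_types]
    if not candidates:
        raise ValueError("no satisfied session type available as baseline")
    baseline = min(candidates, key=mean_cost.__getitem__)
    return {t: c / mean_cost[baseline] for t, c in mean_cost.items()}


# Toy usage with made-up sessions (not data from the paper).
sessions = [
    Session("single_query_click", 30, 1, 1),
    Session("single_query_click", 50, 1, 1),
    Session("reformulation", 200, 4, 3),
    Session("no_click_abandon", 20, 1, 0),
]
scores = difficulty_by_type(
    sessions, satisfied_types={"single_query_click", "reformulation"})
print({t: round(v, 2) for t, v in scores.items()})
# {'single_query_click': 1.0, 'reformulation': 3.95, 'no_click_abandon': 0.63}
```

In this toy data the abandoned, click-free sessions are cheaper than the baseline and so score below 1.0. They are still excluded from serving as the baseline because the baseline must be a type in which the need was actually met; this restriction keeps cheap failures from being mistaken for easy success.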

Cite this article

Chen Chong, Wang Siwei, Liang Bing. Research on Difficulty Measurement Method in Academic Search Based on Log Mining[J]. Library and Information Service, 2021, 65(9): 79-88. DOI: 10.13266/j.issn.0252-3116.2021.09.009
