Library and Information Service (图书情报工作), 2021, Vol. 65, Issue (9): 79-88. DOI: 10.13266/j.issn.0252-3116.2021.09.009

• Information Research •

Research on Difficulty Measurement Method in Academic Search Based on Log Mining

Chen Chong1, Wang Siwei1, Liang Bing2

  1 School of Government Management, Beijing Normal University, Beijing 100875;
  2 Institute of Scientific and Technical Information of China, Beijing 100038
  • Received: 2020-11-09 Revised: 2021-01-25 Online: 2021-05-05 Published: 2021-06-02
  • About the authors: Chen Chong (ORCID: 0000-0002-9704-1575), professor, PhD, E-mail: chenchong@bnu.edu.cn; Wang Siwei (ORCID: 0000-0002-8646-9680), master's student; Liang Bing (ORCID: 0000-0002-7622-6618), senior engineer, PhD.

Abstract: [Purpose/significance] Users often face varying degrees of difficulty when searching for information. To better understand user needs and improve retrieval systems, a concise and effective method is needed to measure how difficult an information search is. [Method/process] This study treats the behavioral and time costs that users incur for their queries as the manifestation of their information-seeking difficulty. Sessions are classified into types according to the user's behavior pattern within the session. The session type in which the query need is satisfied at the lowest cost serves as the comparison baseline, and the cost of the baseline session is used to measure the difficulty of the other session types. To optimize the cost model, correlation tests are run on the behavioral indicators of search cost, and factor analysis is used to select behavioral features with good independence and discriminative power for modeling. Logs from the National Science and Technology Library (NSTL) and from Sogou serve as datasets to compare the information-seeking difficulty users face in academic and general search environments, and in the exploration processes represented by different session types. [Result/conclusion] For the two search systems measured in this paper, the information search difficulty faced by users is 2.30 and 1.57 respectively, so difficulty is higher in academic search than in general search. For the two session types that reflect the academic exploration process, the difficulty values are 2.35 and 4.13 respectively. The proposed method summarizes search difficulty, which has many influencing factors, in a simple numerical value, can be applied to different session types and search environments, and enriches the means of evaluating retrieval systems.

Key words: information search difficulties, search difficulty measurement, search cost, academic search, session types
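
The baseline comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the cost formula, so everything specific here is an assumption made for illustration: the three behavioral features (`n_queries`, `n_clicks`, `duration_s`), the equal weighting with duration counted in minutes, the session type labels, and the choice to express difficulty as the ratio of a session type's mean cost to the baseline type's mean cost. The paper selects its features through correlation tests and factor analysis, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    session_type: str   # hypothetical label for the behavior pattern
    n_queries: int      # queries issued in the session
    n_clicks: int       # result clicks in the session
    duration_s: float   # session duration in seconds


def session_cost(s: Session) -> float:
    """Assumed cost model: queries + clicks + duration in minutes."""
    return s.n_queries + s.n_clicks + s.duration_s / 60.0


def difficulty_by_type(sessions: list[Session], baseline_type: str) -> dict[str, float]:
    """Mean cost of each session type relative to the baseline type's mean cost."""
    costs: dict[str, list[float]] = {}
    for s in sessions:
        costs.setdefault(s.session_type, []).append(session_cost(s))
    if baseline_type not in costs:
        raise ValueError(f"no sessions of baseline type {baseline_type!r}")
    baseline = mean(costs[baseline_type])
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return {t: mean(c) / baseline for t, c in costs.items()}


if __name__ == "__main__":
    # Toy data, not taken from the NSTL or Sogou logs.
    logs = [
        Session("single_query_click", 1, 1, 60.0),
        Session("reformulation", 2, 3, 240.0),
    ]
    print(difficulty_by_type(logs, baseline_type="single_query_click"))
```

Under this assumed ratio form, the baseline type has difficulty 1 by construction, and a value above 1 means that session type costs more than the least costly satisfied session.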

CLC number: