Based on Deep Learning Algorithm to Construct the Classifier of Academic Query Intent

Wang Ruixue; Fang Jing; Gui Sisi; Lu Wei; Zhang Xian

doi:10.13266/j.issn.0252-3116.2021.03.012

Library and Information Service, 2021, Vol. 65, Issue 3: 93-99

1 School of Information Management, Wuhan University, Wuhan 430072;
2 College of Information Science & Technology, Nanjing Agricultural University, Nanjing 210095;
3 Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072;
4 Baidu Times Network Technology (Beijing) Co., Ltd., Beijing 100085

Received date: 2020-06-17

Revised date: 2020-10-14

Online published: 2021-02-05

Abstract

[Purpose/significance] This study seeks a way to identify search query intent automatically and thereby improve the efficiency of academic search engines. [Method/process] Drawing on the characteristics of query intent and of academic search, we constructed features from four aspects: basic descriptive statistics, special keywords, entity information, and frequency. In the experiments, we compared four classifiers (Naive Bayes, logistic regression, SVM, and random forest) and calculated precision, recall, and F-measure for each. We then propose a method that extends the academic query intent labels predicted by the logistic regression model to a large-scale data set and extracts "keyword type" features from it, in order to construct a two-layer classifier based on a deep learning algorithm for academic query intent recognition. [Result/conclusion] The two-layer classifier achieves a macro-average F1 of 0.651, higher than the other algorithms, and effectively balances precision and recall across the different academic query intents. The final second-layer prediction model achieves the best classification performance, with an F1 score of 0.783.
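The abstract reports precision, recall, and a macro-average F1. As a reminder of how that metric is usually defined, the following is a minimal sketch in plain Python: F1 is computed per intent class and then averaged with equal weight per class. The function names and the intent labels (`"paper"`, `"author"`) are hypothetical and chosen only for illustration; the sketch does not reproduce the authors' evaluation code or data.

```python
def per_class_prf(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_prf(y_true, y_pred, l)[2] for l in labels) / len(labels)


# Hypothetical intent labels, for illustration only.
gold = ["paper", "paper", "author", "author"]
pred = ["paper", "author", "author", "author"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

Because every class contributes equally to the average, a classifier that performs well only on the most frequent intent is penalized, which is consistent with the abstract's emphasis on balancing precision and recall across intents.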

Key words: academic query intent; automatic identification; two-layer classification

Cite this article

Wang Ruixue, Fang Jing, Gui Sisi, Lu Wei, Zhang Xian. Based on Deep Learning Algorithm to Construct the Classifier of Academic Query Intent[J]. Library and Information Service, 2021, 65(3): 93-99. DOI: 10.13266/j.issn.0252-3116.2021.03.012
