Semantic Retrieval Technology of Academic Resources Based on Word Embedding Extension

  • Wang Renwu,
  • Chen Chuanbao,
  • Meng Xianru
  • Department of Information Management, Faculty of Economics and Management, East China Normal University, Shanghai 200241

Received date: 2018-04-09

  Revised date: 2018-06-14

  Online published: 2018-10-05

Abstract

[Purpose/significance] Taking a statistical approach, this paper explores semantic retrieval based on word embedding expansion, with the aim of improving the semantic retrieval of academic resources. [Method/process] Using natural language processing and text mining techniques, the paper preprocesses the metadata of the collected academic resources (mainly academic papers), then combines the Word2vec word embedding tool with the Elasticsearch full-text retrieval engine to build a semantic retrieval system for academic resources. [Result/conclusion] The proposed method effectively improves the retrieval of academic information and achieves semantic retrieval of academic resources to a certain extent. It can serve as a reference for follow-up research on semantic retrieval.
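
The abstract does not spell out how the word embeddings and the search engine are combined, so the following is only a minimal sketch of one common way to do embedding-based query expansion, not the authors' implementation. It assumes a tiny hand-written embedding table in place of a trained Word2vec model, and it only builds an Elasticsearch-style `bool`/`should` query body as a plain dictionary without contacting a server. The names `EMBEDDINGS`, `expand_term`, `build_query`, the `title` field, and the thresholds are all hypothetical.

```python
import math

# Toy 3-dimensional vectors (hypothetical values, for illustration only).
# In a real system these would come from a Word2vec model trained on the
# preprocessed paper metadata.
EMBEDDINGS = {
    "retrieval": [0.9, 0.1, 0.0],
    "search": [0.8, 0.2, 0.1],
    "query": [0.7, 0.3, 0.0],
    "embedding": [0.1, 0.9, 0.2],
    "vector": [0.2, 0.8, 0.3],
    "library": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_term(term, top_k=2, min_sim=0.8):
    """Return up to top_k (word, similarity) pairs closest to term.

    Words below min_sim are dropped; unknown terms yield an empty list.
    """
    if term not in EMBEDDINGS:
        return []
    target = EMBEDDINGS[term]
    scored = [
        (other, cosine(target, vec))
        for other, vec in EMBEDDINGS.items()
        if other != term
    ]
    scored = [(word, sim) for word, sim in scored if sim >= min_sim]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_query(user_query, field="title"):
    """Build an Elasticsearch-style query body with expanded terms.

    Original terms get boost 1.0; each expansion term is boosted by its
    similarity, so near-synonyms count for slightly less than exact matches.
    """
    should = []
    for term in user_query.lower().split():
        should.append({"match": {field: {"query": term, "boost": 1.0}}})
        for neighbor, sim in expand_term(term):
            should.append(
                {"match": {field: {"query": neighbor, "boost": round(sim, 3)}}}
            )
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    import json

    print(json.dumps(build_query("retrieval"), indent=2))
```

Weighting expansion terms by their similarity is one design choice among several; a fixed lower boost for all expanded terms would also be reasonable, and the abstract does not say which the authors used.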

Cite this article

Wang Renwu, Chen Chuanbao, Meng Xianru. Semantic Retrieval Technology of Academic Resources Based on Word Embedding Extension[J]. Library and Information Service, 2018, 62(19): 111-119. DOI: 10.13266/j.issn.0252-3116.2018.19.014
