[Purpose/significance] Researchers struggle to pick out, in a timely way, the items of real intelligence value from the vast amount of scientific and technological information on the Web. To address this problem, this paper establishes a comprehensive method for computing information value, so that the value of Web scientific and technological information can be calculated and judged, helping researchers find valuable items quickly and accurately. [Method/process] Considering both the external features of the information and the semantic features of its text, this paper uses BERT, a deep learning pretrained language model, to build an information value calculation model based on textual semantic features. The model's predictive output is used to produce a score, which is then combined with the original calculation method based on external features to obtain the final comprehensive evaluation score. [Result/conclusion] The experimental results show that the model based on textual semantic features can effectively separate information into star levels according to its information value score, which compensates for the poor star-level differentiation of the original model based on external features alone. The final comprehensive evaluation results indicate that the proposed information value calculation model also meets researchers' needs well in practical application.
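The scoring pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the semantic score is taken as the expected star level under a classifier's softmax output, the combination is a weighted average with a hypothetical weight of 0.5, and five star levels are assumed. The function names `semantic_score`, `combined_score`, and `to_stars` are likewise hypothetical, and the logits are hard-coded stand-ins for the output of a fine-tuned BERT classifier.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw classifier logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_score(logits: Sequence[float]) -> float:
    """Assumed scoring rule: expected star level, rescaled to [0, 1].

    Class i (0-based) is treated as star level i + 1. This is an
    illustrative choice, not the formula used in the paper.
    """
    probs = softmax(logits)
    n = len(probs)
    expected_level = sum(p * (i + 1) for i, p in enumerate(probs))
    return (expected_level - 1) / (n - 1)


def combined_score(external: float, semantic: float, weight: float = 0.5) -> float:
    """Assumed combination: weighted average of two scores in [0, 1]."""
    return weight * semantic + (1 - weight) * external


def to_stars(score: float, max_stars: int = 5) -> int:
    """Map a score in [0, 1] to a star level in 1..max_stars."""
    return min(max_stars, int(score * max_stars) + 1)


if __name__ == "__main__":
    # Hypothetical logits for one document, as a classifier head might emit.
    logits = [-1.0, 0.2, 0.5, 2.1, 1.3]
    external = 0.6  # hypothetical external-feature score, already in [0, 1]
    final = combined_score(external, semantic_score(logits))
    print(f"final score = {final:.3f}, stars = {to_stars(final)}")
```

Using the expected level rather than the arg-max class keeps the semantic score continuous, which is one simple way to obtain finer differentiation between items before they are binned into stars. Other mappings from model output to score are equally plausible.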