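Received date: 2015-01-15
Revised date: 2015-02-18
Published online: 2015-03-05
Funding
This paper is one of the research outcomes of the "Twelfth Five-Year Plan" National Science and Technology Support Program project "Key Technologies and Application Demonstration of Technological Innovation Service Platforms" (project No. 2011BAH30B00).
Research on Key Technologies and Application of Intelligent Search Engine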
Liu Yao, Zheng Deju, Pan Xiyang, Huang Yi. Research on Key Technologies and Application of Intelligent Search Engine[J]. Library and Information Service, 2015, 59(5): 113-118. DOI: 10.13266/j.issn.0252-3116.2015.05.018
[Purpose/significance] Building a technological innovation service platform requires intelligent search engine technology, and the core of that technology is automatic semantic annotation. The platform's search requirements differ from those of a general-purpose search engine, because it mainly processes texts from professional fields. Semantic annotation makes it possible to organize an enterprise's documents semantically and structurally in a short time, and thus to provide the enterprise with precise knowledge services. [Method/process] Building on a study of the problems of semantic annotation in professional fields, this paper treats semantic annotation as the process of semantically organizing a set of document resources. It proposes a method that uses structured semantic concept resources or collections to index digital texts automatically, and that extracts the relevant set of semantic concepts according to factors such as the occurrence frequency, position, and relations of concept entities, thereby annotating the semantic content of the texts automatically. [Result/conclusion] The paper evaluates the results of the semantic annotation experiments and presents application scenarios for semantic annotation, including automatic composition. It also shows how the domain ontology and the semantically annotated corpus are continually updated, evolve, and interact with each other. The aim is to provide a useful reference for automatic semantic annotation in professional fields and for the construction of intelligent search engines.
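The annotation idea described in the abstract (scoring concepts by occurrence frequency, position, and relations against a structured concept resource) can be illustrated with a minimal sketch. This is not the authors' implementation: the `RELATED` concept resource, the `annotate` function, the weights `title_weight` and `relation_bonus`, and the additive scoring formula are all assumptions made for illustration. It also matches English substrings, whereas a real system for Chinese professional texts would need word segmentation and a domain ontology.

```python
from collections import Counter

# Hypothetical structured concept resource: concept -> related concepts.
RELATED = {
    "search engine": {"semantic annotation", "index"},
    "semantic annotation": {"search engine", "ontology"},
    "ontology": {"semantic annotation"},
    "index": {"search engine"},
}


def annotate(title, body, related=RELATED, top_k=3,
             title_weight=3.0, relation_bonus=0.5):
    """Return up to top_k concepts ranked by an assumed additive score."""
    title_l, body_l = title.lower(), body.lower()
    scores = Counter()
    for concept in related:
        freq = body_l.count(concept)        # frequency in the body
        in_title = title_l.count(concept)   # occurrences in the title
        if freq == 0 and in_title == 0:
            continue
        # Position: an earlier first occurrence in the body adds up to 1 point.
        first = body_l.find(concept)
        position = 1.0 - first / max(len(body_l), 1) if first >= 0 else 0.0
        scores[concept] = freq + title_weight * in_title + position
    # Relations: bonus for each related concept that also occurs in the text.
    found = set(scores)
    for concept in found:
        scores[concept] += relation_bonus * len(related[concept] & found)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:top_k]]


tags = annotate(
    "Semantic annotation for enterprise documents",
    "An ontology guides semantic annotation. The search engine uses the index.",
)
print(tags)
```

In this sketch a concept that appears in the title, occurs early, and co-occurs with related concepts ranks highest. How the three factors are actually combined in the paper is not stated in the abstract, so the weights here are placeholders only.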