Library and Information Service (图书情报工作), 2022, Vol. 66, Issue (17): 35-46. DOI: 10.13266/j.issn.0252-3116.2022.17.004

• Special topic: Research on the Preservation and Utilization of Web Information Resources •

Construction of Subject Knowledge Base of Government Website Pages Based on Domain Ontology:
Taking the Subject of “COVID-19 Vaccine Science Popularization” as an Example

Huang Xinping, Pan Rongzhuang, Mao Yinghao, Zhu Siyuan, Xu Songqin

  1. School of Business and Management, Jilin University, Changchun 130012
  • Received: 2022-04-09 Revised: 2022-06-19 Online: 2022-09-05 Published: 2022-09-09
  • About the authors: Huang Xinping, associate professor, PhD, master's supervisor, E-mail: hxp0730@163.com; Pan Rongzhuang, undergraduate student; Mao Yinghao, master's student; Zhu Siyuan, master's student; Xu Songqin, master's student.
  • Funding:
    This paper is one of the research outcomes of the National Social Science Fund of China Youth Project "Research on Online Archiving, Development and Utilization of Government Website Pages Based on Cloud Computing" (Project No. 18CTQ040).

Abstract: [Purpose/Significance] From the perspective of knowledge management, this paper takes the large number of isolated and scattered government website pages obtained through topic-focused collection as a knowledge source and constructs a corresponding subject knowledge base, with the aim of helping the public quickly and efficiently obtain key information and precise knowledge from massive Web archive resources. [Method/Process] Drawing on topic-focused Web page collection, natural language processing, domain ontology, and knowledge reasoning, the paper proposes a method for constructing a "COVID-19 vaccine science popularization" subject knowledge base that covers the subject knowledge source, knowledge acquisition, knowledge representation, knowledge reasoning, and knowledge service. First, a Web crawler is designed to obtain text data from topic-specific Web pages, and a hybrid method is used to extract domain concept knowledge from that text. Second, the domain ontology is constructed by defining ontology classes and the hierarchy among them, object properties, and data properties, and by adding instances; the knowledge rules in the ontology are then formalized, which completes the construction of the subject knowledge base. Finally, Protégé and its plug-ins, together with knowledge reasoning methods, are used to implement semantic knowledge retrieval, visual ontology query, and knowledge question-answering services for the "COVID-19 vaccine science popularization" subject knowledge base. [Result/Conclusion] The results show that the constructed subject knowledge base has good reasoning and analysis capabilities and can effectively support the precise acquisition of COVID-19 vaccine science popularization knowledge. Its application has important practical significance for improving the effectiveness of COVID-19 vaccine science popularization.
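The ontology-building and reasoning steps named in the abstract (classes and their hierarchy, object properties, data properties, instances, and a formalized rule) can be illustrated with a small sketch. This is not the authors' implementation, which the abstract says was built with Protégé and its plug-ins; it is a plain-Python toy. The class `TinyOntology`, the function `build_example`, and every class, property, and instance name in it (`Vaccine`, `InactivatedVaccine`, `vaccine_A`, `reaction_X`, `hasAdverseReaction`, `doseCount`) are hypothetical placeholders, and the values carry no factual claim about any vaccine.

```python
from collections import defaultdict


class TinyOntology:
    """Toy in-memory ontology: classes, properties, instances, one inference rule."""

    def __init__(self):
        self.parents = {}                     # class name -> parent class name (or None)
        self.instance_of = {}                 # instance name -> asserted class name
        self.object_props = defaultdict(set)  # (subject, property) -> related instances
        self.data_props = {}                  # (subject, property) -> literal value

    def add_class(self, name, parent=None):
        self.parents[name] = parent

    def add_instance(self, name, cls):
        self.instance_of[name] = cls

    def relate(self, subject, prop, obj):
        """Assert an object property linking two instances."""
        self.object_props[(subject, prop)].add(obj)

    def set_value(self, subject, prop, value):
        """Assert a data property holding a literal value."""
        self.data_props[(subject, prop)] = value

    def ancestors(self, cls):
        """Return cls followed by its superclasses (assumes an acyclic hierarchy)."""
        chain = []
        while cls is not None:
            chain.append(cls)
            cls = self.parents.get(cls)
        return chain

    def instances(self, cls):
        """Rule: an instance of a subclass is also an instance of every superclass."""
        return sorted(i for i, c in self.instance_of.items()
                      if cls in self.ancestors(c))

    def ask(self, subject, prop):
        """Answer 'what is the <prop> of <subject>?'; None if nothing is asserted."""
        if (subject, prop) in self.object_props:
            return sorted(self.object_props[(subject, prop)])
        return self.data_props.get((subject, prop))


def build_example():
    """Populate the toy ontology with placeholder (non-factual) content."""
    onto = TinyOntology()
    onto.add_class("Vaccine")
    onto.add_class("InactivatedVaccine", parent="Vaccine")
    onto.add_class("AdverseReaction")
    onto.add_instance("vaccine_A", "InactivatedVaccine")
    onto.add_instance("reaction_X", "AdverseReaction")
    onto.relate("vaccine_A", "hasAdverseReaction", "reaction_X")
    onto.set_value("vaccine_A", "doseCount", 2)  # placeholder value only
    return onto
```

With this sketch, `build_example().instances("Vaccine")` finds `vaccine_A` even though it was only asserted as an `InactivatedVaccine`, which is the kind of inferred answer a semantic retrieval or question-answering service over an ontology relies on. A real system would express the same structure in OWL and delegate inference to a reasoner.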

Key words: domain ontology, subject knowledge base, COVID-19 vaccine science popularization, Web information resource collection

CLC number: