Library and Information Service ›› 2021, Vol. 65 ›› Issue (21): 118-130. DOI: 10.13266/j.issn.0252-3116.2021.21.019

• Knowledge Organization •

Construction of Text Knowledge Network Integrating Discourse Structure

Liu Yao1, Zhang Yue2, Ye Lu3

  1. Institute of Scientific and Technical Information of China (ISTIC), Beijing 100038;
    2. Michigan State University, East Lansing 489132;
    3. School of Software & Microelectronics, Peking University, Beijing 100871
  • Received: 2021-06-01 Revised: 2021-09-06 Online: 2021-11-05 Published: 2021-11-19
  • About the authors: Liu Yao (ORCID: 0000-0003-3729-3866), research professor, PhD, E-mail: liuy@istic.ac.cn; Zhang Yue (ORCID: 0000-0003-2153-6536), PhD candidate; Ye Lu (ORCID: 0000-0001-5571-9440), master's student.
  • Funding:
    This work is one of the research outcomes of the National Key R&D Program of China project "Research and Development of a Big-Data-Based Science and Technology Consulting Technology and Service Platform" (Grant No. 2018YFB143502) and the National Social Science Fund of China General Project "Research on Models and Methods for Knowledge Sharing and Knowledge Reuse of Digital Resources" (Grant No. 21BTQ011).

Abstract: [Purpose/significance] Text vectorization is a necessary preprocessing step in text mining, information retrieval, sentiment analysis, and related fields. Making node vectors carry rich and effective semantic and structural information remains an open problem. [Method/process] This paper first analyzes the textual characteristics of science and technology policy documents. Following classification schemes for concepts and for the relations between concepts, it uses the BiLSTM-CRF algorithm to index concepts automatically and an SVM to index the relations between them. Feature engineering combines basic features with syntactic and semantic features, which significantly improves recognition accuracy and efficiency. The paper also proposes a concept knowledge network that incorporates inferred knowledge, together with a construction method for a knowledge network that further integrates discourse structure. [Result/conclusion] Based on this knowledge network model, the paper implements a network representation learning model that integrates node semantics, topological structure, and category label information. The model can fully mine and represent the semantic and structural information of text, and visualization and experiments verify the effectiveness of the proposed method.

Key words: named entity recognition, relation extraction, neural network, representation learning, discourse structure
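
As a rough illustration of what a knowledge network that integrates discourse structure might look like as a data structure, the following Python sketch links a document node to its section nodes, section nodes to the concepts they mention, and concepts to each other through typed relations. It is not the authors' implementation: the node kinds, the edge labels (`contains`, `mentions`, `eligible_for`), the function names, and the sample policy data are all hypothetical, and the concept and relation extraction steps (BiLSTM-CRF and SVM in the paper) are assumed to have already run, with their output written by hand here.

```python
from collections import defaultdict


def build_network(doc_title, sections, relations):
    """Build a small labeled, directed graph.

    sections:  {section_title: [concept, ...]}   (hypothetical input format)
    relations: [(head_concept, relation_type, tail_concept), ...]
    Returns a dict mapping each node to a set of (edge_label, neighbor) pairs.
    Nodes are (kind, name) tuples so that a section and a concept with the
    same name stay distinct.
    """
    edges = defaultdict(set)
    doc = ("document", doc_title)

    # Discourse-structure layer: document -> section -> concept.
    for section_title, concepts in sections.items():
        sec = ("section", section_title)
        edges[doc].add(("contains", sec))
        for concept in concepts:
            edges[sec].add(("mentions", ("concept", concept)))

    # Concept layer: typed relations between concepts.
    for head, rel, tail in relations:
        edges[("concept", head)].add((rel, ("concept", tail)))

    return edges


def sections_of(edges, concept):
    """Return the sorted titles of all sections that mention a concept."""
    target = ("mentions", ("concept", concept))
    return sorted(
        node[1]
        for node, out in edges.items()
        if node[0] == "section" and target in out
    )


# Hand-written sample data standing in for extractor output (illustrative only).
sections = {
    "General Provisions": ["innovation fund", "high-tech enterprise"],
    "Support Measures": ["high-tech enterprise", "tax relief"],
}
relations = [("high-tech enterprise", "eligible_for", "tax relief")]

network = build_network("Sample policy", sections, relations)
print(sections_of(network, "high-tech enterprise"))
```

Keeping the discourse layer and the concept layer in the same graph means a representation learning model run over it can, in principle, see both where a concept appears in the document and how it relates to other concepts. This is the general idea the abstract describes, not a reproduction of the paper's model.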

CLC number: