Knowledge Organization

Construction of Text Knowledge Network Integrating Discourse Structure

  • Liu Yao,
  • Zhang Yue,
  • Ye Lu
  • 1. Institute of Scientific and Technical Information of China (ISTIC), Beijing 100038;
    2. Michigan State University, East Lansing 489132;
    3. School of Software & Microelectronics, Peking University, Beijing 100871
Liu Yao (ORCID: 0000-0003-3729-3866), research fellow, PhD, E-mail: liuy@istic.ac.cn; Zhang Yue (ORCID: 0000-0003-2153-6536), PhD candidate; Ye Lu (ORCID: 0000-0001-5571-9440), master's student.

Received date: 2021-06-01

  Revised date: 2021-09-06

  Online published: 2021-11-19

Funding

This paper is one of the research outcomes of the National Key R&D Program of China project "Research and Development of a Big-Data-Based Science and Technology Consulting Technology and Service Platform" (Grant No. 2018YFB143502) and the National Social Science Fund of China general project "Research on Models and Methods for Knowledge Sharing and Knowledge Reuse of Digital Resources" (Grant No. 21BTQ011).


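Cite this article

Liu Yao, Zhang Yue, Ye Lu. Construction of Text Knowledge Network Integrating Discourse Structure[J]. Library and Information Service (图书情报工作), 2021, 65(21): 118-130. DOI: 10.13266/j.issn.0252-3116.2021.21.019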

Abstract

[Purpose/significance] Text vectorization is a required preprocessing step in text mining, information retrieval, sentiment analysis, and related fields. Making node vectors carry rich and effective semantic and structural information is a problem that urgently needs solving. [Method/process] This paper first analyzes the textual features of science and technology policy documents. Following classification schemes for concepts and for the relations between concepts, it uses the BiLSTM-CRF algorithm to index concepts automatically and an SVM to index the relations between them. Feature engineering combines basic features with syntactic and semantic features, which significantly improves recognition accuracy and efficiency. The paper also proposes a concept knowledge network that incorporates reasoning knowledge, and a method for constructing a knowledge network that further integrates discourse structure. [Result/conclusion] Based on this knowledge network model, the paper implements a network representation learning model that integrates node semantics, topological structure, and category label information. The model can fully mine and represent the semantic and structural information of text, and visualization and experiments verify the effectiveness of the proposed method.
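
To make the idea of a knowledge network that combines concept relations with discourse structure more concrete, the following is a minimal sketch and not the authors' implementation. It assumes concept relations are given as (head, relation, tail) triples and discourse structure as (unit, member) containment pairs, merges both into one undirected graph, and samples uniform random walks of the kind that DeepWalk-style representation learning consumes. The function names `build_network` and `random_walk`, the node labels, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def build_network(concept_relations, discourse_links):
    """Merge concept-relation edges and discourse-containment edges
    into one undirected adjacency map (an illustrative assumption)."""
    adjacency = defaultdict(set)
    for head, _relation, tail in concept_relations:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    for unit, member in discourse_links:
        adjacency[unit].add(member)
        adjacency[member].add(unit)
    return adjacency


def random_walk(adjacency, start, length, rng):
    """Sample one uniform random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        neighbors = sorted(adjacency[walk[-1]])  # sorted for reproducibility
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


# Hypothetical toy data: two concept relations and a small discourse tree.
concept_relations = [
    ("science policy", "is-a", "policy"),
    ("funding program", "part-of", "science policy"),
]
discourse_links = [
    ("doc1", "doc1/sec1"),
    ("doc1/sec1", "science policy"),
    ("doc1/sec1", "funding program"),
]

network = build_network(concept_relations, discourse_links)
rng = random.Random(0)
walks = [random_walk(network, node, 5, rng) for node in sorted(network)]
```

Walks like these could be passed to a skip-gram model to obtain node vectors. The model described in the abstract additionally uses node semantics and category labels, which this sketch leaves out.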
