[Purpose/significance] Text vectorization is a necessary preprocessing step in text mining, information retrieval, sentiment analysis, and related fields. Making node vectors carry rich and effective semantic and structural information remains a pressing problem. [Method/process] This paper first analyzes the textual characteristics of science and technology policy documents. Based on a classification system for concepts and for the relations between concepts, it uses the BiLSTM-CRF algorithm to extract concepts and an SVM to extract their relations automatically. The model combines basic features with syntactic and semantic features during feature engineering, which improves recognition accuracy and efficiency. The paper also proposes a concept knowledge network that incorporates reasoning knowledge, together with a construction method for a knowledge network that further integrates discourse structure. [Result/conclusion] Based on this knowledge network model, the paper implements a network representation learning model that integrates node semantics, topological structure, and category label information. The model can fully exploit and represent the semantic and structural information of text, and visualization and experiments verify the effectiveness of the proposed method.
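As a rough illustration of the topology side of network representation learning only, the sketch below builds a small concept network from relation triples and samples random walks over it, in the style of DeepWalk. It is not the authors' model: the triples, the function names `build_graph` and `random_walks`, and all parameter values are hypothetical, and the sketch assumes concepts and relations have already been extracted. Node semantics and category labels, which the paper's model also integrates, are not covered here.

```python
import random
from collections import defaultdict

# Hypothetical (concept, relation, concept) triples; illustrative only,
# not data from the paper.
TRIPLES = [
    ("science policy", "includes", "tax incentive"),
    ("tax incentive", "targets", "high-tech enterprise"),
    ("science policy", "includes", "research funding"),
    ("research funding", "targets", "university"),
]


def build_graph(triples):
    """Build an undirected adjacency map from relation triples."""
    graph = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)
    # Sort neighbours so that sampling is reproducible for a fixed seed.
    return {node: sorted(neighbours) for node, neighbours in graph.items()}


def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Sample fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                # Step to a uniformly chosen neighbour of the current node.
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks


graph = build_graph(TRIPLES)
walks = random_walks(graph)
```

In a DeepWalk-style pipeline, walks like these are treated as sentences and passed to a skip-gram model to learn one vector per node.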
Liu Yao, Zhang Yue, Ye Lu. Construction of Text Knowledge Network Integrating Discourse Structure[J]. Library and Information Service, 2021, 65(21): 118-130.
DOI: 10.13266/j.issn.0252-3116.2021.21.019