[Purpose/Significance] This study investigates word segmentation and named entity recognition for ancient Chinese medical texts when no annotated resources are available. Using word segmentation and named entity recognition models, it segments texts in the traditional Chinese medicine domain and trains a language model on them. [Method/Process] During training, the study uses a multi-task learning framework that combines entity concept ranking prediction with masked word prediction. This incorporates prior conceptual knowledge from the lexicon into the language model and yields a model that integrates the semantics of the text with prior knowledge. Building on the masked language modeling (MLM) task used in training, a cloze-style text generation task is designed to perform knowledge indexing of a single ancient text: all phrases in the text are traversed, knowledge concepts are fully indexed along the phrase-entity path, and the implicit knowledge structure is discovered by mining a priori rules, from which the implicit chapter structure is constructed. [Result/Conclusion] Comparative experiments show that the proposed indexing approach makes effective use of the model’s prior knowledge with only five annotated samples. Compared with traditional methods, it handles knowledge indexing of ancient Chinese medical texts well when annotations are lacking, and it offers a route toward parsing single texts of ancient Chinese medical books. Collating, proofreading, and annotating ancient Chinese medical books and mining the knowledge they contain has theoretical and practical significance for the development of traditional Chinese medicine and modern medicine, and for research on medical history.
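The multi-task objective described in the Method section (masked word prediction combined with entity concept ranking prediction) can be illustrated with a minimal sketch. The abstract does not state the loss functions or how the two tasks are weighted, so everything here is an assumption made for illustration: cross-entropy for the masked word, a pairwise hinge loss for concept ranking, the `weight` parameter, and all function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlm_loss(logits, target_index):
    """Cross-entropy for one masked position: -log p(target word)."""
    return -math.log(softmax(logits)[target_index])


def ranking_loss(pos_score, neg_scores, margin=1.0):
    """Pairwise hinge loss (assumed form): the correct concept should
    outscore every incorrect concept by at least `margin`."""
    penalties = [max(0.0, margin - pos_score + s) for s in neg_scores]
    return sum(penalties) / len(penalties)


def multitask_loss(mlm_logits, mlm_target, pos_score, neg_scores, weight=0.5):
    """Weighted sum of the two task losses; `weight` is a hypothetical
    hyperparameter, not a value reported in the paper."""
    return mlm_loss(mlm_logits, mlm_target) + weight * ranking_loss(pos_score, neg_scores)


if __name__ == "__main__":
    # Toy scores: 3-word vocabulary for the masked slot, one correct
    # concept score, and two incorrect concept scores.
    print(multitask_loss([2.0, 0.5, -1.0], 0, pos_score=1.2, neg_scores=[0.9, -0.3]))
```

In a real training setup both losses would be computed from the same encoder's outputs so that gradients from the concept-ranking task shape the representations used for masked word prediction; the sketch only shows how the two terms combine.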
Liu Yao, Li Guanlin, Li Huanqing. Knowledge Indexing and Structural Analysis Techniques for Single Text of Ancient Chinese Medical Books[J]. Library and Information Service, 2022, 66(24): 118-127.
DOI: 10.13266/j.issn.0252-3116.2022.24.011