A Knowledge Graph Construction for Q&A Text in Chinese Online Medical Community

Xi Yunjiang (席运江); Li Man (李曼); Deng Yushan (邓雨珊); Liao Xiao (廖晓); Kuang Yunying (邝云英)

doi:10.13266/j.issn.0252-3116.2024.04.010

Library and Information Service (图书情报工作), 2024, Vol. 68, Issue 4: 124-136

Knowledge Organization

1 School of Business Administration, South China University of Technology, Guangzhou 510641
2 School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510521
3 School of Management, Guangzhou City University of Technology, Guangzhou 510800
4 School of Information Engineering, Guangzhou Vocational University of Science and Technology, Guangzhou 510550

Xi Yunjiang, associate professor, PhD, master's supervisor; Li Man, master's student; Deng Yushan, master's student; Liao Xiao, associate professor, PhD, corresponding author, E-mail: winnie3223@163.com; Kuang Yunying, lecturer, master's degree.

Received date: 2023-06-05

Revised date: 2023-08-19

Online published: 2024-03-15

Supported by

This work is one of the research outcomes of the National Natural Science Foundation of China project titled “Research on Quantitive Model of Information Credibility Evaluation for Virtual Health Communities and Its Applications” (Grant No. 72171090) and the Guangdong Basic and Applied Basic Research Fund project titled “Research on Knowledge Value Evaluation Model and Method of User Innovation Community Based on Supernetwork Modeling” (Grant No. 2023A1515011551).

Cite this article

Xi Yunjiang, Li Man, Deng Yushan, Liao Xiao, Kuang Yunying. A Knowledge Graph Construction for Q&A Text in Chinese Online Medical Community[J]. Library and Information Service, 2024, 68(4): 124-136. DOI: 10.13266/j.issn.0252-3116.2024.04.010

Abstract

[Purpose/Significance] Q&A texts in online medical communities are colloquial, noisy, and poorly normalized, which makes knowledge extraction very difficult. To extract the medical knowledge these texts contain, this paper combines several deep learning methods into a knowledge graph construction method designed specifically for such text. [Method/Process] Diabetes-related Q&A texts from xywy.com (寻医问药网) serve as the data source. Entity and relation types suited to community text are defined on the basis of an analysis of the health needs of community users. The BERT-wwm model is used for word embedding to address polysemy, and a BiLSTM-CRF model performs entity recognition. During relation annotation, an entity mask is designed to address relation overlap, and a CNN-Attention model then performs relation extraction. Finally, entities are aligned using a combination of dictionary matching and entity name similarity, and the resulting diabetes knowledge graph is stored and visualized in the Neo4j graph database. [Result/Conclusion] Experimental results show that these methods substantially improve knowledge extraction from online medical community Q&A texts and effectively convert unstructured community medical Q&A text into structured data, which can promote community knowledge discovery and online intelligent health services.

Keywords: online medical community; knowledge graph; BERT; attention mechanism; deep learning
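The abstract does not spell out how the entity mask is constructed. As a rough illustration of the general idea only, the Python sketch below assumes one plausible scheme: for each candidate entity pair in a sentence, every other entity is overwritten with a mask character, so that each relation instance exposes exactly one pair. The names `Entity`, `mask_sentence`, and `candidate_instances`, the `#` mask character, and the English example sentence are all hypothetical and are not taken from the paper, whose actual masking scheme may differ.

```python
from itertools import permutations
from typing import Iterator, List, NamedTuple, Tuple


class Entity(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical entity type, e.g. "Disease"


def mask_sentence(text: str, entities: List[Entity], head: int, tail: int,
                  mask_char: str = "#") -> str:
    """Return text with every entity except the chosen pair masked out.

    Masking is done character by character, so the sentence keeps its
    length and the offsets of the head and tail entities stay valid.
    """
    chars = list(text)
    for i, ent in enumerate(entities):
        if i in (head, tail):
            continue  # keep the pair under consideration visible
        for pos in range(ent.start, ent.end):
            chars[pos] = mask_char
    return "".join(chars)


def candidate_instances(text: str,
                        entities: List[Entity]) -> Iterator[Tuple[str, str, str]]:
    """Yield (head_text, tail_text, masked_sentence) for each ordered pair."""
    for head, tail in permutations(range(len(entities)), 2):
        h, t = entities[head], entities[tail]
        yield (text[h.start:h.end], text[t.start:t.end],
               mask_sentence(text, entities, head, tail))


if __name__ == "__main__":
    sentence = "metformin treats diabetes but may cause nausea"
    found = [Entity(0, 9, "Drug"), Entity(17, 25, "Disease"),
             Entity(40, 46, "Symptom")]
    for instance in candidate_instances(sentence, found):
        print(instance)
```

Each masked sentence could then be labeled with, or classified into, a single relation type, which is one way to keep several relations that share a sentence from interfering with one another.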

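The entity alignment step named in the abstract, dictionary matching combined with entity name similarity, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which similarity measure or threshold was used, so `difflib.SequenceMatcher` from the standard library stands in for the similarity function, and the function names, the `0.8` threshold, the synonym dictionary, and the example names are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def align_entity(mention: str,
                 synonyms: Dict[str, str],
                 canonical_names: Iterable[str],
                 threshold: float = 0.8) -> str:
    """Map an extracted mention to a canonical entity name.

    Step 1: exact lookup in a synonym dictionary.
    Step 2: otherwise pick the most similar canonical name, if it is
            similar enough.
    Fallback: keep the mention as its own entity.
    """
    if mention in synonyms:
        return synonyms[mention]

    best_name, best_score = None, 0.0
    for name in canonical_names:
        score = name_similarity(mention, name)
        if score > best_score:
            best_name, best_score = name, score

    if best_name is not None and best_score >= threshold:
        return best_name
    return mention


if __name__ == "__main__":
    # Hypothetical dictionary and canonical names, for illustration only.
    synonym_dict = {"T2DM": "type 2 diabetes"}
    canonical = ["type 2 diabetes", "metformin", "insulin"]
    for m in ["T2DM", "metformine", "headache"]:
        print(m, "->", align_entity(m, synonym_dict, canonical))
```

Trying the dictionary first keeps known synonyms and abbreviations from being affected by the similarity threshold, while the similarity fallback catches spelling variants that the dictionary does not list.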
