KNOWLEDGE ORGANIZATION

A Knowledge Graph Construction for Q&A Text in Chinese Online Medical Community

  • Xi Yunjiang,
  • Li Man,
  • Deng Yushan,
  • Liao Xiao,
  • Kuang Yunying
  • 1 School of Business Administration, South China University of Technology, Guangzhou 510641
    2 School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510521
    3 School of Management, Guangzhou City University of Technology, Guangzhou 510800
    4 School of Information Engineering, Guangzhou Vocational University of Science and Technology, Guangzhou 510550

Received date: 2023-06-05

Revised date: 2023-08-19

Online published: 2024-03-15

Supported by

This work is supported by the National Natural Science Foundation of China project titled “Research on Quantitative Model of Information Credibility Evaluation for Virtual Health Communities and Its Applications” (Grant No. 72171090) and the Guangdong Basic and Applied Basic Research Fund project titled “Research on Knowledge Value Evaluation Model and Method of User Innovation Community Based on Supernetwork Modeling” (Grant No. 2023A1515011551).

Abstract

[Purpose/Significance] This paper designs a knowledge graph construction method based on deep learning to facilitate knowledge extraction from online medical community (OMC) Q&A texts, which are colloquial, noisy, and poorly normalized. [Method/Process] Diabetes-related Q&A texts from xywy.com were used as the dataset, and the entity and relation categories were determined by analyzing the healthcare needs of community users. The BERT-wwm model was employed for word embedding to address polysemy, and a BiLSTM-CRF model was then used for entity recognition. When annotating the relations between entities, an entity mask was designed to avoid relation overlap, and a CNN-Attention model was adopted for relation extraction. Finally, entities were aligned using dictionary matching and entity-name similarity, and the resulting structured data were stored and visualized in Neo4j. [Result/Conclusion] Experiments verify the effectiveness of these methods. The approach converts medical knowledge in unstructured OMC text into structured data, which can promote community knowledge discovery and online intelligent health services.
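
The entity alignment step mentioned in the abstract (dictionary matching followed by entity-name similarity) might look roughly like the sketch below. It is a minimal illustration, not the authors' implementation: the synonym dictionary, the `align` function, the 0.7 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical synonym dictionary (surface form -> canonical entity name).
# The entries are illustrative and not taken from the paper.
SYNONYMS = {
    "二甲双胍片": "二甲双胍",    # metformin tablets -> metformin
    "二型糖尿病": "2型糖尿病",   # variant spelling of type 2 diabetes
    "2型糖尿病": "2型糖尿病",
}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (an assumed stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def align(mention: str, synonyms=SYNONYMS, threshold: float = 0.7) -> str:
    """Map an extracted mention to a canonical entity name."""
    # Step 1: exact dictionary match.
    if mention in synonyms:
        return synonyms[mention]
    # Step 2: fall back to the most similar canonical name,
    # accepted only if it clears the (assumed) threshold.
    canonical = sorted(set(synonyms.values()))
    best = max(canonical, key=lambda name: similarity(mention, name))
    if similarity(mention, best) >= threshold:
        return best
    # No confident match: keep the mention as its own entity.
    return mention


if __name__ == "__main__":
    for m in ["二型糖尿病", "二甲双胍缓释片", "胰岛素"]:
        print(m, "->", align(m))
```

The threshold trades precision for recall: a lower value merges more surface variants but risks conflating distinct entities, so in practice it would need tuning against annotated data.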

Cite this article

Xi Yunjiang, Li Man, Deng Yushan, Liao Xiao, Kuang Yunying. A Knowledge Graph Construction for Q&A Text in Chinese Online Medical Community[J]. Library and Information Service, 2024, 68(4): 124-136. DOI: 10.13266/j.issn.0252-3116.2024.04.010
