Knowledge Organization

Research on Structure and Knowledge Network Construction of Chinese Ultrasonic Text

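  • Shang Xiaopu
  • Xu Wuhuan
  • Zhao Hongmei
  • Zhang Runtong
  • Zhu Shen
  • 1. Department of Information Management, School of Economics and Management, Beijing Jiaotong University, Beijing 100044;
    2. Peking University People's Hospital, Beijing 100044
Shang Xiaopu (ORCID: 0000-0002-7872-5744), lecturer, PhD, E-mail: sxp@bjtu.edu.cn; Xu Wuhuan (ORCID: 0000-0003-2621-7913), master's student; Zhao Hongmei (ORCID: 0000-0001-6880-3342), associate research fellow, master's degree; Zhang Runtong (ORCID: 0000-0003-0246-5058), department chair, professor, PhD; Zhu Shen (ORCID: 0000-0002-5802-8132), undergraduate.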

Received: 2018-11-20
Revised: 2019-03-22
Published online: 2019-08-20

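Funding

This paper is one of the research outputs of the National Natural Science Foundation of China project "Research on Structuring Methods and Knowledge Mining of Electronic Medical Record Text for Clinical Decision Support" (Grant No. 61702033) and the Ministry of Education Humanities and Social Sciences project "Research on Clinical Knowledge Mining Based on Electronic Medical Record Text" (Grant No. 17YJC870015).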

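Abstract

[Purpose/significance] Ultrasound examination is an important basis for assessing a patient's condition, yet its findings are currently recorded mainly as free text. This paper proposes a method for structuring ultrasound examination text and constructing a knowledge network from it, laying a data foundation for further mining of the clinical knowledge hidden in electronic medical records (EMR). [Method/process] The paper adapts natural language processing techniques to the ultrasound-text setting in three main steps: word segmentation, content localization, and structure recognition. Together these steps split and label ultrasound text, and a structured knowledge network of ultrasound examinations is then built on the results. [Result/conclusion] Tests on real data show that the proposed structuring method performs well on ultrasound examination text. The method can automatically construct structured networks from batches of ultrasound texts and can reveal latent knowledge in them, such as the hierarchical relationships and attribute structures of the structured content.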
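The pipeline summarized in the abstract (split and label report text, then link the labeled pieces into a hierarchical network) can be illustrated with a toy sketch. This is not the authors' method, since the abstract does not specify their segmentation, localization, or recognition techniques. The sketch assumes a hypothetical report layout ("organ：finding，finding。"), stands in for word segmentation and structure recognition with punctuation splitting plus prefix matching against a small hard-coded attribute list (`ATTRIBUTES`, hypothetical), and represents the knowledge network as weighted organ → attribute → value edges counted over a batch of reports. The function names `structure_report` and `build_network` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical attribute lexicon. A real system would rely on domain
# dictionaries or a trained recognizer rather than this hard-coded list.
ATTRIBUTES = ["大小", "形态", "包膜", "回声", "血流"]

# Assumed report layout: "<organ>：<finding>，<finding>。", accepting either
# full-width or half-width punctuation.
SECTION = re.compile(r"([^:：。;；]+)[:：]([^。]*)。?")
CLAUSE_SPLIT = re.compile(r"[,，;；]")


def structure_report(text):
    """Locate organ sections, split them into clauses, and label each
    clause as an (organ, attribute, value) triple by prefix matching."""
    triples = []
    for organ, body in SECTION.findall(text):
        for clause in CLAUSE_SPLIT.split(body):
            clause = clause.strip()
            for attr in ATTRIBUTES:
                # Keep only clauses shaped like "<attribute><value>".
                if clause.startswith(attr) and len(clause) > len(attr):
                    triples.append((organ.strip(), attr, clause[len(attr):]))
                    break
    return triples


def build_network(reports):
    """Count weighted edges organ -> organ/attribute -> value over a batch."""
    edges = Counter()
    for text in reports:
        for organ, attr, value in structure_report(text):
            attr_node = f"{organ}/{attr}"  # attribute node scoped to its organ
            edges[(organ, attr_node)] += 1
            edges[(attr_node, value)] += 1
    return edges


if __name__ == "__main__":
    reports = [
        "肝脏：大小正常，形态规则，包膜光滑。脾脏：大小正常，回声均匀。",
        "肝脏：大小增大，回声增强。",
    ]
    for (src, dst), weight in sorted(build_network(reports).items()):
        print(f"{src} -> {dst}: {weight}")
```

Counting edges across a batch is one simple way to expose the organ-to-attribute hierarchy and the attribute-value structure the abstract refers to; clauses that match no known attribute (for example "未见异常") are simply dropped here, which a real system would need to handle.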

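Cite this article

Shang Xiaopu, Xu Wuhuan, Zhao Hongmei, Zhang Runtong, Zhu Shen. Research on Structure and Knowledge Network Construction of Chinese Ultrasonic Text[J]. Library and Information Service (图书情报工作), 2019, 63(16): 112-120. DOI: 10.13266/j.issn.0252-3116.2019.16.012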
