Library and Information Service ›› 2018, Vol. 62 ›› Issue (5): 115-124. DOI: 10.13266/j.issn.0252-3116.2018.05.013

• Knowledge Organization •

The Study of Adverse Drug Reaction Signal Extraction Framework Based on the Integrated Statistical Learning and Semantic Filter

Wei Wei1, Zheng Du2

  1. Big Data Institute, Zhongnan University of Economics and Law, Wuhan 430074;
  2. School of Information Management / The Center for the Studies of Information Resources, Wuhan University, Wuhan 430072
  • Received: 2017-09-07 Revised: 2017-12-05 Online: 2018-03-05 Published: 2018-03-05
  • About the authors: Wei Wei (ORCID: 0000-0003-3580-8360), lecturer, PhD, E-mail: 503175355@qq.com; Zheng Du, PhD candidate.
  • Supported by:
    This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on Intelligent Consulting Services Based on Text and Web Semantic Analysis" (Grant No. 71673209).

Abstract: [Purpose/significance] Social media offers a new channel for collecting healthcare data. Applying natural language processing to extract patient-reported adverse drug reaction (ADR) signals from social media has great potential to improve the clinical and scientific knowledge that underpins ADR monitoring. However, extracting patient-reported ADR signals from social media still faces major challenges. This paper therefore develops a model that uses advanced natural language processing techniques to extract ADR signals from health-related social media. [Method/process] The model first recognizes medical entities in noisy social media text by matching against multiple dictionary sources. It then extracts adverse drug events with a statistical learning method based on the shortest dependency path kernel. Next, it uses semantic knowledge from drug safety databases to filter out drug treatment and indication information as well as negated adverse drug events. Finally, it classifies report sources to remove hearsay and other noise. [Result/conclusion] The model is validated on data collected from a diabetes forum. The results show that each component of the model contributes to its overall performance.

Key words: medical entity recognition, adverse drug event extraction, health social media, statistical learning, semantic filter
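
The abstract names a shortest dependency path kernel but does not give its form. As an illustration only, the sketch below follows one widely used formulation of such a kernel: two paths of different length score 0, and otherwise the score is the product, over aligned positions, of the number of features the two positions share. The function name `sdp_kernel`, the feature sets, and the example paths are hypothetical and are not taken from the paper.

```python
from math import prod


def sdp_kernel(path_a, path_b):
    """Similarity between two shortest dependency paths (illustrative sketch).

    Each path is a list of positions, and each position is a set of
    features such as the word, its part-of-speech tag, or an entity type.
    """
    # Paths of different lengths are treated as having nothing in common.
    if len(path_a) != len(path_b):
        return 0
    # Multiply the number of shared features at each aligned position.
    return prod(len(a & b) for a, b in zip(path_a, path_b))


# Hypothetical paths linking a drug mention to an event mention.
p1 = [{"metformin", "NN", "DRUG"}, {"->"}, {"caused", "VBD"}, {"<-"}, {"nausea", "NN", "EVENT"}]
p2 = [{"insulin", "NN", "DRUG"}, {"->"}, {"gave", "VBD"}, {"<-"}, {"headache", "NN", "EVENT"}]
p3 = [{"insulin", "NN", "DRUG"}, {"->"}, {"headache", "NN", "EVENT"}]

k12 = sdp_kernel(p1, p2)  # same length: shared tags and entity types count
k13 = sdp_kernel(p1, p3)  # different length: no similarity
```

In a statistical learning setting, a kernel like this would typically be passed to a kernel classifier (for example a support vector machine) trained on labeled drug-event pairs; whether the paper does exactly that is not stated in the abstract.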
