[Purpose/significance] This study analyzes and proposes a knowledge discovery strategy for virtual health community text data and builds a knowledge discovery model for such data. [Method/process] The study first summarizes the characteristics of virtual health community text data and formulates knowledge discovery strategies that address the data mining difficulties these characteristics cause. Guided by the DIKW (data, information, knowledge, wisdom) hierarchy, it then builds a knowledge discovery model for virtual health community text data based on the proposed strategies. By applying computer coding, natural language processing, syntactic analysis, and formulated inference rules, the model raises the value of the data step by step, from free text to wisdom about adverse drug reactions. [Result/conclusion] An empirical study verifies the effectiveness and operability of the proposed knowledge discovery strategy and model, providing a reference for subsequent theoretical and empirical research on knowledge discovery from virtual health community text data.