[Purpose/significance] To address common problems in co-word analysis, this paper proposes a method for constructing and analyzing co-word networks based on fine-grained semantic analysis. [Method/process] SemRep is used to extract normalized topic concepts and the semantic relations between them from the source text, and a semantic co-word network is built from the extracted predications. Content feature words are then extracted using node centrality and edge frequency as indicators. Finally, the semantic predications are partitioned into clusters according to the semantic collocation patterns defined by the UMLS Semantic Network, through a two-level mapping from concept to semantic type and from semantic type to semantic type group. [Result/conclusion] Comparison with a conventional co-word analysis method shows that co-word analysis based on fine-grained semantic relations effectively reveals the topical content of the text, and that the UMLS Semantic Network resources allow the semantic co-word network to be partitioned into clusters clearly and accurately from a semantic perspective.
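The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The predications below are hypothetical stand-ins for SemRep output, degree centrality is assumed as the node indicator (the abstract does not say which centrality measure is used), and grouping by a (subject group, predicate, object group) pattern is only one plausible reading of the two-level mapping. All function and variable names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical SemRep-style predications: (subject concept, predicate, object concept).
predications = [
    ("Aspirin", "TREATS", "Headache"),
    ("Aspirin", "TREATS", "Headache"),
    ("Aspirin", "PREVENTS", "Myocardial Infarction"),
    ("Ibuprofen", "TREATS", "Headache"),
]

# Illustrative two-level mapping: concept -> semantic type -> semantic type group.
concept_to_type = {
    "Aspirin": "phsu",
    "Ibuprofen": "phsu",
    "Headache": "sosy",
    "Myocardial Infarction": "dsyn",
}
type_to_group = {"phsu": "CHEM", "sosy": "DISO", "dsyn": "DISO"}


def edge_frequencies(preds):
    """Count how often each predication (labeled edge) occurs."""
    return Counter(preds)


def degree_centrality(preds):
    """Normalized degree centrality of each concept, ignoring edge direction."""
    neighbors = defaultdict(set)
    for subj, _, obj in preds:
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
    n = len(neighbors)
    if n < 2:
        return {concept: 0.0 for concept in neighbors}
    return {concept: len(nb) / (n - 1) for concept, nb in neighbors.items()}


def group_pattern(pred):
    """Map a predication to its (subject group, predicate, object group) pattern."""
    subj, predicate, obj = pred
    return (
        type_to_group[concept_to_type[subj]],
        predicate,
        type_to_group[concept_to_type[obj]],
    )


def partition(preds):
    """Partition distinct predications into clusters keyed by group-level pattern."""
    clusters = defaultdict(set)
    for pred in set(preds):
        clusters[group_pattern(pred)].add(pred)
    return dict(clusters)


if __name__ == "__main__":
    print(edge_frequencies(predications).most_common())
    print(degree_centrality(predications))
    print(partition(predications))
```

In this toy setup, feature words would be the concepts with the highest centrality and the predications with the highest frequency, while the clusters come directly from the group-level patterns and not from a similarity-based clustering step.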