Community Detection in Citation Networks Based on Sample Weighting

Xiao Xue; Wang Zhaowei; Chen Yunwei; Deng Yong

doi:10.13266/j.issn.0252-3116.2016.20.011

Library and Information Service (图书情报工作), 2016, Vol. 60, Issue 20: 86-93

Information Research

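1. University of Chinese Academy of Sciences, Beijing 100049;
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190;
3. Chengdu Library of Chinese Academy of Sciences, Chengdu 610041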

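Xiao Xue (ORCID: 0000-0002-7010-6084), master's student, E-mail: xiaoxue@mail.las.ac.cn; Wang Zhaowei (ORCID: 0000-0001-6279-7172), master's student; Chen Yunwei (ORCID: 0000-0002-6597-7416), associate research fellow, PhD; Deng Yong (ORCID: 0000-0001-9179-0500), research fellow.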

Received: 2016-05-16

Revised: 2016-08-26

Published online: 2016-10-20

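Funding

This work is one of the research outcomes of the National High Technology Research and Development Program of China ("863" Program) project "Construction of a Knowledge Management System for Microbial Digital Resources and Research on Key Technologies" (grant no. 2014AA021503) and of the Chinese Academy of Sciences 2013 "Light of West China" talent training program project "Evolution Analysis of Citation Coupling Networks and Its Application in Science and Technology Evaluation and Forecasting" (grant no. 科发人字165号(3-6)).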

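Abstract

[Purpose/significance] To improve the accuracy of community detection in citation networks, this paper proposes a community detection method for weighted citation networks. [Method/process] Taking the Louvain community detection method as the algorithmic basis, scientific papers are represented with a vector space model, and an improved cosine similarity measure is used to compute the similarity between adjacent papers, which then serves as the edge weight. By jointly considering the content attributes and the structural attributes of papers, a sample-weighting-based community detection method for citation networks is proposed. [Result/conclusion] The algorithm combines the textual content attributes of the papers in a citation network with their topological structure attributes. Community detection experiments on papers published in the journal Scientometrics and on papers on the topic of CRISPR show that the method improves community detection in citation networks.

Keywords: citation network; community detection; clustering; text mining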
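The weighting step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the "improved" cosine similarity, so plain cosine similarity over smoothed TF-IDF vectors is used instead, and the function names, the toy documents, and the citation list are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each paper id to a sparse TF-IDF vector (dict: term -> weight).

    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1. The paper's own
    term-weighting scheme is not given in the abstract.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per paper
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Plain cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weight_citation_edges(docs, citations):
    """Attach a text-similarity weight to each citation link (citing, cited)."""
    vecs = tfidf_vectors(docs)
    return {(a, b): cosine(vecs[a], vecs[b]) for a, b in citations}


# Hypothetical toy data: three papers and two citation links.
docs = {
    "p1": "community detection citation network".split(),
    "p2": "community detection weighted network".split(),
    "p3": "gene editing crispr cas9".split(),
}
citations = [("p1", "p2"), ("p2", "p3")]
weights = weight_citation_edges(docs, citations)
```

In this toy example the link between the two community detection papers receives a positive weight, while the link to the topically unrelated paper receives a weight of zero, so a weighted community detection algorithm would treat the second citation as a much weaker tie.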

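Cite this article

Xiao Xue, Wang Zhaowei, Chen Yunwei, Deng Yong. Community detection in citation networks based on sample weighting[J]. Library and Information Service, 2016, 60(20): 86-93. DOI: 10.13266/j.issn.0252-3116.2016.20.011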

Abstract

[Purpose/significance] Community discovery is of considerable value for text mining. To improve the accuracy of community detection in citation networks, this paper proposes a community detection algorithm for scientific literature based on weighted networks. [Method/process] The algorithm builds on the Louvain community detection algorithm. Papers are represented in a vector space model, and the similarity between adjacent papers is used as the weight of the link between them. Community structure is then detected on the resulting weighted network. [Result/conclusion] Experiments show that the proposed algorithm improves the performance of community detection.

Keywords: citation network; community discovery; clustering; text mining
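Louvain-style algorithms search for a partition that maximizes modularity, and on a weighted network the edge weights enter that objective directly. The sketch below only evaluates weighted modularity for a given partition; it is not the Louvain optimization itself and not the authors' code. It assumes citation links are treated as undirected edges without self-loops, and the function name and toy graph are hypothetical.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Weighted modularity Q of a partition.

    edges: dict {(u, v): weight}, each undirected edge listed once, no self-loops.
    community: dict {node: community label}.
    Q = sum over communities c of  W_in(c) / m - (S(c) / (2 * m)) ** 2,
    where m is the total edge weight, W_in(c) the weight of edges inside c,
    and S(c) the summed strength (weighted degree) of the nodes in c.
    """
    m = sum(edges.values())
    if m == 0:
        return 0.0
    internal = defaultdict(float)
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


# Hypothetical toy graph: two tightly linked triangles joined by one weak link.
edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 0.1,  # low text similarity between the two groups
}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = weighted_modularity(edges, two_groups)
q_single = weighted_modularity(edges, one_group)
```

Splitting the toy graph along the weak link scores higher than keeping everything in one community, which illustrates why a low similarity weight on a citation makes a modularity-based method more willing to cut that link.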
