图书情报工作 (Library and Information Service) ›› 2012, Vol. 56 ›› Issue (04): 6-11.

• Special Topic •

基于关键词的科技文献聚类研究

刘勘,周丽红,陈譞   

  1. 中南财经政法大学信息与安全工程学院
  • 收稿日期:2011-10-25 修回日期:2011-09-06 出版日期:2012-02-20 发布日期:2012-02-20
  • 通讯作者: 刘勘

A New Clustering Algorithm for Scientific Literature Based on Keywords

Liu Kan, Zhou Lihong, Chen Xuan

  1. School of Information and Safety Engineering, Zhongnan University of Economics and Law
  • Received: 2011-10-25 Revised: 2011-09-06 Online: 2012-02-20 Published: 2012-02-20
  • Corresponding author: Liu Kan

摘要:

描述一种基于改进TFIDF特征词加权算法的科技文献聚类方法:首先提取科技文献的特征词;然后根据特征词的词频、所在位置和词性为特征词加权,建立科技文献的向量空间模型;接着使用基于密度的聚类算法对科技文献向量空间模型数据进行聚类分析;最后使用主成分分析法对科技文献聚类的结果进行标识,利用Fmeasure方法对聚类结果进行评价。实验表明,用提出的科技文献聚类方法能够从所检索的科技文献中发现热点研究领域,并能识别具有学科融合性质的研究方向。

 

关键词: 科技文献, 文本挖掘, 聚类

Abstract:

This paper describes a method for clustering scientific literature based on an improved TF-IDF algorithm for weighting feature words. First, feature words are extracted from the literature. Each feature word is then weighted by its term frequency, its position in the document, and its part of speech, and a vector space model (VSM) of the literature is built. Next, a density-based clustering algorithm is applied to the VSM data. Finally, principal component analysis is used to label the resulting clusters, and the F-measure is used to evaluate them. Experiments show that the proposed method can discover hot research areas in the retrieved literature and identify research directions of an interdisciplinary nature.

Key words: scientific literature, text mining, clustering
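
The pipeline outlined in the abstract (weight feature words, build a vector space model, cluster by density) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no weighting formula, so the multipliers in `POSITION_BOOST` and `POS_BOOST`, the smoothed IDF, the cosine distance, and the use of DBSCAN as the density-based algorithm are all hypothetical choices. The PCA labelling and F-measure evaluation steps are not shown.

```python
import math
from collections import Counter

# Hypothetical multipliers: the abstract names position and part of speech
# as weighting factors but gives no values, so these are placeholders.
POSITION_BOOST = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}
POS_BOOST = {"noun": 1.0, "verb": 0.5}


def weighted_tfidf(docs):
    """docs: list of documents, each a list of (term, position, pos) tuples.

    Returns one L2-normalised {term: weight} dict per document.
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in {t for t, _, _ in doc})
    vectors = []
    for doc in docs:
        tf = Counter()
        for term, position, pos in doc:
            # Each occurrence is scaled by where it appears and by its
            # part of speech (assumed multiplicative combination).
            tf[term] += POSITION_BOOST.get(position, 1.0) * POS_BOOST.get(pos, 1.0)
        # Smoothed IDF, so a term found in every document keeps some weight.
        vec = {t: f * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two L2-normalised sparse vectors."""
    return 1.0 - sum(w * b[t] for t, w in a.items() if t in b)


def dbscan(vectors, eps, min_pts):
    """Minimal DBSCAN; returns one label per document (-1 means noise)."""
    n = len(vectors)
    # Neighbourhoods include the point itself, as in standard DBSCAN.
    neighbours = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels
```

A call such as `dbscan(weighted_tfidf(docs), eps=0.3, min_pts=2)` returns one cluster label per document. The `eps` and `min_pts` values are illustrative and would need tuning on a real corpus.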