图书情报工作 (Library and Information Service), 2018, Vol. 62, Issue 4: 14-20. DOI: 10.13266/j.issn.0252-3116.2018.04.002

• Special topic: Exploration of Uncited Chinese Research Papers •

中文科研论文未被引探索Ⅱ:基于关键词的内容因素影响研究——以图书馆情报与文献学为例

韩毅, 伍玉, 申东阳, 况书梅, 袁庆   

  1. 西南大学计算机与信息科学学院 重庆 400715
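  • Received: 2017-08-31 Revised: 2017-12-02 Publication date: 2018-02-20 Release date: 2018-02-20
  • About the authors: Han Yi (ORCID: 0000-0001-7021-3229), professor, PhD, doctoral supervisor, E-mail: hanyi72@swu.edu.cn; Wu Yu, master's student; Shen Dongyang, master's student; Kuang Shumei, master's student; Yuan Qing, master's student.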

Part Ⅱ of the Exploration on Uncited Papers in Chinese: The Influences of Content Features Based on Keywords in Paper - A Case Study of Library and Information Science

Han Yi, Wu Yu, Shen Dongyang, Kuang Shumei, Yuan Qing   

  1. College of Computer and Information Science, Southwest University, Chongqing 400715
  • Received:2017-08-31 Revised:2017-12-02 Online:2018-02-20 Published:2018-02-20

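Abstract: [Purpose/Significance] Exploring the patterns of uncited papers from the perspective of content differences is an important part of research on the uncited-paper phenomenon, and it also helps extend the scope of content-based citation analysis methods. [Method/Process] With CSSCI as the source database and library and information science as the sample discipline, 200 scholars were randomly selected as sample subjects according to the distribution of h-index values among the discipline's scholars, and data on all of their indexed papers from 1998 to 2015 were downloaded. Data on all indexed papers of the sample discipline from 1998 to 2015 were also downloaded, and the data on the corresponding cited papers and highly cited papers were separated out. With 6 years as the time window, papers cited within 1 to 3 years after publication were defined as cited papers and the rest as uncited papers. The keywords of uncited papers, cited papers, all papers of the discipline, and highly cited papers were extracted and sorted by frequency from high to low; the top 50 keywords formed a keyword vector for each group, and the inner product, Euclidean length, and cosine similarity of the keyword vectors were calculated. [Result/Conclusion] Library and information science formed a relatively stable system of research content at the beginning of the 21st century. The content similarity between its uncited papers and each of the other three groups (all papers of the discipline, cited papers, and highly cited papers) is low, which indicates that research content has an important influence on whether a paper goes uncited.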

Keywords: uncited papers, zero-cited papers, library and information science, paper content features, paper keywords, vector space model
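
The abstract's definition of cited and uncited papers can be illustrated with a minimal Python sketch. The function name `is_cited` and its parameters are hypothetical, and the treatment of the interval boundaries (citations counted from 1 to 3 calendar years after the publication year, inclusive) is an assumption, since the abstract does not spell it out. How the 6-year time window is applied is likewise not stated in the abstract, so it is not modeled here.

```python
def is_cited(pub_year, citing_years, first=1, last=3):
    """Return True if the paper received at least one citation
    between `first` and `last` years after publication (inclusive).

    Assumption: years are compared as whole calendar years.
    """
    return any(first <= year - pub_year <= last for year in citing_years)


# Hypothetical papers: publication year and the years of citing papers.
paper_a = is_cited(2000, [2002, 2007])  # cited 2 years after publication
paper_b = is_cited(2000, [2005])        # first cited only after 5 years
paper_c = is_cited(2000, [])            # never cited
```

Under this reading, a paper whose first citation arrives later than 3 years after publication still counts as uncited.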

Abstract: [Purpose/significance] Studying the patterns of uncitedness from the perspective of content differences is an important part of research on the uncitedness phenomenon, and it also helps to expand the scope of content-based citation analysis. [Method/process] CSSCI was selected as the source database, and library and information science was chosen as the sample discipline. According to the distribution of h-index values, 200 scholars were randomly selected as the sample, and the data on their papers indexed in CSSCI from 1998 to 2015 were downloaded. All indexed papers in library and information science from 1998 to 2015 were also downloaded, and the data on the corresponding cited papers and highly cited papers were extracted. Taking 6 years as the time window, papers cited within 1 to 3 years after publication were defined as cited papers, and the others as uncited papers. The keywords of uncited papers, cited papers, all papers of the discipline, and highly cited papers were extracted and ranked by frequency from high to low; the top 50 keywords were selected to form keyword vectors, and their inner product, Euclidean length, and cosine similarity were calculated. [Result/conclusion] The results show that the research content of library and information science became relatively stable at the beginning of the 21st century, and that the content similarity between uncited papers and each of the other groups (all papers of the discipline, cited papers, and highly cited papers) is low, which indicates that research content has a significant effect on whether a paper goes uncited.

Key words: uncited articles, non-cited articles, library and information science, content features in article, keywords in article, vector space model
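
The keyword-vector comparison described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names `top_keywords` and `cosine_similarity` are hypothetical, raw keyword frequencies are assumed as vector weights (the abstract does not state the weighting), and the keyword lists are toy data.

```python
from collections import Counter
from math import sqrt


def top_keywords(keyword_lists, k=50):
    """Count keyword frequencies across papers and keep the k most frequent."""
    counts = Counter(kw for kws in keyword_lists for kw in kws)
    return dict(counts.most_common(k))


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two sparse keyword-frequency vectors."""
    # Inner product: only keywords present in both vectors contribute.
    inner = sum(freq_a[kw] * freq_b[kw] for kw in freq_a.keys() & freq_b.keys())
    # Euclidean length of each vector.
    len_a = sqrt(sum(v * v for v in freq_a.values()))
    len_b = sqrt(sum(v * v for v in freq_b.values()))
    if len_a == 0 or len_b == 0:
        return 0.0  # an empty vector is treated as sharing no content
    return inner / (len_a * len_b)


# Toy data (assumed): each inner list holds the keywords of one paper.
uncited = [["library", "reading"], ["library", "archives"]]
cited = [["library", "bibliometrics"], ["bibliometrics", "citation analysis"]]

vec_uncited = top_keywords(uncited)
vec_cited = top_keywords(cited)
similarity = cosine_similarity(vec_uncited, vec_cited)
```

A value near 1 would mean the two groups share the same high-frequency keywords in similar proportions, while a value near 0 would mean their top keywords barely overlap.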
