[Purpose/significance] This study proposes a new quantitative indicator, document theme novelty, which uses a natural-language word-pair method to detect how novel the thematic content of a paper is. It examines the feasibility, strengths, and weaknesses of this indicator, and explores how novelty relates to F1000 recommendations and to citation indicators. [Method/process] Starting from F1000, we selected hematology papers recommended within the most recent month. For each recommended paper, we searched PubMed for closely related papers published in the six months before it, and these papers together formed the full document set. We defined the concept of natural-language theme novelty and its calculation formula, implemented the calculation in PL/SQL on an Oracle database, and used MetaMap to extract the natural-language terms from which document theme novelty was computed. [Result/conclusion] The natural-language method is feasible, to a certain degree, for computing the thematic novelty of papers. Document theme novelty is not equivalent to F1000 recommendation or to citation performance: they belong to different dimensions and categories of research-paper evaluation and should not be conflated. The new indicator should therefore be combined with peer review, bibliometric indicators, and other paper evaluation metrics for a comprehensive assessment, so that high-quality papers can be selected for recommendation.
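The abstract names a natural-language word-pair method but does not reproduce its formula. Purely as an illustration, the sketch below assumes each paper has already been reduced to a list of concept terms (as MetaMap output might be) and scores a recommended paper against its earlier six-month corpus with a normalized mean pair inverse document frequency: a term pair that never co-occurred in any earlier paper contributes 1, and a pair that co-occurred in every earlier paper contributes 0. The function names `term_pairs` and `pair_novelty` and this particular formula are hypothetical assumptions, not the study's PL/SQL implementation.

```python
from itertools import combinations
from math import log


def term_pairs(terms):
    """Return the set of unordered pairs of distinct terms in one document."""
    unique = sorted(set(terms))
    return set(combinations(unique, 2))


def pair_novelty(target_terms, prior_docs):
    """Hypothetical novelty score of a target document against prior documents.

    Assumed definition (not taken from the study): for each term pair p in the
    target, idf(p) = log((N + 1) / (df(p) + 1)), where N is the number of prior
    documents and df(p) is how many of them contain both terms. Each idf is
    divided by its maximum, log(N + 1), so the mean over pairs lies in [0, 1].
    """
    n = len(prior_docs)
    if n == 0:
        raise ValueError("need at least one prior document")
    pairs = term_pairs(target_terms)
    if not pairs:
        raise ValueError("target needs at least two distinct terms")

    # Precompute the pair set of every prior document once.
    prior_pair_sets = [term_pairs(doc) for doc in prior_docs]
    max_idf = log(n + 1)

    total = 0.0
    for p in pairs:
        df = sum(1 for s in prior_pair_sets if p in s)
        total += log((n + 1) / (df + 1)) / max_idf
    return total / len(pairs)
```

For example, with prior papers `["anemia", "iron", "ferritin"]`, `["anemia", "iron"]`, and `["leukemia", "CAR-T"]`, a target `["anemia", "iron", "CAR-T"]` has two unseen pairs and one pair (`anemia`, `iron`) seen in two of the three earlier papers, giving (1 + 1 + log(4/3)/log 4) / 3, roughly 0.736. Normalizing by log(N + 1) is one possible design choice; it keeps scores comparable across recommended papers whose earlier corpora differ in size.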