[Purpose/significance] Identifying scholars' expertise tags is the most critical task in scholar profiling. Expertise tags help in finding peer experts, clustering domain scholars, and selecting reviewers. [Method/process] This study analyzed the factors in academic publications that relate to a scholar's expertise, then constructed a hierarchical analysis model to allocate weights to those factors. The TextRank algorithm was used to identify topical terms in the Chinese corpus, and a concept linking technique was used for the English corpus. The extracted terms were combined with the weighted factors to select each scholar's expertise tags. For evaluation, a group of distinguished scholars from different domains was selected, and the research expertise stated in their resumes was taken as the benchmark. The expertise tags extracted from their publications were compared with this benchmark through human judgment, supplemented by a semantic similarity measure. [Result/conclusion] The evaluation shows that the expertise tags are acceptable for 71.9% of scholars in the Chinese corpus and for 77.2% in the English corpus. The experiment indicates that the proposed method is practical and yields reasonable results. The chief innovation of this study lies in three aspects. First, it explores term extraction approaches suited to different application conditions, such as the language of publication and the availability of a domain knowledge base. Second, it combines multiple features to identify scholars' expertise tags, including the content of the publications, the scholar's substantive contribution to each publication, and the influence of each publication on its domain. Third, it proposes a reasonable experimental design and evaluation method, and verifies the proposed approach by combining manual scoring with semantic similarity calculation.
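To illustrate the term-ranking step, the following is a minimal sketch of a simplified, unweighted TextRank over a co-occurrence graph. It is not the implementation used in this study. The function name `textrank_keywords`, its parameters, and the sample tokens are hypothetical. The sketch also assumes the text has already been segmented and filtered into candidate words, which for Chinese requires a separate word segmentation step.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank candidate terms with a simplified, unweighted TextRank.

    tokens: pre-segmented, pre-filtered candidate words (assumed input).
    window: words up to `window` positions apart are linked in the graph.
    """
    # Build an undirected co-occurrence graph over the token sequence.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Apply the PageRank-style update for a fixed number of iterations.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbors:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            new_scores[word] = (1 - damping) + damping * rank
        scores = new_scores

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical token sequence standing in for one publication's text.
    sample = ["expertise", "tag", "scholar", "expertise",
              "tag", "publication", "expertise"]
    for term, score in textrank_keywords(sample, top_k=4):
        print(f"{term}\t{score:.3f}")
```

In this sketch, words that co-occur with more distinct neighbors accumulate higher scores. A full pipeline along the lines described above would then combine such term scores with the weighted publication factors before selecting tags.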