Identifying Expertise Tags of Scholars by Multiple Features of Academic Publications

Chen Chong; Li Nan; Liang Bing; Wang Chenlin; Xu Zengxulin; Zheng Tingting

doi:10.13266/j.issn.0252-3116.2019.20.011

Library and Information Service (图书情报工作), 2019, Vol. 63, Issue 20: 96-103

Information Research

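1. School of Government Management, Beijing Normal University, Beijing 100875;
2. Institute of Scientific and Technical Information of China, Beijing 100038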

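Chen Chong (ORCID: 0000-0002-9704-1575), associate professor, PhD, E-mail: chenchong@bnu.edu.cn; Li Nan (ORCID: 0000-0002-5724-8926), master's student; Liang Bing (ORCID: 0000-0002-7622-6618), senior engineer, PhD; Wang Chenlin (ORCID: 0000-0002-0640-9339), undergraduate student; Xu Zengxulin (ORCID: 0000-0002-8181-1168), undergraduate student; Zheng Tingting (ORCID: 0000-0003-4542-7908), master's student.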

Received: 2019-01-25

Revised: 2019-06-03

Published online: 2019-10-20

Abstract (Chinese version)

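[Purpose/significance] Labeling a scholar's academic expertise from the features of their research output is an important task in scholar profiling, and is valuable for applications such as classifying scholars, selecting review experts, and finding close peers (小同行, specialists in the same narrow subfield). [Method/process] The study first analyzes the factors that reveal academic expertise and uses the analytic hierarchy process (AHP) to build a weight allocation model for expertise tags. It then applies TextRank and concept linking to identify topical terms in the content of Chinese and English publications, and combines these terms with the weights to select, as expertise tags, the terms that reflect domain consensus and summarize a scholar's expertise. Researchers from several fields who hold talent titles were selected, and expertise tags were extracted from their representative Chinese or English publications. With the expertise fields listed in the public talent announcements as the benchmark, the identification results were evaluated by manual scoring and semantic computation. [Result/conclusion] Among scholars given Chinese expertise tags, the expertise descriptions of 71.9% were judged satisfactory; among scholars given English expertise tags, 77.2% were judged satisfactory. The experiment indicates that the proposed method for identifying scholars' academic expertise is reasonable. The main contributions are: a solution for mining candidate tag terms from document content under different conditions, namely Chinese versus English text and the presence or absence of an external knowledge base; the use of multiple publication features, combined with bibliometric factors, to select expertise tags, together with a weight allocation scheme; and, to address the lack of evaluation benchmarks, a semantic-computation approach that supplements the reference answers and thereby broadens the means of evaluation.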

Keywords: scholar profiling; expertise tags; analytic hierarchy process; term extraction; evaluation methods for expertise tagging
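The weight allocation step can be illustrated with a minimal analytic hierarchy process sketch. The three factor names and every pairwise judgment below are hypothetical placeholders chosen for illustration; they are not the factors or values used in the paper. The sketch uses the common geometric-mean approximation of the principal eigenvector and Saaty's random index to check consistency.

```python
import math

# Saaty's random consistency index for matrix orders 1 to 5.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix."""
    n = len(matrix)
    # Geometric mean of each row approximates the principal eigenvector.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]
    # Estimate lambda_max as the mean of (A w)_i / w_i over all rows.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return weights, cr


# Hypothetical factors and judgments (assumptions, not from the paper):
# PAIRWISE[i][j] says how much more important factor i is than factor j.
FACTORS = ["content", "contribution", "influence"]
PAIRWISE = [
    [1.0, 2.0, 3.0],
    [1 / 2, 1.0, 2.0],
    [1 / 3, 1 / 2, 1.0],
]

weights, cr = ahp_weights(PAIRWISE)
for name, weight in zip(FACTORS, weights):
    print(f"{name}: {weight:.3f}")
print(f"consistency ratio: {cr:.4f}")
```

A consistency ratio below 0.1 is the usual threshold for accepting a set of pairwise judgments; above it, the judgments would normally be revised before the weights are used.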

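Cite this article

Chen Chong, Li Nan, Liang Bing, Wang Chenlin, Xu Zengxulin, Zheng Tingting. Identifying Expertise Tags of Scholars by Multiple Features of Academic Publications[J]. Library and Information Service, 2019, 63(20): 96-103. DOI: 10.13266/j.issn.0252-3116.2019.20.011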

Abstract

[Purpose/significance] Identifying the expertise tags of scholars is an important task in scholar profiling. Expertise tags help with finding peer experts, clustering scholars by domain, and selecting reviewers. [Method/process] This study analyzed the factors in academic publications that relate to scholar expertise, then built an analytic hierarchy model for allocating weights to those factors. The TextRank algorithm was used to identify topical terms in the Chinese corpus, and concept linking was used in the English corpus. The extracted terms were combined with the weighted factors to select the expertise tags of each scholar. A group of honored scholars from different domains was selected, and the research expertise information in their resumes was taken as the evaluation benchmark. The expertise tags extracted from their publications were compared with the benchmark by human judgment and by an additional semantic similarity measure. [Result/conclusion] The evaluation shows that the expertise tags are acceptable for 71.9% of scholars in Chinese and 77.2% in English. The experiment shows that the proposed method is practical and produces reasonable results. The main contributions of this study lie in three aspects. First, term extraction approaches suited to different application conditions were explored, such as the language of publication and the availability of a domain knowledge base. Second, multiple features were combined to identify the expertise tags of scholars, including the content of the publications, the scholar's substantive contribution to the publications, and the influence of the publications on the domain. Third, an experimental design and evaluation method is proposed, and the approach was verified by combining manual scoring with semantic calculation results.

Keywords: scholar profiling; expertise tagging; analytic hierarchy process; term extraction; evaluation on expertise tagging
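The term extraction step can be sketched with a small, unweighted TextRank over a word co-occurrence graph. This is a generic illustration under stated assumptions, not the authors' implementation: the input is assumed to be already tokenized and filtered (Chinese text would first need a word segmenter), the window size and damping factor are conventional defaults, and the sample tokens are made up.

```python
from collections import defaultdict


def textrank(tokens, window=3, damping=0.85, iterations=50):
    """Rank tokens by TextRank on an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    # Link each token to the tokens that follow it inside the window.
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbors:
            # Each neighbor shares its score equally among its own links.
            rank = sum(scores[nb] / len(neighbors[nb]) for nb in neighbors[word])
            new_scores[word] = (1 - damping) + damping * rank
        scores = new_scores
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up, pre-tokenized sample text (an assumption for illustration).
SAMPLE_TOKENS = [
    "scholar", "profiling", "expertise", "tag", "scholar",
    "expertise", "term", "extraction", "expertise", "tag",
]

ranked = textrank(SAMPLE_TOKENS)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

In this sample, "expertise" co-occurs with every other word and so ranks first. In a tag selection pipeline of the kind the abstract describes, such term scores would then be combined with the factor weights before the final tags are chosen.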
