Library and Information Service (图书情报工作), 2019, Vol. 63, Issue 20: 96-103. DOI: 10.13266/j.issn.0252-3116.2019.20.011

• Information Research •

A Method for Identifying Scholars' Academic Expertise Based on Publication Features

陈翀1, 李楠1, 梁冰2, 王晨琳1, 徐曾旭林1, 郑婷婷1   

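  1. School of Government Management, Beijing Normal University, Beijing 100875;
  2. Institute of Scientific and Technical Information of China, Beijing 100038
  • Received: 2019-01-25 Revised: 2019-06-03 Online: 2019-10-20 Published: 2019-10-20
  • About the authors: Chen Chong (ORCID: 0000-0002-9704-1575), associate professor, PhD, E-mail: chenchong@bnu.edu.cn; Li Nan (ORCID: 0000-0002-5724-8926), master's student; Liang Bing (ORCID: 0000-0002-7622-6618), senior engineer, PhD; Wang Chenlin (ORCID: 0000-0002-0640-9339), undergraduate student; Xu Zengxulin (ORCID: 0000-0002-8181-1168), undergraduate student; Zheng Tingting (ORCID: 0000-0003-4542-7908), master's student.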

Identifying Expertise Tags of Scholars by Multiple Features of Academic Publications

Chen Chong1, Li Nan1, Liang Bing2, Wang Chenlin1, Xu Zengxulin1, Zheng Tingting1   

  1. School of Government Management, Beijing Normal University, Beijing 100875;
  2. Institute of Scientific and Technical Information of China, Beijing 100038
  • Received: 2019-01-25 Revised: 2019-06-03 Online: 2019-10-20 Published: 2019-10-20

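Abstract: [Purpose/significance] Identifying a scholar's academic expertise from the features of their publications is an important task in scholar profiling, and it is valuable for applications such as classifying scholars, selecting review experts, and finding close peers within a narrow specialty. [Method/process] The study first analyzes the factors that reveal academic expertise and uses the analytic hierarchy process (AHP) to build a weight-allocation model for expertise tags. TextRank and concept linking are used to identify topical terms in the content of Chinese and English publications; combined with the weights, terms that reflect domain consensus and summarize a scholar's expertise are selected as expertise tags. Researchers from several fields who hold talent titles were selected, and expertise tags were extracted from their representative Chinese or English publications. With the areas of expertise listed in the public talent announcements as the benchmark, the identification results were evaluated by manual scoring and semantic computation. [Result/conclusion] Among scholars given Chinese expertise tags, the expertise descriptions of 71.9% were judged satisfactory. Among scholars given English expertise tags, the expertise descriptions of 77.2% were judged satisfactory. The experiments indicate that the proposed method for identifying scholars' academic expertise is reasonable. The main contributions are: a solution for mining candidate tag terms from publication content under different conditions, namely Chinese versus English text and with or without an external knowledge base; the use of multiple publication features, combined with bibliometric factors, to filter expertise tags, together with a weight-allocation scheme; and, to address the lack of an evaluation benchmark, a way of supplementing the reference answers through semantic computation, which extends the available means of evaluation.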

Keywords: scholar profiling, expertise tags, analytic hierarchy process, term extraction, evaluation methods for expertise identification
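
The abstract names the analytic hierarchy process as the way weights are allocated, but this page does not give the comparison matrix or the resulting weights. The following is only a minimal sketch of how AHP weights are commonly derived, using the row geometric-mean approximation and Saaty's consistency ratio. The three factor names and all pairwise values are hypothetical placeholders chosen for illustration, not figures from the paper.

```python
import math

# Saaty's random consistency index for matrix orders 1..5 (classic values).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Estimate the consistency ratio; values below 0.1 are usually accepted."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # matrices of order 1 or 2 are always consistent
    # lambda_max is estimated as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# HYPOTHETICAL pairwise comparisons (not from the paper) among three
# illustrative factors: content, author contribution, publication influence.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, weights)
print("weights:", [round(w, 3) for w in weights])
print("consistency ratio:", round(cr, 4))
```

The geometric-mean method is used here because it needs no linear algebra library; the principal-eigenvector method is the other common choice and gives similar weights when the matrix is close to consistent.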

Abstract: [Purpose/significance] Identifying the expertise tags of scholars is a key task in scholar profiling. Expertise tags help in finding peer experts, clustering scholars within a domain, and selecting reviewers. [Method/process] This study analyzes the factors in academic publications that reveal a scholar's expertise and builds an analytic hierarchy process (AHP) model to allocate weights to those factors. The TextRank algorithm is used to identify topical terms in the Chinese corpus, and a concept linking technique is used for the English corpus. The extracted terms are combined with the weighted factors to select each scholar's expertise tags. A group of honored scholars from different domains was selected, and the research expertise stated in their resumes was taken as the evaluation benchmark. The expertise tags extracted from their publications were compared with this benchmark by human judgment, supplemented by semantic similarity judgment. [Result/conclusion] The evaluation shows that the expertise tags are acceptable for 71.9% of the scholars tagged in Chinese and for 77.2% of those tagged in English. The experiment indicates that the proposed method is practical and yields reasonable results. The main contributions are threefold. First, term extraction approaches are explored that suit different application conditions, such as the language of the publication and the availability of a domain knowledge base. Second, multiple features are combined to identify scholars' expertise tags, including the content of the publications, the scholar's substantive contribution to them, and their influence on the domain. Third, a reasonable experimental design and evaluation method is proposed, and the approach is verified by combining manual scoring with semantic calculation results.

Key words: scholar profiling, expertise tagging, analytic hierarchy process, term extraction, evaluation on expertise tagging
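
The abstract states that TextRank is used to find topical terms in the Chinese corpus but gives no implementation details here. As a rough sketch of the general technique only, the code below builds a word co-occurrence graph and runs a PageRank-style iteration over it. The function name, the window size, the damping factor, and the sample tokens are all assumptions made for illustration; real input would first need word segmentation and stop-word filtering, which are not shown.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=5):
    """Rank tokens with a basic TextRank pass over a co-occurrence graph."""
    # Link each token to the tokens that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update on the undirected, unweighted graph.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            rank = sum(scores[v] / len(neighbors[v]) for v in neighbors[word])
            updated[word] = (1 - damping) + damping * rank
        scores = updated

    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# HYPOTHETICAL pre-tokenized input (not data from the paper).
sample_tokens = [
    "scholar", "expertise", "tag", "expertise", "term",
    "expertise", "profile", "expertise", "weight",
]
keywords = textrank_keywords(sample_tokens, window=2, top_k=3)
print(keywords)
```

With `window=2` only adjacent tokens are linked, so a term that co-occurs with many different neighbors accumulates the highest score. In a pipeline like the one the abstract describes, such ranked terms would be candidates that are then filtered by the feature weights.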
