Library and Information Service (图书情报工作) ›› 2020, Vol. 64 ›› Issue (4): 95-102. DOI: 10.13266/j.issn.0252-3116.2020.04.011

• Knowledge Organization •

Research on Institution Name Normalization Based on Chinese Bibliographic Data

Yang Zhao (杨昭)1, Ren Juan (任娟)2,3

  1. Shanghai Jiao Tong University Library, Shanghai 200240;
  2. Shanghai Publishing and Printing College, Shanghai 200093;
  3. Shanghai Research Institute of Publishing and Media, Shanghai 200093
  • Received: 2019-06-21 Revised: 2019-10-15 Online: 2020-02-20 Published: 2020-02-20
  • About the authors: Yang Zhao (ORCID: 0000-0003-1803-3516), librarian, master's degree, E-mail: zhaoyang2017@sjtu.edu.cn; Ren Juan (ORCID: 0000-0002-1814-9378), deputy director, associate professor, PhD.

Abstract: [Purpose/significance] In the era of big data, institution name data has become massive, dynamic, and diverse. Normalizing institution names can improve data reliability in research management, discipline evaluation, and subject services in a big data environment, and can raise the quality and effectiveness of data retrieval and applications based on institution names. [Method/process] This paper studies institution name normalization from a linguistic perspective and at the model-construction level. It constructs a framework model for institution name normalization based on co-occurrence relations and similarity, proposes a method for identifying the entity boundaries of institution names, compiles a multi-level institution vocabulary, and proposes a normalization method for institution names. Finally, Chinese bibliographic data from 2008 to 2018 are selected for experiments. [Result/conclusion] The experimental results verify the effectiveness of the model, which offers some guidance for normalizing the names of other types of institutions.

Key words: institution name, normalization, model construction, big data, entity boundary recognition
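
To make the similarity-based part of the framework concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small hand-written vocabulary that maps each canonical institution name to known variants, and it uses Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure. The names `VOCAB`, `similarity`, `normalize`, and the threshold `0.6` are hypothetical choices for illustration; the co-occurrence component and the entity boundary identification step described in the abstract are not modeled here.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary (an assumption, not the paper's word list):
# canonical institution name -> known surface variants.
VOCAB = {
    "上海交通大学": ["上海交通大学", "上海交大", "Shanghai Jiao Tong University"],
    "上海出版印刷高等专科学校": ["上海出版印刷高等专科学校"],
}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def normalize(raw: str, vocab=VOCAB, threshold: float = 0.6):
    """Return the canonical name that best matches `raw`, or None.

    A variant contained in the raw string counts as a perfect match,
    which handles names that carry a sub-unit suffix (e.g. a library
    or department). Otherwise the difflib ratio is used.
    """
    raw = raw.strip()
    best_name, best_score = None, 0.0
    for canonical, variants in vocab.items():
        for variant in variants:
            score = 1.0 if variant in raw else similarity(raw, variant)
            if score > best_score:
                best_name, best_score = canonical, score
    # Reject weak matches instead of forcing every string onto an entry.
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    print(normalize("上海交通大学图书馆"))
    print(normalize("复旦大学"))
```

With this vocabulary, `normalize("上海交通大学图书馆")` returns `"上海交通大学"` because the canonical name is contained in the input, while `normalize("复旦大学")` returns `None` because no variant reaches the threshold. A production system would need a far larger vocabulary and a similarity measure tuned on real bibliographic data.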

CLC number: