[Purpose/significance] In the era of big data, institution name data exhibits new characteristics such as massive volume, dynamism, and diversity. Normalizing institution names improves data reliability in research management, discipline evaluation, and subject services in a big data environment, and raises the quality and effectiveness of data retrieval based on institution names. [Method/process] This paper studies institution name normalization from a linguistic perspective and at the level of model construction. It builds a framework model for institution name normalization based on co-occurrence relations and similarity, proposes a method for identifying the entity boundaries of institution names, compiles a multi-level institutional vocabulary, and proposes a normalization method for institution names. Finally, bibliographic records of Chinese-language literature from 2008 to 2018 are used in an experiment. [Result/conclusion] The experimental results verify the effectiveness of the model, which also offers some insight for normalizing the names of other types of institutions.
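The similarity side of such a framework can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the similarity measure, so the sketch assumes Python's standard `difflib.SequenceMatcher` ratio as a stand-in, and the function names, the threshold of 0.6, and the two-entry vocabulary are hypothetical. The co-occurrence component and the multi-level vocabulary are not modeled here.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary of canonical institution names (illustration only).
CANONICAL = ["Peking University", "Tsinghua University"]


def similarity(a: str, b: str) -> float:
    """Return a character-level similarity in [0, 1] using difflib's ratio.

    Assumption: this stands in for whatever measure the paper actually uses.
    """
    return SequenceMatcher(None, a, b).ratio()


def normalize_name(variant, canonical_names, threshold=0.6):
    """Map a name variant to its most similar canonical name.

    Returns None when no canonical name reaches the (assumed) threshold,
    so that unmatched variants can be reviewed separately.
    """
    best_name, best_score = None, 0.0
    for name in canonical_names:
        score = similarity(variant, name)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    for raw in ["Peking Univ", "xyz"]:
        print(raw, "->", normalize_name(raw, CANONICAL))
```

In practice the threshold would need tuning against labeled data, and a pure string measure would likely be combined with other evidence, such as the co-occurrence relations the abstract mentions, before two names are merged.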