Research on the Distribution Characteristics of New Terminology for the Update of the Thesaurus

  • Lei Xiao,
  • Chang Chun,
  • Liu Wei
  • Institute of Scientific and Technical Information of China, Beijing 100038

Received date: 2018-12-03

Revised date: 2019-05-15

Online published: 2019-10-20

Abstract

[Purpose/significance] To keep a thesaurus practical, new terms from its field must be added to it continually. During updating and maintenance, the distribution characteristics of new terms should be examined in terms of time and frequency, which can inform methods for discovering new terms. [Method/process] Starting from the relevant characteristics of new terms, and using the distribution of their document frequency at individual time points and over periods, the study applies statistical analysis to the distribution of terms in different development periods, with particular attention to how terms behave from first appearance to maturity. [Result/conclusion] The results indicate that new terms are generally in the growth stage of a term's development. When a candidate new term maintains a positive growth trend for more than a certain number of years, it is considered to have novelty, temporal persistence, and terminological features. Based on these distribution characteristics, the article selects a subject area and discovers its new terms. According to expert judgment, the method identifies new terms with high accuracy and can effectively identify the low-frequency words that account for a large share of terms in practical applications.
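The growth-trend criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of strictly increasing yearly document frequency as the meaning of "positive growth", the choice to count the run ending at the most recent year, and the `min_growth_years` threshold are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict


def trailing_growth_years(yearly_df: Dict[int, int]) -> int:
    """Count consecutive year-over-year increases ending at the latest year.

    yearly_df maps a calendar year to the term's document frequency in
    that year (hypothetical input format). A gap in the years or a
    non-increasing step ends the run.
    """
    years = sorted(yearly_df)
    run = 0
    # Walk backwards from the most recent pair of adjacent years.
    for prev, cur in zip(reversed(years[:-1]), reversed(years[1:])):
        if cur - prev == 1 and yearly_df[cur] > yearly_df[prev]:
            run += 1
        else:
            break
    return run


def is_candidate_new_term(yearly_df: Dict[int, int], min_growth_years: int) -> bool:
    """Flag a term whose recent growth run reaches the assumed threshold.

    min_growth_years is a hypothetical parameter standing in for the
    "certain number of years" mentioned in the abstract.
    """
    return trailing_growth_years(yearly_df) >= min_growth_years


# Example with made-up frequencies for one low-frequency term.
example = {2014: 1, 2015: 1, 2016: 2, 2017: 4, 2018: 7}
growth = trailing_growth_years(example)
```

Because the test looks only at the direction of change, not at absolute counts, a low-frequency term with steady growth is treated the same way as a high-frequency one, which matches the abstract's emphasis on low-frequency words.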

Cite this article

Lei Xiao, Chang Chun, Liu Wei. Research on the Distribution Characteristics of New Terminology for the Update of the Thesaurus[J]. Library and Information Service, 2019, 63(20): 121-128. DOI: 10.13266/j.issn.0252-3116.2019.20.014
