[Purpose/significance] Patent term extraction is limited by the shortcomings of the patent text collection itself. To compensate, this paper proposes using the rich keyword knowledge of research papers to obtain effective features from outside the patent text and thereby improve patent term extraction. [Method/process] Based on the keyword knowledge of related papers, two features, degree of domain relevance and degree of head & tail, are proposed to measure the likelihood that a candidate term is a true term, and these features are incorporated into a traditional patent term extraction method. [Result/conclusion] The experimental results show that, when the degree of domain relevance and the degree of head & tail of candidate terms are computed from the keyword information of papers, the method that incorporates paper keyword knowledge achieves significantly higher accuracy than the traditional term extraction method.