Research on the Selection of Chinese Patent Candidate Term Based on Dependency Syntax Parsing

  • Yu Yan,
  • Chen Lei,
  • Jiang Jinde,
  • Zhao Naixuan
  • 1. Information Service Department, Nanjing Tech University, Nanjing 210009;
    2. Computer Science Department, Chengxian College, Southeast University, Nanjing 211816;
    3. School of Business, Nanjing Xiaozhuang University, Nanjing 211171

Received date: 2019-01-22

Revised date: 2019-04-14

Online published: 2019-09-20

Abstract

[Purpose/significance] Pattern-matching rules for term extraction have to be written separately for each data set, and the accuracy of Chinese patent term extraction remains low. To address both problems, this paper proposes a method for selecting Chinese patent candidate terms based on dependency syntax parsing. [Method/process] The method has three main steps: dependency syntax parsing, pruning, and dependency subtree generation. First, dependency syntax parsing is applied to the Chinese patent text to obtain dependency trees. Next, dependency relations that do not meet the requirements are removed, which yields dependency subtrees. Finally, continuous word strings are selected as candidate terms for Chinese patent term extraction. [Result/conclusion] Experimental results show that, compared with existing related methods, the proposed method effectively improves the accuracy of Chinese patent term extraction.
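
The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which dependency relations are kept, so the sketch assumes, purely for illustration, that only attributive arcs (labelled `ATT`, as in the LTP tag set) survive pruning and that a candidate must span at least two words. The function name `candidate_terms`, the `min_len` parameter, and the hand-written parse of the toy sentence are all hypothetical; in practice the parse would come from a dependency parser.

```python
from collections import defaultdict

# Assumption: relations kept during pruning. The abstract does not list them.
KEPT_RELATIONS = {"ATT"}


def candidate_terms(words, heads, relations, kept=KEPT_RELATIONS, min_len=2):
    """Select candidate terms from one parsed sentence.

    words[i] depends on the word at 1-based position heads[i]
    (0 means the root) through the relation relations[i].
    """
    n = len(words)
    parent = list(range(n))  # union-find forest over token positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pruning: only arcs with a kept relation connect two tokens.
    for i, (head, rel) in enumerate(zip(heads, relations)):
        if head > 0 and rel in kept:
            parent[find(i)] = find(head - 1)

    # Subtree generation: tokens still connected form one subtree.
    subtrees = defaultdict(list)
    for i in range(n):
        subtrees[find(i)].append(i)  # positions arrive in increasing order

    # Candidate selection: contiguous runs of positions inside a subtree.
    found = []
    for positions in subtrees.values():
        run = [positions[0]]
        for i in positions[1:] + [None]:  # None flushes the last run
            if i is not None and i == run[-1] + 1:
                run.append(i)
                continue
            if len(run) >= min_len:
                found.append((run[0], "".join(words[j] for j in run)))
            if i is not None:
                run = [i]
    return [text for _, text in sorted(found)]


# Toy sentence with a hand-written parse (hypothetical, for illustration only).
words = ["本发明", "涉及", "锂离子", "电池", "隔膜"]
heads = [2, 0, 4, 5, 2]
relations = ["SBV", "HED", "ATT", "ATT", "VOB"]

print(candidate_terms(words, heads, relations))  # ['锂离子电池隔膜']
```

In this toy parse the subject and object arcs are pruned, so the three nouns linked by attributive arcs remain as one subtree and form a single contiguous candidate. A different choice of kept relations or minimum length would change which strings are selected.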

Cite this article

Yu Yan, Chen Lei, Jiang Jinde, Zhao Naixuan. Research on the Selection of Chinese Patent Candidate Term Based on Dependency Syntax Parsing[J]. Library and Information Service, 2019, 63(18): 109-118. DOI: 10.13266/j.issn.0252-3116.2019.18.013
