Special Topic: Natural Language Processing and Text Information Analysis

Study on Acquisition Method of Patent Term Translation Based on Machine Translation

  • He Yanqing,
  • Liu Jianhui,
  • Qu Peng,
  • Li Ying,
  • Xu Hongjiao
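  • 1. Institute of Scientific and Technical Information of China, Beijing 100038;
    2. Shijiazhuang University of Economics, Shijiazhuang 050031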
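He Yanqing, associate research fellow, Institute of Scientific and Technical Information of China, E-mail: heyq@istic.ac.cn; Liu Jianhui, associate professor, Shijiazhuang University of Economics; Qu Peng, assistant research fellow, Institute of Scientific and Technical Information of China; Li Ying, associate research fellow, Institute of Scientific and Technical Information of China; Xu Hongjiao, assistant research fellow, Institute of Scientific and Technical Information of China.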

Received date: 2014-07-24

  Revised date: 2014-09-02

  Online published: 2014-10-05

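Funding

This article is one of the research outcomes of the National Natural Science Foundation of China project "Context Analysis of Statistical Machine Translation for Patent Documents" (Grant No. 61303152) and the China-Japan international cooperation project "Cooperative Research on Practical Japanese-Chinese Bidirectional Machine Translation for Scientific and Technical Literature" (Grant No. 2014DFA11350).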


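Abstract

Patent term translation demands a high degree of accuracy and domain expertise, and the automatic acquisition of patent term translations has substantial practical value for natural language processing tasks such as machine translation, automatic dictionary compilation, and cross-language information retrieval. This article extracts terms separately from each side of bilingual patent abstracts, combines multiple term recognition methods, and then uses rule-based machine translation and statistical machine translation to dynamically assist a lexical method in aligning terms, with the aim of obtaining as many accurate patent term translation pairs as possible from bilingual patent documents. Experiments on patent abstracts show that the precision of the extracted patent term translation pairs reaches 80%.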
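The alignment idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, the lexicon format, or when machine translation is consulted. The sketch assumes that terms are already extracted and pre-tokenized with spaces, that a hypothetical `lexicon` maps source tokens to target words, that a hypothetical `translate` callable stands in for a rule-based or statistical MT system, and that MT is used only when the lexicon does not cover a term. Candidate pairs are scored with a Dice coefficient, which is likewise an assumption.

```python
from typing import Callable, Dict, List, Set, Tuple


def token_overlap(a: List[str], b: List[str]) -> float:
    """Dice coefficient between two token lists (0.0 when both are empty)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def align_terms(
    zh_terms: List[str],
    en_terms: List[str],
    lexicon: Dict[str, Set[str]],
    translate: Callable[[str], str],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Pair each source term with its best-scoring target term, if any."""
    pairs = []
    for zh in zh_terms:
        tokens = zh.split()
        if all(t in lexicon for t in tokens):
            # Lexical step: the lexicon covers every token of the term.
            hint = [w for t in tokens for w in lexicon[t]]
        else:
            # Assumed fallback: ask the MT system for a rough translation.
            hint = translate(zh).lower().split()
        best, best_score = None, 0.0
        for en in en_terms:
            score = token_overlap(hint, en.lower().split())
            if score > best_score:
                best, best_score = en, score
        if best is not None and best_score >= threshold:
            pairs.append((zh, best))
    return pairs


def precision(pairs: List[Tuple[str, str]], gold: Set[Tuple[str, str]]) -> float:
    """Share of proposed pairs that appear in a hand-checked gold set."""
    return len(set(pairs) & gold) / len(pairs) if pairs else 0.0


# Toy data, invented for illustration only.
toy_lexicon = {"机器": {"machine"}, "翻译": {"translation"}}
toy_mt = {"术语 对齐": "term alignment"}
pairs = align_terms(
    ["机器 翻译", "术语 对齐"],
    ["machine translation", "term alignment", "patent abstract"],
    toy_lexicon,
    lambda s: toy_mt.get(s, ""),
)
print(pairs)
```

In this toy run the first term is resolved by the lexicon alone, while the second is resolved through the stand-in MT callable because "术语" is missing from the lexicon. The `precision` helper corresponds to the evaluation measure reported in the abstract, computed here against whatever gold set the reader supplies.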

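Cite this article

He Yanqing, Liu Jianhui, Qu Peng, Li Ying, Xu Hongjiao. Study on Acquisition Method of Patent Term Translation Based on Machine Translation[J]. Library and Information Service (图书情报工作), 2014, 58(19): 25-30. DOI: 10.13266/j.issn.0252-3116.2014.19.004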

