Applying Graph Representations to Automatic Extraction of Semantic Information from Chinese Patent text

  • Jiang Chuntao
  • Department of Computer Science and Technology, Nanjing University, Nanjing 210023; Patent Information Service Center of Jiangsu Province, Nanjing 210008

Received date: 2015-08-20

Revised date: 2015-10-18

Online published: 2015-11-05

Abstract

[Purpose/significance] This paper proposes a graph-representation-based approach to automatically extracting semantic information from Chinese patent texts; such information can provide semantic support for content-based intelligent patent analysis. [Method/process] The author devised two graph models: ① a keyword-based text graph model and ② a dependency-tree-based text graph model. The first model was constructed by computing the similarity between every pair of keywords; the second was constructed by extracting syntactic relations from the sentences of the text. In a case study, a frequent subgraph mining algorithm was used to discover frequent subgraph patterns, which then served as features for building text classifiers, in order to test the expressiveness and effectiveness of the two graph models. [Result/conclusion] The resulting classifiers were tested on datasets of patents from four technology domains and compared with a classic text classifier. The experimental results show that the two graph-model-based classifiers achieve a performance gain of 2.1%-10.5% over the classic text classifier while using fewer features. This suggests that graph representations and graph mining techniques are effective for extracting semantic information from patent texts and can facilitate further patent text analysis.
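As a rough illustration of the keyword-based text graph described in the abstract, the sketch below links two keywords when their sentence-level pointwise mutual information (PMI) exceeds a threshold. The abstract does not say which similarity measure the author used, so PMI, the threshold, and the function name `build_keyword_graph` are assumptions made for this example only. The input is assumed to be already word-segmented, and English tokens stand in for segmented Chinese terms.

```python
import math
from collections import Counter
from itertools import combinations


def build_keyword_graph(sentences, keywords, threshold=0.0):
    """Return {(kw_a, kw_b): pmi} for keyword pairs whose PMI exceeds threshold.

    sentences: list of token lists (assumed to be already segmented).
    keywords:  iterable of keywords that become graph nodes.
    PMI is an assumed similarity measure, not necessarily the paper's.
    """
    keywords = set(keywords)
    n_sentences = len(sentences)
    single = Counter()  # number of sentences containing each keyword
    pair = Counter()    # number of sentences containing both keywords

    for tokens in sentences:
        present = sorted(keywords.intersection(tokens))
        single.update(present)
        pair.update(combinations(present, 2))

    edges = {}
    for (a, b), n_ab in pair.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), estimated per sentence
        pmi = math.log2(n_ab * n_sentences / (single[a] * single[b]))
        if pmi > threshold:
            edges[(a, b)] = pmi
    return edges


example_sentences = [
    ["battery", "electrode", "material"],
    ["battery", "electrode"],
    ["material", "coating"],
    ["coating", "process"],
]
example_keywords = {"battery", "electrode", "material", "coating"}
graph = build_keyword_graph(example_sentences, example_keywords)
print(graph)
```

Each key in the returned dictionary is an undirected edge between two keyword nodes, and the value is its weight. A graph built this way per document could then be passed to a frequent subgraph miner, as the abstract describes, although that step is not shown here.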

Cite this article

Jiang Chuntao. Applying Graph Representations to Automatic Extraction of Semantic Information from Chinese Patent text[J]. Library and Information Service, 2015, 59(21): 115-122. DOI: 10.13266/j.issn.0252-3116.2015.21.017

