Parse Tree-based SAO Structure Identification

  • Yang Chao
  • Zhu Donghua
  • Heng Xiaofan
  • Wang Xuefeng
  • School of Management and Economics, Beijing Institute of Technology, Beijing 100081

Received date: 2016-08-25

Revised date: 2016-10-16

Online published: 2016-11-05

Abstract

[Purpose/significance] Subject-Action-Object (SAO) is a triple structure that can both describe topics in detail and reveal the relationships between them, and SAO analysis is a fast-growing research field. To obtain SAO structures suitable for bibliometric analysis, three problems must be solved: the recall and precision of SAO extraction have been low; the extracted SAOs are often only loosely related to the domain topics; and the resulting SAO matrices are sparse. [Method/process] This paper proposes a parse tree-based SAO identification method for bibliometric analysis. It includes: (1) a model that identifies the core components of SAO structures using co-word analysis and term clumping; and (2) a parse tree-based hierarchical SAO extraction model that carries out the identification of SAO structures. [Result/conclusion] In the case study, the proposed method achieves an average precision of 0.8058 and an average recall of 0.8446. The SAOs it extracts are closely related to the domain topics and alleviate matrix sparsity, which makes the method an effective tool for bibliometric analysis.
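To make the idea of parse tree-based SAO extraction concrete, here is a minimal rule-based sketch. It is not the authors' model: the `Node` class, the `parse_bracketed`, `extract_sao`, `verb_object_pairs` and `precision_recall` helpers, the extraction rules (an S node with an NP subject and a VP; the first `VB*` verb and first NP object inside that VP, descending into nested VPs) and the toy tree are all hypothetical, and the co-word analysis and term clumping steps are omitted. In practice the bracketed tree would come from a constituency parser. The last helper shows how precision and recall of extracted triples can be computed against a gold-standard set, assuming exact triple matching.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical constituency-tree node: a phrase with children, or a POS tag with a word."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminal (POS) nodes

    def leaves(self):
        """Return the words under this node, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.leaves()]


def parse_bracketed(s):
    """Parse a Penn Treebank-style string such as '(S (NP (NN cell)) (VP ...))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip '('
        label = tokens[pos]
        pos += 1
        if tokens[pos] != "(":        # preterminal: (TAG word)
            word = tokens[pos]
            pos += 2                  # skip the word and its ')'
            return Node(label, word=word)
        children = []
        while tokens[pos] == "(":
            children.append(read())
        pos += 1                      # skip ')'
        return Node(label, children)

    return read()


def verb_object_pairs(vp):
    """Yield (verb, object NP) pairs from a VP, descending into nested/coordinated VPs."""
    verb = next((c.word for c in vp.children if c.label.startswith("VB")), None)
    obj = next((c for c in vp.children if c.label == "NP"), None)
    if verb is not None and obj is not None:
        yield verb, obj
    for c in vp.children:
        if c.label == "VP":           # e.g. (VP (MD can) (VP (VB improve) (NP ...)))
            yield from verb_object_pairs(c)


def extract_sao(node, out=None):
    """Walk the whole tree; every S node with an NP and a VP child contributes triples."""
    if out is None:
        out = []
    if node.word is None:
        labels = [c.label for c in node.children]
        if node.label == "S" and "NP" in labels and "VP" in labels:
            subj = " ".join(node.children[labels.index("NP")].leaves())
            for verb, obj in verb_object_pairs(node.children[labels.index("VP")]):
                out.append((subj, verb, " ".join(obj.leaves())))
        for c in node.children:       # recurse so embedded clauses are also found
            extract_sao(c, out)
    return out


def precision_recall(extracted, gold):
    """Precision and recall of extracted triples against a gold standard (exact match)."""
    found, gold = set(extracted), set(gold)
    hits = len(found & gold)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

A usage example on a hypothetical coordinated sentence ("the method can improve recall and the model reduces sparsity"):

```python
tree = parse_bracketed(
    "(ROOT (S (S (NP (DT the) (NN method)) (VP (MD can) (VP (VB improve) (NP (NN recall)))))"
    " (CC and) (S (NP (DT the) (NN model)) (VP (VBZ reduces) (NP (NN sparsity))))))"
)
triples = extract_sao(tree)
# [('the method', 'improve', 'recall'), ('the model', 'reduces', 'sparsity')]
```

Recursing through every node, rather than stopping at the root clause, is what lets coordinated and embedded clauses each yield their own triple; a real system would add rules for passives, prepositional objects and phrase normalization.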

Cite this article

Yang Chao, Zhu Donghua, Heng Xiaofan, Wang Xuefeng. Parse Tree-based SAO Structure Identification[J]. Library and Information Service, 2016, 60(21): 113-121. DOI: 10.13266/j.issn.0252-3116.2016.21.015

Author contributions: Yang Chao: method construction, algorithm design, and experiments; Zhu Donghua: method construction; Heng Xiaofan: data collection; Wang Xuefeng: algorithm design.
