Technical Topic Analysis in Patents: SAO-based LDA Modeling

Yang Chao, Zhu Donghua, Wang Xuefeng, Zhu Fujin, Heng Xiaofan
1. School of Management and Economics, Beijing Institute of Technology, Beijing 100081
2. Centre for Quantum Computation and Intelligent Systems, University of Technology Sydney, NSW 2007

Received date: 2016-10-07
Revised date: 2016-12-12
Online published: 2017-02-05

Abstract

[Purpose/significance] Technical topic analysis faces three problems: topics are difficult to classify, homonymous words and terms are ambiguous, and technical problems and their solutions are difficult to identify. [Method/process] In this paper, we first extract SAO (Subject-Action-Object) structures from patents, and then explore and identify the problem & solution (P&S) patterns embodied in these SAO structures. Finally, an SAO-based LDA model is built on a "bag of P&S" assumption to perform technical topic analysis at the concept level. [Result/conclusion] The case study shows that the proposed method can effectively identify topic distributions and, compared with the traditional LDA model, has clear advantages in topic identification and word disambiguation.
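The abstract outlines a pipeline in which SAO triples are mapped to problem or solution units and LDA is run over those units rather than over single words. A minimal Python sketch of that idea follows, under explicit assumptions: the SAO triples are given rather than parsed from patent text, `PROBLEM_CUES` is a crude hypothetical keyword rule (not the authors' P&S pattern identification), and a plain LDA fitted by collapsed Gibbs sampling stands in for the paper's SAO-based LDA. All names (`PROBLEM_CUES`, `SAMPLE_PATENTS`, `to_ps_token`, `lda_gibbs`) and the toy data are illustrative only.

```python
import random

# Hypothetical action cues that mark an SAO structure as a "problem";
# every other action is treated as a "solution". Illustrative only.
PROBLEM_CUES = {"reduce", "prevent", "avoid", "eliminate", "suppress"}

# Toy SAO triples (subject, action, object) for four hypothetical patents.
SAMPLE_PATENTS = [
    [("filter", "reduce", "noise"), ("circuit", "include", "filter")],
    [("shield", "reduce", "noise"), ("circuit", "include", "shield")],
    [("coating", "prevent", "corrosion"), ("process", "apply", "coating")],
    [("alloy", "prevent", "corrosion"), ("process", "apply", "alloy")],
]


def to_ps_token(triple):
    """Map one SAO triple to a P&S token such as 'P:reduce noise'.

    The subject is ignored here; only the action-object pair is kept.
    """
    _subject, action, obj = (part.lower() for part in triple)
    kind = "P" if action in PROBLEM_CUES else "S"
    return f"{kind}:{action} {obj}"


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit standard LDA to token lists with collapsed Gibbs sampling.

    Returns (vocab, theta, phi): theta[d][k] is the topic mixture of
    document d and phi[k][v] the probability of token vocab[v] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens of topic k in document d
    nkw = [[0] * V for _ in range(K)]  # count of token v assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return vocab, theta, phi


if __name__ == "__main__":
    docs = [[to_ps_token(t) for t in patent] for patent in SAMPLE_PATENTS]
    vocab, theta, phi = lda_gibbs(docs, num_topics=2)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
        print(f"topic {k}:", [vocab[v] for v in top])
```

Treating "P:reduce noise" or "S:include filter" as a single vocabulary item is one reading of the "bag of P&S" assumption: a word such as "noise" is counted only inside the action-object unit it belongs to, not as an isolated and possibly ambiguous term. How the paper actually defines, extracts, and weights these units is not stated in this abstract.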

Cite this article

Yang Chao, Zhu Donghua, Wang Xuefeng, Zhu Fujin, Heng Xiaofan. Technical Topic Analysis in Patents: SAO-based LDA Modeling[J]. Library and Information Service, 2017, 61(3): 86-96. DOI: 10.13266/j.issn.0252-3116.2017.03.012

