Patent Topic Discovery Method Integrated with Term Knowledge

  • Yu Yan,
  • Zhao Naixuan
  • 1. Information Service Department, Nanjing Tech University, Nanjing 210009;
    2. Computer Science Department, Southeast University Chengxian College, Nanjing 211816

Received date: 2018-04-07

Revised date: 2018-06-20

Online published: 2018-11-05

Abstract

[Purpose/significance] Patent topic analysis that uses single words as its basic unit produces topics that are difficult to interpret. To address this problem, this paper proposes a patent topic discovery model integrated with term knowledge. [Method/process] The proposed model first introduces class entropy to recognize terms in the patent literature effectively. It then uses the Generalized Pólya Urn model to increase the probability that semantically similar terms are assigned to the same topic, which alleviates the data sparsity that arises when terms serve as the basic unit of topic model analysis. [Result/conclusion] The experimental results show that incorporating term information improves the quality of the generated topics, making topic representations more readable and more discriminative.
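The Generalized Pólya Urn step mentioned in the abstract can be illustrated with a minimal sketch of how topic-term counts might be updated during sampling. This is not the authors' implementation: the similarity lists in RELATED_TERMS, the PROMOTION weight of 0.3, and the function names are all assumptions made for illustration. The idea shown is only that assigning a term to a topic also adds a smaller fractional count for each semantically similar term, so similar terms become more likely to share that topic.

```python
from collections import defaultdict

# Hypothetical similarity lists: each term maps to terms judged semantically close.
# Neither the terms nor the weight below come from the paper.
RELATED_TERMS = {
    "lithium battery": ["lithium ion cell"],
    "lithium ion cell": ["lithium battery"],
}
PROMOTION = 0.3  # assumed fractional count given to each related term


def new_counts():
    """Create an empty topic -> term -> count table."""
    return defaultdict(lambda: defaultdict(float))


def update_topic_counts(counts, topic, term, sign=1):
    """Add (sign=1) or remove (sign=-1) one assignment of `term` to `topic`.

    Under a Generalized Polya Urn scheme the assigned term changes by a full
    count and each related term changes by a smaller promoted amount.
    """
    counts[topic][term] += sign * 1.0
    for other in RELATED_TERMS.get(term, []):
        counts[topic][other] += sign * PROMOTION


counts = new_counts()
update_topic_counts(counts, 0, "lithium battery")
```

In a collapsed Gibbs sampler, a call with sign=-1 would typically precede resampling a term's topic and a call with sign=1 would follow it. A plain LDA sampler corresponds to the special case where RELATED_TERMS is empty.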

Cite this article

Yu Yan, Zhao Naixuan. Patent Topic Discovery Method Integrated with Term Knowledge[J]. Library and Information Service, 2018, 62(21): 118-126. DOI: 10.13266/j.issn.0252-3116.2018.21.015
