Semantic Recognition of Technological Innovation Theme Based on LDA

  • Zhu Na
  • Wang Xiaoyue
  • Yang Jing
  • Bai Rujiang
  • Institute of Scientific & Technical Information, Shandong University of Technology, Zibo 255049

Received date: 2015-04-27

Revised date: 2015-06-20

Online published: 2015-07-20

Abstract

[Purpose/significance] Traditional probabilistic-model methods for identifying technological innovation themes ignore the semantics of the text. Identifying themes more accurately therefore requires semantic recognition of technological innovation themes. [Method/process] This article proposes an LDA-based method for the semantic recognition of science and technology innovation themes. It uses semantic role labeling to semantically index the technological innovation content of scientific literature, builds an LDA topic semantic recognition model, and identifies innovation themes according to the probability of the hypernyms that correspond to the semantic roles of keywords in the technological innovation content. [Result/conclusion] Experiments on data from the 3D printing field show that the method identifies innovation themes more accurately and forms a mixed distribution cluster linking scientific and technological innovation themes, scientific and technological innovation MeSH, and scientific literature. It reduces interference from background and other irrelevant data and avoids double counting of scientific and technological innovation MeSH terms that share the same meaning.
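
The idea of running LDA over hypernyms instead of raw keywords can be illustrated with a minimal sketch. It is not the authors' implementation: the hypernym table `HYPERNYMS`, the toy documents, the helper names `to_hypernyms` and `lda_gibbs`, and all hyperparameters are assumptions made for illustration, and the semantic role labeling step is omitted entirely. The sampler is a plain collapsed Gibbs sampler for standard LDA.

```python
import random
from collections import Counter

# Hypothetical toy hypernym table (an assumption for illustration only);
# a real system would look hypernyms up in a lexical resource.
HYPERNYMS = {
    "sintering": "fusion_process",
    "melting": "fusion_process",
    "titanium": "metal",
    "steel": "metal",
    "scaffold": "implant",
    "prosthesis": "implant",
}


def to_hypernyms(keywords):
    """Replace each keyword by its hypernym, keeping it if none is known."""
    return [HYPERNYMS.get(w, w) for w in keywords]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, vocab, topic_word)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]       # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                     # token counts per topic
    assign = []                                      # topic of every token
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    # Resample each token's topic from its full conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed document-topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return theta, vocab, topic_word


# Toy keyword lists standing in for indexed innovation content (made up).
raw_docs = [
    ["sintering", "titanium", "melting", "steel"],
    ["scaffold", "prosthesis", "titanium"],
    ["melting", "steel", "sintering"],
]
docs = [to_hypernyms(d) for d in raw_docs]
theta, vocab, topic_word = lda_gibbs(docs, n_topics=2)
```

In this toy setup, mapping keywords to hypernyms shrinks the vocabulary from six terms to three, so terms with the same meaning contribute to a single count rather than being counted separately.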

Cite this article

Zhu Na, Wang Xiaoyue, Yang Jing, Bai Rujiang. Semantic Recognition of Technological Innovation Theme Based on LDA[J]. Library and Information Service, 2015, 59(14): 126-134. DOI: 10.13266/j.issn.0252-3116.2015.14.018

