Research on Semantic Relatedness of Domain-specific Concepts Based on Chinese Wikipedia

  • Wang Juan,
  • Cao Shujin,
  • Jiang Lingmin,
  • Hu Qing
  • 1. School of Information Management, Sun Yat-Sen University, Guangzhou 510275;
    2. Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510420;
    3. Computer Science Fundamentals Lab of Information Science and Technology College, Dalian Maritime University, Dalian 116026

Received date: 2014-10-10

Revised date: 2014-11-20

Online published: 2014-12-05

Abstract

To improve the accuracy of computing semantic relatedness between domain-specific concepts, this paper proposes a new semantic relatedness algorithm that uses both the category structure of Chinese Wikipedia and the explanatory content of its concept articles. Concepts from library and information science in the Chinese Wikipedia concept hierarchy are used as experimental objects, and the weighted algorithm, which draws on both category and text information, is compared with algorithms based only on the Chinese Wikipedia category structure, such as Relwup and Relseco, or only on Chinese Wikipedia article text, such as Relstr. The experimental results show that the weighted algorithm outperforms the others, and it can provide important technical support for applications such as domain-oriented information retrieval and domain ontology construction.
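The abstract does not give the formulas, so the following is only a minimal sketch, under stated assumptions, of how a weighted category-plus-text relatedness score could be computed. It uses a Wu-Palmer style measure (the idea behind Relwup, after Wu and Palmer [20]) on a toy category tree, a term-frequency cosine over pre-segmented article text as a stand-in for the text component, and a linear combination with a weight `alpha`. The category names, the tree `PARENT`, the cosine text measure, the helper names and `alpha=0.5` are all hypothetical illustrations, not the paper's actual method or parameters.

```python
import math
from collections import Counter

# Hypothetical toy category tree (child -> parent). The real Chinese Wikipedia
# category graph is much larger and is not a strict tree.
PARENT = {
    "Library science": "Information science",
    "Information retrieval": "Information science",
    "Cataloging": "Library science",
    "Classification": "Library science",
    "Search engines": "Information retrieval",
    "Information science": "Library and information science",
}


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    """Depth counted so that the root has depth 1."""
    return len(ancestors(node))


def rel_wup(c1, c2):
    """Wu-Palmer style relatedness: 2 * depth(LCS) / (depth(c1) + depth(c2))."""
    anc2 = set(ancestors(c2))
    # The first ancestor of c1 that is also an ancestor of c2 is the lowest common subsumer.
    lcs = next((a for a in ancestors(c1) if a in anc2), None)
    if lcs is None:
        return 0.0  # no shared ancestor in the tree
    return 2 * depth(lcs) / (depth(c1) + depth(c2))


def rel_text(tokens1, tokens2):
    """Cosine similarity of term-frequency vectors built from pre-segmented article text."""
    v1, v2 = Counter(tokens1), Counter(tokens2)
    dot = sum(v1[t] * v2[t] for t in v1)
    norm = math.sqrt(sum(x * x for x in v1.values())) * math.sqrt(sum(x * x for x in v2.values()))
    return dot / norm if norm else 0.0


def rel_weighted(c1, c2, tokens1, tokens2, alpha=0.5):
    """Linear mix of category and text relatedness; alpha is a hypothetical weight."""
    return alpha * rel_wup(c1, c2) + (1 - alpha) * rel_text(tokens1, tokens2)


if __name__ == "__main__":
    t1 = ["catalog", "record", "library"]
    t2 = ["library", "classification", "record"]
    print(rel_wup("Cataloging", "Classification"))  # 0.75
    print(rel_wup("Cataloging", "Search engines"))  # 0.5
    print(round(rel_weighted("Cataloging", "Classification", t1, t2), 4))  # 0.7083
```

Chinese article text would first need word segmentation (the reference list cites the ICTCLAS segmenter [26]). Swapping `rel_wup` for an information-content measure such as Seco et al.'s intrinsic IC [22] would give a Relseco-style category component; how the paper actually weights its two components is not stated in the abstract.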

Cite this article

Wang Juan, Cao Shujin, Jiang Lingmin, Hu Qing. Research on Semantic Relatedness of Domain-specific Concepts Based on Chinese Wikipedia[J]. Library and Information Service, 2014, 58(23): 136-142. DOI: 10.13266/j.issn.0252-3116.2014.23.021

References

[1] Jiang J J, Conrath D W. Semantic similarity based on corpus statistics and lexical taxonomy[C]//Proceedings of International Conference Research on Computational Linguistics. Taipei: Association for Computational Linguistics, 1997: 13-33.
[2] Church K, Hanks P. Word association norms, mutual information, and lexicography[J]. Computational Linguistics, 1990, 16(1): 22-29.
[3] Cilibrasi R L, Vitanyi P M B. The Google similarity distance[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(3): 370-383.
[4] Landauer T K, Foltz P W, Laham D. An introduction to latent semantic analysis[J]. Discourse Processes, 1998, 25(2/3): 259-284.
[5] Fellbaum C. WordNet: An electronic lexical database[M]. Cambridge: MIT Press, 1998: 18-19.
[6] Jarmasz M, Szpakowicz S. Roget's thesaurus and semantic similarity[C]//Proceedings of RANLP. Borovets, Bulgaria: Association for Computational Linguistics, 2003: 212-219.
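[7] Liu Qun, Li Sujian. Word similarity computing based on HowNet[J]. Computational Linguistics and Chinese Language Processing, 2002, 7(2): 59-76. (in Chinese)
[8] Tian Jiule, Zhao Wei. Word similarity computation based on Tongyici Cilin[J]. Journal of Jilin University (Information Science Edition), 2010, 28(6): 602-608. (in Chinese)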
[9] Strube M, Ponzetto S P. WikiRelate! Computing semantic relatedness using Wikipedia[C]//Proceedings of AAAI. Boston: American Association for Artificial Intelligence, 2006: 1419-1424.
[10] Gabrilovich E, Markovitch S. Computing semantic relatedness using Wikipedia-based explicit semantic analysis[C]//Proceedings of IJCAI. Hyderabad, India: American Association for Artificial Intelligence, 2007: 1606-1611.
[11] Zesch T, Gurevych I. Analysis of the Wikipedia category graph for NLP applications[C]//Proceedings of TextGraphs-2 Workshop NAACL-HLT. Rochester: Association for Computational Linguistics, 2007: 1-8.
[12] Milne D, Witten I H. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links[C]//Proceedings of AAAI Workshop on Wikipedia and Artificial Intelligence. Chicago: American Association for Artificial Intelligence, 2008: 25-3.
[13] Halavais A, Lackaff D. An analysis of topical coverage of Wikipedia[J]. Journal of Computer-Mediated Communication, 2008, 13(2): 429-440.
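[14] Wikimedia Foundation. Special page: Statistics[EB/OL]. [2014-04-09]. http://zh.wikipedia.org/wiki/Wikipedia.
[15] Li Yun. Research on semantic knowledge mining based on Chinese Wikipedia[D]. Beijing: Beijing University of Posts and Telecommunications, 2009. (in Chinese)
[16] Wang Xiang. Research and implementation of semantic relatedness computation based on Chinese Wikipedia[D]. Changsha: National University of Defense Technology, 2011. (in Chinese)
[17] Tu Xinhui, Zhang Hongchun, Zhou Kunfeng, et al. Structured information extraction from Chinese Wikipedia and word relatedness computation[J]. Journal of Chinese Information Processing, 2012, 26(3): 109-115. (in Chinese)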
[18] Ponzetto S P, Strube M. WikiTaxonomy: A large scale knowledge resource[C]//Proceedings of ECAI. Patras: European Coordinating Committee for AI, 2008: 751-752.
[19] Rada R, Mili H, Bicknell E, et al. Development and application of a metric to semantic nets[J]. IEEE Transactions on Systems, Man and Cybernetics, 1989, 19(1): 17-30.
[20] Wu Zhibiao, Palmer M. Verb semantics and lexical selection[C]//Proceedings of ACL. Las Cruces: Association for Computational Linguistics, 1994: 133-138.
[21] Resnik P. Using information content to evaluate semantic similarity[C]//Proceedings of IJCAI. Montreal: American Association for Artificial Intelligence, 1995: 448-453.
[22] Seco N, Veale T, Hayes J. An intrinsic information content metric for semantic similarity in WordNet[C]//Proceedings of ECAI. Valencia: European Coordinating Committee for AI, 2004: 1089-1090.
[23] Lesk M. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone[C]//Proceedings of 5th Annual International Conference on Systems Documentation. Toronto: Association for Computing Machinery, 1986: 24-26.
[24] Banerjee S, Pedersen T. Extended gloss overlap as a measure of semantic relatedness[C]//Proceedings of IJCAI. Acapulco: American Association for Artificial Intelligence, 2003: 805-810.
[25] Wikipedia. Category: Page categories[EB/OL]. [2014-04-09]. http://zh.wikipedia.org/wiki/Category:%E9%A0%81%E9%9D%A2%E5%88%86%E9%A1%9E.
[26] Zhang Huaping. ICTCLAS Chinese word segmentation system[EB/OL]. [2014-04-09]. http://ictclas.nlpir.org.
[27] Budanitsky A, Hirst G. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures[C]//Proceedings of NAACL Workshop on WordNet and Other Lexical Resources. Pittsburgh: Association for Computational Linguistics, 2001: 29-34.
[28] Spearman C. "General Intelligence" objectively determined and measured[J]. The American Journal of Psychology, 1904, 15(2): 201-293.
