Data Analysis of Wikidata Person Names Based on Association Rules Mining: A Case Study of the Theme of the Nobel Prize Winner

  • Jia Junzhi,
  • Feng Jie
  • School of Economics and Management, Shanxi University, Taiyuan 030006

Received date: 2017-03-02

Revised date: 2017-05-10

Online published: 2017-06-20

Abstract

[Purpose/significance] Mining the relationships among different types of name data can reveal the domain or subject knowledge associated with a particular entity. This supports deconstructing and reconstructing a knowledge system at different levels and along different dimensions, and it has research significance for knowledge services that must meet diverse needs. [Method/process] This paper proposes a research framework for association rule experiments on person entities. Entries for the target entities are extracted and preprocessed, their attributes are identified and classified, and R is used to derive association rules for person entities, thereby linking multiple kinds of name data. For the empirical analysis, 113 Nobel Prize winner entity entries are extracted from the Wikidata knowledge base. [Result/conclusion] Rules relating four types of names (place names, institution names, time names, and subject names) are analyzed, which addresses the problem of mining relationships among different types of name data. This study offers a new perspective on knowledge disclosure, aggregation, and association, and explores the application of data mining techniques to name data.
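
As a rough illustration of the association rule step described in the abstract, the following minimal sketch computes support, confidence, and lift for one-to-one rules over a handful of person records. It is written in Python with only the standard library and is not the R workflow the authors used. The records, the type-prefixed placeholder labels (P1, I1, S1, T1, and so on), the function names, and the thresholds are all hypothetical and are not taken from the paper's data.

```python
from itertools import permutations

# Hypothetical toy records (invented for illustration, not the paper's data).
# Each set holds the name attributes of one person entity, prefixed by type.
records = [
    {"place:P1", "institution:I1", "subject:S1", "time:T1"},
    {"place:P1", "institution:I1", "subject:S1", "time:T2"},
    {"place:P1", "institution:I2", "subject:S2", "time:T1"},
    {"place:P2", "institution:I1", "subject:S1", "time:T2"},
    {"place:P2", "institution:I3", "subject:S2", "time:T2"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= record for record in records) / len(records)


def mine_rules(records, min_support, min_confidence):
    """Enumerate single-item rules a -> b that meet both thresholds.

    Returns a list of (antecedent, consequent, support, confidence, lift).
    This brute-force pass over item pairs is a simplification; it is not
    the Apriori candidate-pruning procedure.
    """
    items = sorted(set().union(*records))
    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b}, records)
        if s_ab < min_support:
            continue
        confidence = s_ab / support({a}, records)
        if confidence < min_confidence:
            continue
        lift = confidence / support({b}, records)
        rules.append((a, b, s_ab, confidence, lift))
    return rules


# Thresholds are assumptions chosen only to keep the toy output small.
rules = mine_rules(records, min_support=0.4, min_confidence=0.8)
for a, b, s, c, l in rules:
    print(f"{a} -> {b}  support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Prefixing each item with its name type makes it easy to filter the resulting rules by type pair, for example keeping only rules from an institution name to a subject name.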

Cite this article

Jia Junzhi, Feng Jie. Data Analysis of Wikidata Person Names Based on Association Rules Mining: A Case Study of the Theme of the Nobel Prize Winner[J]. Library and Information Service, 2017, 61(12): 122-128. DOI: 10.13266/j.issn.0252-3116.2017.12.016
