The Identity of the Same User with Cross-social Media Based on Entity Resolution

  • Qi Linfeng
  • Department of Library, Information and Archives, Shanghai University, Shanghai 200444

Received date: 2016-12-14

  Revised date: 2017-02-20

  Online published: 2017-03-20

Abstract

[Purpose/significance] Associating entities across multiple domains is a long-standing problem in entity resolution. This paper aims to identify accounts on different social media platforms (cross-social media) that belong to the same person. [Method/process] Building on traditional approximate string matching, the paper proposes a method that combines profile attributes with links and text content in social media. It compares the attribute similarity, neighbor similarity, and keyword similarity of two accounts on different platforms in order to improve precision. [Result/conclusion] Different combinations of matching functions were tested on Facebook and Twitter datasets. The results show that combining all three matching functions identifies more accounts belonging to the same user while maintaining high precision, which reached 0.923. The successful application of the proposed method to Facebook and Twitter offers a new path for research on other social media platforms and other domains.
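
The combination of the three matching functions named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The profile layout (`attrs`, `neighbors`, `keywords`), the use of Python's `difflib.SequenceMatcher` as the approximate string matcher, Jaccard overlap for the neighbor and keyword sets, and the weights are all assumptions chosen for illustration. It also assumes neighbors are represented by normalized display names so that they are comparable across platforms.

```python
from difflib import SequenceMatcher


def attribute_similarity(a: dict, b: dict) -> float:
    """Mean approximate string similarity over attributes filled in on both profiles."""
    shared = [k for k in a if k in b and a[k] and b[k]]
    if not shared:
        return 0.0
    scores = [
        SequenceMatcher(None, a[k].lower(), b[k].lower()).ratio() for k in shared
    ]
    return sum(scores) / len(scores)


def jaccard(x: set, y: set) -> float:
    """Set overlap, used here for both neighbor and keyword similarity (an assumption)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def match_score(p: dict, q: dict, weights=(0.5, 0.25, 0.25)) -> float:
    """Weighted sum of attribute, neighbor, and keyword similarity.

    The weights are hypothetical; the paper's actual combination rule is not
    given in the abstract.
    """
    w_attr, w_nbr, w_kw = weights
    return (
        w_attr * attribute_similarity(p["attrs"], q["attrs"])
        + w_nbr * jaccard(p["neighbors"], q["neighbors"])
        + w_kw * jaccard(p["keywords"], q["keywords"])
    )


# Hypothetical profiles: one from each platform, plus an unrelated account.
facebook_account = {
    "attrs": {"name": "Alice Zhang", "location": "Shanghai"},
    "neighbors": {"bob li", "carol wu", "dan chen"},
    "keywords": {"libraries", "archives", "metadata"},
}
twitter_account = {
    "attrs": {"name": "alice zhang", "location": "Shanghai, China"},
    "neighbors": {"bob li", "carol wu", "eve sun"},
    "keywords": {"libraries", "metadata", "travel"},
}
unrelated_account = {
    "attrs": {"name": "Victor Quill", "location": "Oslo"},
    "neighbors": {"frank moe"},
    "keywords": {"skiing"},
}

print(match_score(facebook_account, twitter_account))
print(match_score(facebook_account, unrelated_account))
```

In a sketch like this, a pair of accounts would be declared a match when the score exceeds a threshold tuned on labeled pairs. Dropping one of the three terms corresponds to testing a different combination of matching functions.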

Cite this article

Qi Linfeng. The Identity of the Same User with Cross-social Media Based on Entity Resolution[J]. Library and Information Service, 2017, 61(6): 107-114. DOI: 10.13266/j.issn.0252-3116.2017.06.017
