Analysis and Application of Bayes Classification Algorithm in the Social Networking Site Information Filtering

  • Li Zhiyi
  • Shen Zhirui
  • Yi Meilian
  • Economic & Management College, South China Normal University, Guangzhou 510006

Received date: 2014-04-14

Revised date: 2014-05-28

Online published: 2014-07-05

Abstract

Document classification and spam identification are valuable research areas, and a growing number of websites are paying attention to this technology. This paper uses an intelligent algorithm to analyze spam and to find spammers: working from web logs and published content, it identifies advertisers and publishers of spam and deletes the spam. Screening for spam is in effect a process of dividing information into useful and useless information, so the paper uses the Bayes classification algorithm to assign information to categories and filter it accordingly. The main contribution of the article is to address the defects of rule-based classification and of methods that weed out spam by analyzing advertising links, and to propose an approach based on the Bayes classification algorithm and machine learning. The experimental results show that this method is superior to rule-based classification.
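
As an illustration of the general technique named in the abstract, the following is a minimal sketch of a naive Bayes classifier that divides short messages into useful ("ham") and useless ("spam") information. It is not the authors' implementation: the function names `train` and `classify`, the whitespace tokenization, the add-one (Laplace) smoothing, and the toy training messages are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train(messages):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of messages
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        # Log prior: share of training messages carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing keeps unseen (word, label) pairs non-zero.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, invented for this sketch only.
training_data = [
    ("cheap watches click this link", "spam"),
    ("click here to win money now", "spam"),
    ("see you at lunch tomorrow", "ham"),
    ("great photos from the trip", "ham"),
]
model = train(training_data)
print(classify("click this link to win", model))
print(classify("see you at lunch", model))
```

Unlike a hand-written rule set, the word statistics here are learned from labeled examples, so the filter can be updated by retraining on new messages rather than by editing rules.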

Cite this article

Li Zhiyi, Shen Zhirui, Yi Meilian. Analysis and Application of Bayes Classification Algorithm in the Social Networking Site Information Filtering[J]. Library and Information Service, 2014, 58(13): 100-106. DOI: 10.13266/j.issn.0252-3116.2014.13.017

