Research on Collaborative Filtering Book Recommendation Based on Hadoop and Mahout
Received date: 2013-08-05
Revised date: 2013-08-21
Online published: 2013-09-20
Feng Guohe, Huang Jiaxing. Research on Collaborative Filtering Book Recommendation Based on Hadoop and Mahout[J]. Library and Information Service, 2013, 57(18): 116-121. DOI: 10.7536/j.issn.0252-3116.2013.18.020
This paper first builds a book recommendation engine on the open-source Hadoop distributed computing framework and the Mahout collaborative filtering recommendation engine. It then uses the cloud model and the Pearson coefficient to improve the traditional collaborative filtering algorithm, addressing the poor system performance and inaccurate recommendations that traditional stand-alone recommendation algorithms exhibit when operating on high-dimensional sparse matrices. Finally, experiments evaluate the overall performance of the distributed recommendation platform and the improved collaborative filtering algorithm. The findings are: (1) as the number of virtual machine nodes increases, the computation time of the collaborative filtering recommendation engine decreases, which indicates that overall system performance improves over a traditional stand-alone recommendation engine; (2) evaluating Mahout's original collaborative filtering recommendation and the improved algorithm with the mean absolute error (MAE) shows that the improved algorithm raises recommendation accuracy by 13.1%, and that subjective differences in user ratings have a large impact on recommendation accuracy.
Key words: book recommendation; Hadoop; Mahout; recommendation engine; collaborative filtering
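The abstract names two standard quantities, the Pearson coefficient used to compare users and the MAE used to score predictions. The sketch below shows one common way to compute them in plain Python. It is an illustration under assumptions, not the authors' Hadoop/Mahout implementation: the function names `pearson` and `mae`, the dict-based rating layout, and the sample ratings are all hypothetical, and the cloud-model improvement is not represented.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users over the books both have rated.

    ratings_a, ratings_b: dicts mapping a book id to a numeric rating
    (an assumed layout, chosen for illustration).
    Returns 0.0 when fewer than two books overlap or a user's ratings
    have zero variance on the overlap.
    """
    common = set(ratings_a) & set(ratings_b)
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(ratings_a[b] for b in common) / n
    mean_b = sum(ratings_b[b] for b in common) / n
    cov = sum((ratings_a[b] - mean_a) * (ratings_b[b] - mean_b) for b in common)
    var_a = sum((ratings_a[b] - mean_a) ** 2 for b in common)
    var_b = sum((ratings_b[b] - mean_b) ** 2 for b in common)
    denom = sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical sample data: two users with perfectly correlated ratings.
user_a = {"book1": 1, "book2": 2, "book3": 3}
user_b = {"book1": 2, "book2": 4, "book3": 6, "book4": 5}
similarity = pearson(user_a, user_b)
error = mae([4, 3, 5], [5, 3, 3])
```

A lower MAE means predictions sit closer to the ratings users actually gave, which is the sense in which an accuracy improvement is measured with this metric. The sample values here are made up and do not reproduce the 13.1% result.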