[Purpose/significance] Existing tag recommendation methods often produce unsatisfactory results. This paper improves the traditional similarity calculation and combines several tag recommendation methods to raise recommendation accuracy. [Method/process] Drawing on both content-based and collaborative filtering recommendation, the approach uses LDA-based similarity to obtain neighbor sets for resources and users, extracts keywords from resource content, and builds a hybrid tag recommendation model from these components. The model is validated on "Douban Reading" (豆瓣读书) and compared with several other tag recommendation methods. [Result/conclusion] In a social tagging system, the three dimensions of user, resource, and tag must all be considered; relying on a single perspective inevitably yields incomplete results. Introducing LDA into the similarity calculation uncovers latent semantic relations and improves recommendation quality, and combining multiple methods so that each offsets the weaknesses of the others makes the recommendations more satisfactory.
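The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's formulas, so everything concrete here is an assumption made for illustration: cosine similarity over topic distributions, similarity-weighted voting over neighbors' tags, max-normalization, the linear weights, and all function names and toy data. The topic vectors stand in for the output of an LDA model trained elsewhere, and the keyword weights stand in for scores from a keyword extractor such as TF-IDF.

```python
import math
from collections import defaultdict

def cosine(p, q):
    """Cosine similarity between two equal-length topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def nearest_neighbors(target, topics, k):
    """Return the k items most similar to `target` as (item, similarity) pairs."""
    sims = [(other, cosine(topics[target], vec))
            for other, vec in topics.items() if other != target]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:k]

def neighbor_tag_scores(neighbors, tags_of):
    """Score each tag by the summed similarity of the neighbors that carry it."""
    scores = defaultdict(float)
    for item, sim in neighbors:
        for tag in tags_of.get(item, ()):
            scores[tag] += sim
    return scores

def normalize(scores):
    """Scale scores so the largest is 1.0 (empty dict if nothing is positive)."""
    top = max(scores.values(), default=0.0)
    return {t: s / top for t, s in scores.items()} if top > 0 else {}

def recommend_tags(resource, user, resource_topics, user_topics,
                   resource_tags, user_tags, keywords,
                   k=2, weights=(0.4, 0.4, 0.2), n=3):
    """Blend resource-neighbor, user-neighbor, and keyword scores linearly.

    The weights are hypothetical; the abstract does not state how the
    three sources are combined.
    """
    parts = [
        normalize(neighbor_tag_scores(
            nearest_neighbors(resource, resource_topics, k), resource_tags)),
        normalize(neighbor_tag_scores(
            nearest_neighbors(user, user_topics, k), user_tags)),
        normalize(dict(keywords)),
    ]
    combined = defaultdict(float)
    for weight, part in zip(weights, parts):
        for tag, score in part.items():
            combined[tag] += weight * score
    # Sort by descending score, breaking ties alphabetically.
    return sorted(combined, key=lambda t: (-combined[t], t))[:n]

# Toy data: topic distributions as an LDA model might produce them.
resource_topics = {"r1": [0.8, 0.2], "r2": [0.7, 0.3], "r3": [0.1, 0.9]}
user_topics = {"u1": [0.6, 0.4], "u2": [0.5, 0.5], "u3": [0.9, 0.1]}
resource_tags = {"r2": ["history", "china"], "r3": ["cooking"]}
user_tags = {"u2": ["history"], "u3": ["novel"]}
keywords = {"history": 0.5, "dynasty": 0.3}

print(recommend_tags("r1", "u1", resource_topics, user_topics,
                     resource_tags, user_tags, keywords, k=1))
```

A tag supported by all three sources ranks above one supported by a single source, which is the "complementary strengths" effect the conclusion describes. In practice the weights and the neighborhood size `k` would need tuning on held-out data.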