[Purpose/Significance] Adding a cross-city comparison to topic mining helps shared accommodation platforms tailor improvements in user experience and user stickiness to local conditions, which supports more refined, evidence-based user management in typical cities and the sustainable development of the platform. [Method/Process] Taking the Xiaozhu short-term rental platform as an example, this paper crawled 15,040 online reviews from five typical cities (Beijing, Shanghai, Chengdu, Guangzhou and Sanya) and used the LDA topic model to extract the topics users focus on. Using the Stacking ensemble learning algorithm and importance-performance analysis (IPA), it then analyzed, along the importance and performance dimensions, how users' attention to and satisfaction with each topic differ across cities. [Result/Conclusion] Users focus on eight main topics during their accommodation experience: personnel services, surrounding traffic, infrastructure, sensory cognition, economic value, scenery and architecture, theme features and catering experience. The results further identify which topics fall into the advantageous, inferior, improvement and maintenance areas in each city, yielding a cross-city analysis of user focus topics. The study offers a new approach for academic research on topic mining and practical guidance for shared accommodation platforms on allocating resources effectively.
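The IPA step named in the abstract can be illustrated with a minimal sketch. It assumes each topic already has an importance score and a performance score for one city, and that the four areas are separated at the mean of each dimension. The cut-off rule, the mapping of area names to quadrants, the function name `ipa_areas` and the scores below are all illustrative assumptions, not values or definitions taken from the paper.

```python
from statistics import mean


def ipa_areas(scores):
    """Assign each topic to an IPA area.

    scores: dict mapping topic -> (importance, performance).
    Assumption: areas are split at the mean importance and mean
    performance; the name-to-quadrant mapping is also assumed.
    """
    imp_cut = mean(imp for imp, _ in scores.values())
    perf_cut = mean(perf for _, perf in scores.values())

    areas = {}
    for topic, (imp, perf) in scores.items():
        if imp >= imp_cut and perf >= perf_cut:
            areas[topic] = "advantageous"   # high importance, high performance
        elif imp >= imp_cut:
            areas[topic] = "improvement"    # high importance, low performance
        elif perf >= perf_cut:
            areas[topic] = "maintenance"    # low importance, high performance
        else:
            areas[topic] = "inferior"       # low importance, low performance
    return areas


# Hypothetical scores for one city (illustrative only).
example_scores = {
    "personnel services": (0.9, 0.8),
    "surrounding traffic": (0.8, 0.3),
    "catering experience": (0.2, 0.7),
    "theme features": (0.1, 0.2),
}
example_areas = ipa_areas(example_scores)
```

Running the same function once per city and comparing the resulting dictionaries gives the kind of cross-city contrast the abstract describes. In the paper the importance scores come from the Stacking model and the topics from LDA, neither of which is reproduced here.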