[Purpose/significance] The Q&A services provided by academic social networking sites have become an important way for scholars to obtain academic information quickly and to solve academic problems. Machine-learning-based intelligent evaluation of Q&A quality, and the service optimization it enables, is of great significance for the dissemination of high-quality content on academic social networks. [Method/process] Taking the ResearchGate Q&A service as the research object, this paper constructs an answer quality evaluation system along four dimensions (structural features, content features, other features, and answerer features) and uses machine learning methods together with data augmentation to classify and predict answer quality. [Result/conclusion] The SMOTE algorithm is effective for handling imbalanced samples. Among single models, the support vector machine (SVM) achieves excellent classification performance. Combined models further improve prediction accuracy, and the combined model built from random forest, SVM, and a BP neural network has the best classification performance. On this basis, an intelligent Q&A quality evaluation system can be built to optimize the Q&A services of academic social networks.
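The oversampling step named in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the paper's implementation, which the abstract does not describe. The function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (assumed): 20 majority-class points and 4 minority-class points.
majority = [[float(x), float(y)] for x in range(5) for y in range(4)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5], [12.0, 11.0]]

# Generate enough synthetic minority points to balance the two classes.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class. In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into evaluation data.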