A Review on Chatbot Assessment: Indicators, Methods and Application

  • Ren Mudan,
  • Geng Qian,
  • Wu Yirong
  • 1 School of Government Management, Beijing Normal University, Beijing 100875;
    2 Institute of Advanced Studies in Humanities and Social Sciences, Beijing Normal University, Zhuhai 519087

Received date: 2023-06-14

Revised date: 2023-07-24

Published online: 2023-12-16

Abstract

[Purpose/Significance] This paper systematically analyzes the current application and assessment of chatbots in China and abroad and identifies existing problems and potential application scenarios, so as to promote chatbot assessment and application. [Method/Process] With Web of Science and the China National Knowledge Infrastructure (CNKI) as the main data sources, supplemented by Panda Academic, Google Scholar and Baidu Academic, 662 research papers were selected as the original sample. After flow-chart screening, 66 papers were retained for full-text analysis. Through induction, the content of chatbot assessment was summarized into three aspects: assessment indicators, assessment methods and assessment applications. [Result/Conclusion] Research on assessment indicators mainly focuses on function, usage and user experience; however, there is still no standard evaluation index system for chatbots. Assessment methods are mainly divided into subjective and objective evaluation. Although the choice of method is relatively simple and lacks cross-comprehensive evaluation, it can make up for the defects between human factors and technical factors. Applications mainly focus on education, medical treatment and mental health, while applications in government management and social services remain to be explored. Finally, this paper offers a reference for domestic research in three respects: accelerating the construction and study of a chatbot assessment indicator system, broadening application fields and scenarios to achieve cross-platform linkage, and strengthening ethical governance norms for chatbots.

Cite this article

Ren Mudan, Geng Qian, Wu Yirong. A Review on Chatbot Assessment: Indicators, Methods and Application[J]. Library and Information Service, 2023, 67(22): 140-148. DOI: 10.13266/j.issn.0252-3116.2023.22.014
