[Purpose/Significance] Data stories have attracted wide attention and seen broad application. Current research focuses mainly on theory, such as the meaning or models of data stories, and pays little attention to practical questions such as what a data story should contain and what its creation process looks like. This paper therefore proposes a data story generation method to provide theoretical guidance for creating data stories. [Method/Process] Building on previous research and drawing on theories from data science, cognitive science, natural language processing, and interpretable machine learning, the paper proposes a data story generation method oriented toward locally interpretable machine learning and describes in detail the generation steps and the ways in which a data story is created. The output of the LIME algorithm is also improved to make it easier to understand. On this basis, a case implementation of the proposed data storytelling method is carried out to verify its feasibility. [Result/Conclusion] The proposed method enriches the theoretical framework of data storytelling research and offers some insights for research on data story generation and for the development of data storytelling tools.
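To make the idea of rendering a local explanation in a more readable form concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes the explanation is available as a list of (feature condition, weight) pairs, which is the shape returned by `Explanation.as_list()` in the `lime` package. The function name `explanation_to_story`, the `top_k` parameter, the sentence templates, and all feature names and weights in the usage example are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def explanation_to_story(
    prediction: str,
    probability: float,
    weights: List[Tuple[str, float]],
    top_k: int = 3,
) -> str:
    """Turn LIME-style (feature condition, weight) pairs into a short narrative.

    Hypothetical helper for illustration; the templates are assumptions.
    """
    # Keep only the top_k conditions with the largest absolute contribution.
    ranked = sorted(weights, key=lambda fw: abs(fw[1]), reverse=True)[:top_k]

    # Positive weights push toward the predicted class, negative ones away.
    supporting = [name for name, w in ranked if w > 0]
    opposing = [name for name, w in ranked if w < 0]

    parts = [f"The model predicts '{prediction}' with probability {probability:.0%}."]
    if supporting:
        parts.append("This is mainly supported by: " + "; ".join(supporting) + ".")
    if opposing:
        parts.append("It is weakened by: " + "; ".join(opposing) + ".")
    return " ".join(parts)


if __name__ == "__main__":
    # Invented example values, not results from the paper's case study.
    example_weights = [
        ("city_development_index <= 0.62", 0.21),
        ("relevant_experience = yes", -0.08),
        ("company_size = unknown", 0.12),
        ("training_hours > 80", 0.03),
    ]
    print(explanation_to_story("will change job", 0.78, example_weights))
```

Ranking by absolute weight and splitting conditions into supporting and opposing groups is one simple way to reduce the reader's cognitive load; a fuller data story would also need context, ordering of events, and visual elements.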