Theoretical Research

Analysis of Data Sharing and Reproducibility Problem in Computing Field

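  • Yu Qianqian,
  • Meng Yintao,
  • Qian Li,
  • Liu Huizhou
  • 1 National Science Library, Chinese Academy of Sciences, Beijing 100190;
    2 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190;
    3 Key Laboratory of New Publishing and Knowledge Services for Scholarly Journals, National Press and Publication Administration, Beijing 100090;
    4 University of International Relations Library, Beijing 100091;
    5 Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190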
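Yu Qianqian, associate research librarian, doctoral candidate; Meng Yintao, associate research librarian, master's degree; Qian Li, professor-level senior engineer, PhD, corresponding author, E-mail: qianl@mail.las.ac.cn; Liu Huizhou, research professor, PhD.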

Received date: 2023-12-18

Revised date: 2024-02-29

Online published: 2024-09-12

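Funding

This work is one of the research outcomes of the Youth Backbone Talents Project of the National Science Library, Chinese Academy of Sciences, "Research on the Construction of General Entity Corpus for Scientific Content Mining" (Grant No. E3550201).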


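Abstract

[Purpose/Significance] Computational reproducibility is the cornerstone of reliable and credible research. This paper investigates and analyzes data sharing policies, data availability, and reproducibility methods for papers in the computing field, to provide a reference for promoting data sharing and addressing computational reproducibility problems. [Method/Process] Website survey and content analysis were used to analyze the data sharing policies of journals and conferences. A web crawler was used to collect the data availability statements of journal articles, and data availability was analyzed on that basis. Methods for achieving computational reproducibility were then reviewed and summarized. [Result/Conclusion] Most journals and more than half of the conferences in the computing field have data sharing policies, but the strength of their stance on data sharing still needs to improve. The higher the level of a journal or conference, the more likely it is to have a data sharing policy. Whereas journals focus on data sharing, conferences pay more attention to computational reproducibility. Data availability statements have promoted data sharing, but a gap remains between data sharing practice and what data sharing policies require. Computational reproducibility methods include encouraging or requiring data sharing, expert review, awards, paper submission checklists, and calls for reproducibility papers.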
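As a rough illustration of how collected data availability statements might be sorted for analysis, the sketch below applies simple keyword rules to statement text. The category names, keyword patterns, and example statements are all hypothetical assumptions made for illustration; the abstract does not describe the coding scheme the authors actually used.

```python
import re
from collections import Counter

# Hypothetical categories and keyword patterns (assumptions for illustration);
# the abstract does not specify the paper's actual coding scheme.
RULES = [
    ("repository", re.compile(r"github\.com|zenodo|figshare|doi\.org|repository", re.I)),
    ("on_request", re.compile(r"(up)?on (reasonable )?request", re.I)),
    ("not_shared", re.compile(r"not (publicly )?available|cannot be shared|no data", re.I)),
]


def classify_statement(statement: str) -> str:
    """Return the label of the first rule that matches, or 'unclear'."""
    for label, pattern in RULES:
        if pattern.search(statement):
            return label
    return "unclear"


def summarize(statements):
    """Count how many statements fall into each category."""
    return Counter(classify_statement(s) for s in statements)


# Made-up example statements, not taken from the study's data.
examples = [
    "The code and data are available at https://github.com/example/project.",
    "Data will be made available on reasonable request.",
    "The data that has been used is confidential and cannot be shared.",
    "Data sharing is not applicable to this article.",
]

print(summarize(examples))
```

Because the rules are checked in order and the first match wins, a statement that mentions both a repository and a restriction is counted as "repository". A real analysis would likely need manual review of such cases.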

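Cite this article

YU Qianqian, MENG Yintao, QIAN Li, LIU Huizhou. Analysis of Data Sharing and Reproducibility Problem in Computing Field[J]. Library and Information Service, 2024, 68(17): 3-15. DOI: 10.13266/j.issn.0252-3116.2024.17.001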

