Research on Data Quality Governance in Open Sharing of Scientific Data

  • Sheng Xiaoping,
  • Tian Jing,
  • Xiang Guilin
  • 1 School of Library, Information and Archives, Shanghai University, Shanghai 200444;
    2 Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101

Received date: 2020-06-09

Revised date: 2020-07-21

  Online published: 2020-11-20

Abstract

[Purpose/significance] To promote the effective implementation of open sharing of scientific data, this paper explores the data quality problems that arise in open sharing of scientific data and the countermeasures for governing them. [Method/process] Using normative analysis and causal analysis, the paper analyzes the data quality problems in open sharing of scientific data and their root causes, then constructs a governance model for open sharing of scientific data, and finally proposes four types of governance countermeasures from the perspective of inducements. [Result/conclusion] Data quality problems in open sharing of scientific data involve the accuracy, completeness, consistency, timeliness, reliability, relevance and open accessibility of scientific data. To solve these problems and further promote open sharing of scientific data, countermeasures for scientific data quality governance can be formulated in four areas: policies and regulations, organizational management, technologies and platforms, and stakeholders.

Cite this article

Sheng Xiaoping, Tian Jing, Xiang Guilin. Research on Data Quality Governance in Open Sharing of Scientific Data[J]. Library and Information Service, 2020, 64(22): 11-24. DOI: 10.13266/j.issn.0252-3116.2020.22.002
