Library and Information Service, 2022, Vol. 66, Issue (15): 66-75. DOI: 10.13266/j.issn.0252-3116.2022.15.007

• Information Research •

Research on Sensitive Data Identification and Privacy Measurement in Government Data

Zang Guoquan (1, 2), Wang Jiazhen (1), Bi Chongwu (1, 2), Geng Ruili (1, 2)

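  1. School of Information Management, Zhengzhou University, Zhengzhou 450001;
    2. Research Institute of Data Science, Zhengzhou City, Zhengzhou 450001
  • Received: 2022-01-13  Revised: 2022-04-28  Online: 2022-08-05  Published: 2022-08-17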
  • Corresponding author: Bi Chongwu, lecturer, master's supervisor, E-mail: 767818984@qq.com.
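  • About the authors: Zang Guoquan, dean, professor, doctoral supervisor; Wang Jiazhen, master's student; Geng Ruili, associate professor, master's supervisor.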
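  • Funding:
    This paper is one of the research outcomes of the Major Project of the National Social Science Fund of China "Research on Privacy Risk Measurement and Protection Mechanism Innovation for Government Data" (Project No. 21&ZD338).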

Abstract: [Purpose/Significance] By analyzing privacy-related texts on government data, this paper designs a scheme for identifying sensitive data and builds a privacy measurement model that quantifies the privacy value of sensitive data, providing a theoretical basis for protecting privacy in government data. [Method/Process] First, texts related to government data privacy were screened to build a sample library. Then, based on the syntactic structure of these texts, words such as sensitive data items, core verbs, degree words, and negation words were extracted to construct a semantic vocabulary of government data privacy. Finally, a privacy measurement model was built on the sensitive data units composed of these words. [Result/Conclusion] Grounded in privacy-related texts, the method accurately extracts the sensitive data in government data and objectively measures the privacy value of government data objects, supporting privacy risk prevention and the standardization of privacy protection for government data.

Key words: government data, data privacy, personal privacy, semantic vocabulary, privacy measurement
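The abstract names the building blocks of the method (sensitive data items, core verbs, degree words, negation words, and the sensitive data units they form) but does not give the scoring formula. The following is only a hypothetical sketch of how a lexicon-based score over such units could be computed; it is not the authors' model. Every lexicon entry, weight, and helper name (`extract_unit`, `unit_privacy_value`, `object_privacy_value`) is an assumption, English placeholders stand in for the Chinese vocabulary, clauses are assumed to be pre-segmented with multiword terms as single tokens, and plain token lookup replaces the syntactic analysis the paper describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# HYPOTHETICAL lexicon: English placeholders with illustrative weights, not the paper's vocabulary.
SENSITIVE_ITEMS = {"id number": 1.0, "home address": 0.8, "phone number": 0.7}
# Assumed polarity: +1 for verbs that protect an item, -1 for verbs that permit disclosure.
CORE_VERBS = {"protect": 1, "withhold": 1, "disclose": -1, "publish": -1}
# Assumed intensity multipliers for degree words.
DEGREE_WORDS = {"strictly": 1.5, "generally": 1.0, "partially": 0.5}
NEGATION_WORDS = {"not", "never"}


@dataclass
class SensitiveDataUnit:
    """One sensitive data unit: an item and a core verb, optionally with degree and negation."""
    item: str
    verb: str
    degree: Optional[str] = None
    negated: bool = False


def extract_unit(tokens: List[str]) -> Optional[SensitiveDataUnit]:
    """Naive lexicon lookup over one pre-tokenized clause (no real syntactic parsing)."""
    item = next((t for t in tokens if t in SENSITIVE_ITEMS), None)
    verb = next((t for t in tokens if t in CORE_VERBS), None)
    if item is None or verb is None:
        return None
    degree = next((t for t in tokens if t in DEGREE_WORDS), None)
    # An odd number of negation words flips the verb; an even number cancels out.
    negated = sum(t in NEGATION_WORDS for t in tokens) % 2 == 1
    return SensitiveDataUnit(item, verb, degree, negated)


def unit_privacy_value(unit: SensitiveDataUnit) -> float:
    """Assumed score: item weight x degree multiplier x verb polarity, floored at 0."""
    polarity = -CORE_VERBS[unit.verb] if unit.negated else CORE_VERBS[unit.verb]
    degree = DEGREE_WORDS.get(unit.degree, 1.0) if unit.degree is not None else 1.0
    return max(0.0, SENSITIVE_ITEMS[unit.item] * degree * polarity)


def object_privacy_value(fields: List[str], units: List[SensitiveDataUnit]) -> float:
    """Assumed aggregation: sum over an object's fields of the highest unit score per item."""
    best: Dict[str, float] = {}
    for u in units:
        best[u.item] = max(best.get(u.item, 0.0), unit_privacy_value(u))
    return sum(best.get(f, 0.0) for f in fields)


if __name__ == "__main__":
    clauses = [
        ["strictly", "not", "disclose", "id number"],  # negated disclosure reads as protective
        ["may", "publish", "phone number"],            # permits disclosure, so it scores 0
        ["protect", "home address"],
        ["weather", "report"],                         # no lexicon item, so no unit is produced
    ]
    units = [u for u in (extract_unit(c) for c in clauses) if u is not None]
    print(object_privacy_value(["id number", "home address", "phone number"], units))
```

Flooring disclosure-permitting statements at zero and keeping only the highest score per item are simplifications chosen to keep the sketch short; a weighted sum, a separate penalty for disclosure, or a different negation rule would be equally plausible under the same vocabulary.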
