Research Article

Research on Sensitivity Identification and Privacy Measurement of Personal Traffic Data

  • Zang Guoquan,
  • Liang Yaodi,
  • Chai Wenke,
  • Zhang Panpan,
  • Zhang Hengmiao
  • 1 School of Information Management, Zhengzhou University, Zhengzhou 450001;
    2 Research Institute of Data Science, Zhengzhou City, Zhengzhou 450001
Zang Guoquan, professor, PhD, doctoral supervisor; Liang Yaodi, master's student; Chai Wenke, master's student, corresponding author, E-mail: chaiwenke2022@163.com; Zhang Panpan, master's student; Zhang Hengmiao, undergraduate student.

Received date: 2024-05-06

  Revised date: 2024-09-13

  Online published: 2025-01-15

Supported by

This work is one of the research outcomes of the Major Project of the National Social Science Fund of China, "Innovative research on privacy risk measurement and protection mechanism of government data" (Grant No. 21&ZD338).

Cite this article

Zang Guoquan, Liang Yaodi, Chai Wenke, Zhang Panpan, Zhang Hengmiao. Research on Sensitivity Identification and Privacy Measurement of Personal Traffic Data[J]. Library and Information Service, 2025, 69(1): 46-57. DOI: 10.13266/j.issn.0252-3116.2025.01.005

Abstract

[Purpose/Significance] Personal traffic data is classified into three levels according to how severely its leakage or illegal use would harm individual rights and interests, but specific quantitative grading criteria and explicit grading results are still lacking. This study identifies sensitive traffic data items and measures the privacy value of personal traffic data, providing a theoretical basis for data grading. [Method/Process] First, the types of traffic privacy texts are summarized and a privacy text library is built as the identification source. Second, the text library is mined, the syntactic structure and semantic elements of the texts are parsed, and a traffic sensitive word list is established as the identification model. Finally, the factors that affect traffic data privacy are analyzed, a privacy measurement model is established, and the privacy values of sensitive traffic data are calculated. [Result/Conclusion] Personal trajectory data and personal traffic violation data (traffic violation records, traffic integrity records) have the highest privacy values. Personal transportation payment data (transportation service purchase and fine payment, transportation insurance payment) and personal basic data (identification data, semi-identification data, biometric data, network identification data, property data, health data) rank second. Personal transportation operation data (passenger tickets, transportation cards) have the lowest privacy values.
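As a rough illustration of the identification step described in the abstract, the sketch below matches a small sensitive word list against a piece of text and groups the hits by data category. It is a minimal sketch under stated assumptions, not the authors' model: the category names follow the abstract, but the terms in `SENSITIVE_TERMS`, the function `find_sensitive_items`, and the sample sentence are hypothetical, English is used only for readability, and plain phrase matching stands in for the syntactic and semantic analysis the paper describes. No privacy values are computed, because the abstract does not specify the measurement model.

```python
import re
from collections import defaultdict

# Hypothetical word list. Category names follow the abstract; the terms are
# illustrative placeholders, not entries from the paper's actual word list.
SENSITIVE_TERMS = {
    "personal trajectory data": ["travel route", "gps location"],
    "personal traffic violation data": ["violation record"],
    "personal transportation payment data": ["fine payment", "insurance payment"],
}


def find_sensitive_items(text, term_table=SENSITIVE_TERMS):
    """Return {category: sorted matched terms} using case-insensitive phrase matching."""
    hits = defaultdict(set)
    lowered = text.lower()
    for category, terms in term_table.items():
        for term in terms:
            # Word boundaries keep a term from matching inside a longer word.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                hits[category].add(term)
    return {category: sorted(terms) for category, terms in hits.items()}


if __name__ == "__main__":
    sample = "The app stores each driver's travel route and any fine payment."
    print(find_sensitive_items(sample))
```

A dictionary keyed by category keeps the word list easy to extend; any scoring of the matched items would need the factors and weights defined in the full paper.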
