[Purpose/Significance] By analyzing texts related to government data privacy, this paper designs a scheme for identifying sensitive data, builds a privacy measurement model, and measures the privacy value of sensitive data, providing a theoretical basis for government data privacy protection. [Method/Process] First, texts relevant to government data privacy were filtered to build a sample library. Then, following the syntactic structure of the texts, sensitive data items, core verbs, degree words, and negation words were extracted to construct a semantic vocabulary of government data privacy. Finally, a privacy measurement model was constructed from the sensitive data units composed of these words. [Result/Conclusion] Grounded in privacy-related texts, the method accurately extracts sensitive data from government data, objectively measures the privacy value of government data objects, and supports privacy risk prevention and the standardization of privacy protection for government data.
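The abstract does not give the form of the measurement model, so the following is only a minimal sketch of how a sensitive data unit (item, core verb, degree word, negation words) might be turned into a privacy value. The lexicon entries, the weights, the multiplicative scoring rule, and the names `SensitiveDataUnit`, `unit_score`, and `privacy_values` are all illustrative assumptions, not the paper's actual vocabulary or model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicons: the words and weights are illustrative assumptions,
# not values from the paper. Positive verb weights mean "protective".
CORE_VERBS = {"protect": 0.8, "encrypt": 0.6, "disclose": -0.5}
DEGREE_WORDS = {"strictly": 1.5, "appropriately": 0.7}


@dataclass
class SensitiveDataUnit:
    item: str                     # sensitive data item, e.g. "ID number"
    verb: str                     # core verb governing the item
    degree: Optional[str] = None  # degree word modifying the verb, if any
    negations: int = 0            # number of negation words attached to the verb


def unit_score(unit: SensitiveDataUnit) -> float:
    """Assumed rule: verb weight times degree weight, sign flipped per negation."""
    verb_weight = CORE_VERBS[unit.verb]
    degree_weight = DEGREE_WORDS.get(unit.degree, 1.0) if unit.degree else 1.0
    sign = -1.0 if unit.negations % 2 else 1.0
    return sign * verb_weight * degree_weight


def privacy_values(units):
    """Average the unit scores for each sensitive data item."""
    scores = defaultdict(list)
    for unit in units:
        scores[unit.item].append(unit_score(unit))
    return {item: sum(vals) / len(vals) for item, vals in scores.items()}


units = [
    SensitiveDataUnit("ID number", "disclose", negations=1),       # "must not disclose"
    SensitiveDataUnit("ID number", "protect", degree="strictly"),  # "strictly protect"
    SensitiveDataUnit("home address", "encrypt"),
]
values = privacy_values(units)
```

Under these assumed weights, a negated permissive verb ("must not disclose") contributes a positive score, so items that the texts repeatedly and strongly protect end up with higher privacy values than items mentioned only in neutral terms.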