[Purpose/significance] Identifying and analyzing policy tools is an important method in policy research, but this work is currently done mostly by hand. In this article, we use deep learning methods to identify policy tools automatically, with the aim of making policy tool identification more efficient. [Method/process] We designed and implemented an experimental workflow for automatic policy tool identification, consisting of policy data collection and cleaning, manual indexing of policy tools, model training, and result interpretation. Taking the government information disclosure policies of Beijing, Shanghai, Guangzhou, and Guiyang as a case, we compare the performance of traditional machine learning methods and deep learning methods on the policy tool identification task. In addition, we propose integrating global information from the whole policy document when identifying the policy tools in each paragraph, and our experiments demonstrate the effectiveness of this approach. [Result/conclusion] The CNN deep learning model achieves an accuracy of 76.51% on the full test data, and the CNN model that integrates global information achieves 77.13%. When only the model's high-confidence results are evaluated, the CNN model that integrates global information achieves an accuracy of 95.44% on 55.63% of the test data, which meets practical requirements. This shows that more than half of the policy tool indexing can rely on the model's high-confidence results without manual review. Applying deep learning to the automatic identification of policy tools achieves good results, improves the efficiency of policy tool indexing, and provides positive experience for automatic policy tool identification at large data volumes.
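The high-confidence evaluation described in the results can be read as selective (confidence-thresholded) accuracy. You keep only the predictions whose confidence reaches a threshold. Then you report two numbers: the share of test items kept (coverage) and the accuracy on those items. The following is a minimal Python sketch under stated assumptions. Each prediction carries the classifier's top-class probability as its confidence. The function names `selective_accuracy` and `pick_threshold`, the tool labels, and the threshold values are hypothetical illustrations. They are not the paper's code or its actual cut-off.

```python
from typing import List, Optional, Tuple

# One test item: (predicted tool label, model confidence, gold tool label).
# Assumption: confidence is the classifier's top-class probability.
Prediction = Tuple[str, float, str]


def selective_accuracy(preds: List[Prediction], threshold: float) -> Tuple[float, float]:
    """Return (coverage, accuracy) for predictions with confidence >= threshold.

    coverage: fraction of all test items kept as "high confidence".
    accuracy: fraction of the kept items whose predicted label is correct
              (NaN if nothing is kept).
    """
    if not preds:
        raise ValueError("preds must not be empty")
    kept = [(pred, gold) for pred, conf, gold in preds if conf >= threshold]
    coverage = len(kept) / len(preds)
    if not kept:
        return coverage, float("nan")
    accuracy = sum(pred == gold for pred, gold in kept) / len(kept)
    return coverage, accuracy


def pick_threshold(
    preds: List[Prediction], target_accuracy: float
) -> Optional[Tuple[float, float, float]]:
    """Find the lowest threshold (i.e. the largest coverage) whose selective
    accuracy reaches target_accuracy; return (threshold, coverage, accuracy),
    or None if no threshold qualifies."""
    # Candidate thresholds are the observed confidences, scanned from low to
    # high, so the first one that qualifies keeps the most test items.
    for threshold in sorted({conf for _, conf, _ in preds}):
        coverage, accuracy = selective_accuracy(preds, threshold)
        if accuracy >= target_accuracy:
            return threshold, coverage, accuracy
    return None


if __name__ == "__main__":
    # Toy data with hypothetical labels; not the paper's data.
    toy = [
        ("supply", 0.97, "supply"),
        ("demand", 0.91, "demand"),
        ("environment", 0.88, "supply"),
        ("supply", 0.55, "demand"),
        ("demand", 0.42, "demand"),
    ]
    print(selective_accuracy(toy, 0.0))   # threshold 0 keeps everything: plain accuracy
    print(selective_accuracy(toy, 0.85))  # high-confidence subset only
    print(pick_threshold(toy, 0.95))      # widest coverage reaching 95% accuracy
```

Raising the threshold trades coverage for accuracy. In practice, the threshold would be chosen on a held-out validation set and then applied unchanged to the test set. Tuning it on the test data itself would overstate the reported accuracy.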