Library and Information Service ›› 2016, Vol. 60 ›› Issue (21): 113-121. DOI: 10.13266/j.issn.0252-3116.2016.21.015

• Knowledge Organization •

Parse Tree-based SAO Structure Identification

Yang Chao, Zhu Donghua, Heng Xiaofan, Wang Xuefeng

  1. School of Management and Economics, Beijing Institute of Technology, Beijing 100081
  • Received: 2016-08-25 Revised: 2016-10-16 Online: 2016-11-05 Published: 2016-11-05
  • About the authors: Yang Chao (ORCID: 0000-0002-0607-9552), PhD candidate, E-mail: yc_2009@hotmail.com; Zhu Donghua (ORCID: 0000-0001-8514-9733), research professor and doctoral supervisor; Heng Xiaofan (ORCID: 0000-0001-5695-5234), PhD candidate; Wang Xuefeng (ORCID: 0000-0002-4857-6944), professor.
  • Supported by:

    This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Forecasting Innovation Pathways of Emerging Technologies Based on Semantic TRIZ" (Grant No. 71373019) and the National High Technology Research and Development Program of China project "Big Data Intelligent Service System for Government Management and Its Application Demonstration" (Grant No. 2014AA015105).

Abstract:

[Purpose/significance] Subject-Action-Object (SAO) is a triple structure that can both describe topics in detail and capture the relationships between topics, and SAO analysis is a fast-growing research direction in bibliometrics. To obtain SAO structures that meet the needs of bibliometric analysis, three problems with existing SAO identification methods must be solved: low recall and precision, weak relevance between the identified SAO structures and domain topics, and matrix sparsity. [Method/process] This paper proposes a parse tree-based SAO structure identification method for bibliometric analysis. It first identifies the core components of SAO structures using a co-occurrence (co-word) algorithm and term clumping, and then extracts SAO structures layer by layer with a parse tree-based extraction algorithm. [Result/conclusion] A case study shows that the proposed method achieves an average precision of 0.8058 and an average recall of 0.8446. The identified SAO structures are closely related to the domain topics and matrix sparsity is considerably alleviated, so the method can be applied effectively in bibliometric analysis.

Key words: subject-action-object (SAO) identification, parse tree, semantic analysis, co-word algorithm, term clumping
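
To make the idea of layer-by-layer extraction from a parse tree more concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes the input is a Penn Treebank-style bracketed constituency parse, and it applies one simple, hypothetical rule: within each clause labeled S, the first NP child is the subject, and the first verb and first NP inside the VP child are the action and object. The function names (parse_bracketed, extract_sao) are invented for this example. The core-component filtering by co-word analysis and term clumping described in the abstract is not modeled here.

```python
from typing import List, Tuple


def parse_bracketed(text: str):
    """Parse a bracketed parse string into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok              # a leaf word
        node = [tokens[pos]]        # the constituent label, e.g. "NP"
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                    # consume the closing ")"
        return node

    return read()


def words(node) -> str:
    """Join the leaf words under a node into a phrase."""
    if isinstance(node, str):
        return node
    return " ".join(words(c) for c in node[1:])


def child(node, prefix: str):
    """Return the first direct child whose label starts with prefix, or None."""
    for c in node[1:]:
        if isinstance(c, list) and c[0].startswith(prefix):
            return c
    return None


def extract_sao(node) -> List[Tuple[str, str, str]]:
    """Collect (subject, action, object) triples from every S clause, top down."""
    if isinstance(node, str):
        return []
    triples = []
    if node[0] == "S":
        subj, vp = child(node, "NP"), child(node, "VP")
        if subj is not None and vp is not None:
            verb, obj = child(vp, "VB"), child(vp, "NP")
            if verb is not None and obj is not None:
                triples.append((words(subj), words(verb), words(obj)))
    for c in node[1:]:              # descend one layer and repeat
        triples.extend(extract_sao(c))
    return triples


tree = parse_bracketed(
    "(S (NP (DT The) (NN method)) "
    "(VP (VBZ extracts) (NP (NNP SAO) (NNS structures))))"
)
print(extract_sao(tree))  # prints [('The method', 'extracts', 'SAO structures')]
```

Because the traversal recurses into every child, embedded clauses yield their own triples after the enclosing clause, which is one simple way to read "hierarchical" extraction. A realistic system would need further rules for passives, coordination, and prepositional objects.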

CLC Number: