Library and Information Service ›› 2021, Vol. 65 ›› Issue (8): 85-96. DOI: 10.13266/j.issn.0252-3116.2021.08.009

• Knowledge Organization •

Integrating Word Semantic Representation and New Word Identification for Domain Ontology Evolution: A Case Study of Product Online Reviews

Geng Qian1,2, Deng Siyu2, Jin Jian2

  1 Center for Governance Studies, Beijing Normal University, Zhuhai 519087;
  2 School of Government, Beijing Normal University, Beijing 100875
  • Received: 2020-10-15 Revised: 2021-01-12 Online: 2021-04-20 Published: 2021-06-02
  • Corresponding author: Jin Jian (ORCID: 0000-0002-3239-2294), associate professor, PhD, master's supervisor, E-mail: jinjian.jay@bnu.edu.cn
  • About the authors: Geng Qian (ORCID: 0000-0001-5064-4996), professor, PhD, doctoral supervisor; Deng Siyu (ORCID: 0000-0002-8000-4065), master's student.
  • Funding:
    This work is one of the research outcomes of the National Social Science Fund of China Key Project "Research on Government Data Organization and Transmission Mechanisms for Integrated Management" (Grant No. 19ATQ005) and the National Natural Science Foundation of China project "Extraction and Comparison of Differentiated Customer Requirements: Mining and Analysis Based on Online Product Reviews" (Grant No. 71701019).

Abstract: [Purpose/significance] Traditional ontology evolution captures new knowledge and new requirements inaccurately and inefficiently. To address this problem, an ontology evolution method based on domain new word identification is proposed and validated on online product review data. [Method/process] First, natural language processing algorithms were used to preprocess the product review corpus, and the Word2vec algorithm was used to produce word vector embeddings. Then, a Bi-LSTM-Attention-CRF deep learning algorithm was used to recognize and extract candidate domain new words, and the K-means algorithm was applied to cluster the candidates and obtain the final domain new words. Finally, the six-stage ontology evolution process was followed to evolve the domain ontology. [Result/conclusion] Experiments on smartphone product reviews show that the domain new word identification model adopted in this study achieves higher accuracy and recall, and a new version of the smartphone product ontology was evolved from its output. The new ontology can help product designers optimize product design according to the new features and functions it records, and can support consumers in using product reviews for purchase decisions.

Key words: ontology evolution, domain new words, new word detection, attention mechanism, bidirectional long short-term memory network, conditional random field

CLC Number:
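
The two middle steps of the method summarized in the abstract (extracting candidate domain new words from sequence labels, then clustering them) can be illustrated with a minimal sketch. This is not the authors' implementation. The tag names `B-NEW` and `I-NEW`, the function names, the set of known ontology terms, and the two-dimensional toy vectors are all assumptions made for illustration. Hand-written tags stand in for the output of a Bi-LSTM-Attention-CRF tagger, hand-written vectors stand in for Word2vec embeddings, and the clustering is a plain K-means loop seeded with the first k points.

```python
from math import dist  # Euclidean distance, Python 3.8+


def decode_bio(tokens, tags):
    """Join tokens labelled B-NEW / I-NEW into candidate new words."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-NEW":
            if current:  # close the previous span before starting a new one
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-NEW" and current:
            current.append(token)
        else:  # "O", or a stray I-NEW with no open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def kmeans(points, k, iterations=20):
    """Plain K-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Hypothetical tagger output for one review sentence.
tokens = ["the", "face", "unlock", "is", "fast", "and", "the", "battery", "lasts"]
tags = ["O", "B-NEW", "I-NEW", "O", "O", "O", "O", "B-NEW", "O"]
known_terms = {"battery", "screen"}  # terms assumed to be in the old ontology

candidates = decode_bio(tokens, tags)
new_words = [w for w in candidates if w not in known_terms]

# Hypothetical embeddings for candidates gathered from many reviews.
vectors = {
    "face unlock": (0.9, 0.1),
    "fingerprint unlock": (0.8, 0.2),
    "fast charging": (0.1, 0.9),
    "wireless charging": (0.2, 0.8),
}
words = list(vectors)
labels = kmeans([vectors[w] for w in words], k=2)

print(new_words)
print(dict(zip(words, labels)))
```

With these toy inputs, "battery" is dropped because it is already a known term, and the two unlock terms fall into one cluster while the two charging terms fall into the other. In a real pipeline the number of clusters and the centroid seeding would need tuning, since seeding from the first k points is sensitive to input order.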