Knowledge Organization

Integrating Word Semantic Representation and New Word Identification for Domain Ontology Evolution: A Case Study of Product Online Reviews

  • Geng Qian,
  • Deng Siyu,
  • Jin Jian
  • 1 Center for Governance Studies, Beijing Normal University, Zhuhai 519087;
    2 School of Government, Beijing Normal University, Beijing 100875
Geng Qian (ORCID: 0000-0001-5064-4996), professor, PhD, doctoral supervisor; Deng Siyu (ORCID: 0000-0002-8000-4065), master's student.

Received date: 2020-10-15

  Revised date: 2021-01-12

  Online published: 2021-06-02

Funding

This work is one of the research outcomes of the National Social Science Fund of China key project "Research on Government Data Organization and Transmission Mechanisms for Integrated Management" (No. 19ATQ005) and the National Natural Science Foundation of China project "Extraction and Comparison of Differentiated Customer Needs: A Mining Analysis Based on Online Product Reviews" (No. 71701019).

Cite this article

Geng Qian, Deng Siyu, Jin Jian. Integrating Word Semantic Representation and New Word Identification for Domain Ontology Evolution: A Case Study of Product Online Reviews[J]. Library and Information Service (图书情报工作), 2021, 65(8): 85-96. DOI: 10.13266/j.issn.0252-3116.2021.08.009

Abstract

[Purpose/significance] Traditional ontology evolution captures new knowledge and new requirements inaccurately and inefficiently. To address this problem, an ontology evolution method based on domain new word identification is proposed and validated on a large volume of online product reviews. [Method/process] First, natural language processing algorithms were used to preprocess the product review corpus, and the Word2vec algorithm was adopted for word vector embedding. Then, the deep learning Bi-LSTM-Attention-CRF algorithm was used to recognize and extract candidate domain new words, and the K-means algorithm was applied to cluster the candidates and obtain the final domain new words. Finally, the six-stage ontology evolution process was followed to evolve the domain ontology. [Result/conclusion] Experiments on smartphone reviews show that the proposed domain new word identification model achieves higher accuracy and recall, and a new version of the smartphone product ontology is evolved accordingly. The new ontology can help designers optimize product design according to the new features and functions it records, and can support consumers in using product reviews for purchase decisions.
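
The clustering step named in the abstract, in which candidate new words are grouped by their word vectors, could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The candidate words and their 2-D vectors are invented for the example (real Word2vec vectors have far more dimensions), and the helper names `squared_distance` and `kmeans` and their parameters are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns one cluster index per vector."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical candidate new words with made-up 2-D embeddings.
candidates = {
    "face_unlock": (0.90, 0.10),
    "fingerprint_unlock": (0.85, 0.15),
    "fast_charging": (0.10, 0.90),
    "wireless_charging": (0.15, 0.95),
}

words = list(candidates)
labels = kmeans([candidates[w] for w in words], k=2)

# Collect the words of each cluster for manual inspection.
groups = {}
for word, label in zip(words, labels):
    groups.setdefault(label, []).append(word)
print(groups)
```

In a setting like the one the abstract describes, the vectors would come from a trained embedding model, and the choice of `k` would need to be justified against the data. Both are left open here.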
