Library and Information Service (图书情报工作) ›› 2020, Vol. 64 ›› Issue (5): 114-123. DOI: 10.13266/j.issn.0252-3116.2020.05.012

• Information Research •

Feature-opinion Polarity Identification Based on the Modified PMI Algorithm

Nie Hui, Shou Huanrong

  1. School of Information Management, Sun Yat-Sen University, Guangzhou 510275
  • Received: 2019-04-23 Revised: 2019-10-21 Online: 2020-03-05 Published: 2020-03-05
  • About the authors: Nie Hui (ORCID: 0000-0001-8567-3084), associate professor, PhD, E-mail: issnh@mail.sysu.edu.cn; Shou Huanrong (ORCID: 0000-0003-0586-218X), master's student.
  • Funding:
    This work is one of the research outcomes of the National Social Science Fund of China project "Research on the Quality and Control of Online Reviews Oriented to User-Perceived Utility" (Grant No. 15BTQ067).

Abstract: [Purpose/significance] Corpus-based sentiment word discovery infers the polarity of a sentiment word from its sentence context. It can markedly improve the accuracy of sentiment analysis and is valuable for domain-oriented, feature-level sentiment analysis tasks. [Method/process] This paper investigates feature-level sentiment polarity identification and proposes a Pointwise Mutual Information (PMI) based algorithm that automatically determines the polarity of "feature-opinion" pairs. Drawing on a large-scale domain corpus, the algorithm infers the polarity of a "feature-opinion" pair from its co-occurrence with seed words whose sentiment is unambiguous, and it uses dependency parsing to detect sentiment reversals between sentences. By modifying the classical PMI algorithm in this way, it predicts whether a user's opinion expression is positive or negative under contextual constraints. [Result/conclusion] Experiments show that the modified algorithm performs significantly better than both the lexicon-matching method and the classical PMI sentiment identification algorithm. It can infer the sentiment orientation of opinion expressions that are not listed in lexicons, and it can infer the context-specific polarity of sentiment words fairly accurately. On two evaluation corpora, restaurant reviews and digital product reviews, the modified algorithm reaches macro-averaged F1 scores of 0.827 and 0.878, respectively. The algorithm is supported by large-scale domain-specific corpora and is based on probabilistic statistics and syntactic analysis. Because the data are easy to obtain and the algorithm is efficient and portable, it is generally applicable and particularly suited to domain-oriented sentiment analysis tasks.

Key words: sentiment analysis, pointwise mutual information, domain-specific opinion word, context
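
To make the idea in the abstract concrete, the following is a minimal Python sketch of a seed-word PMI polarity score with a context-reversal correction. It is not the authors' implementation: the seed sets, the toy corpus, the function name `so_pmi`, the smoothing constant, and the pre-computed `contrast` flag (standing in for what dependency parsing would detect) are all assumptions made for illustration. The score uses the common aggregated form of semantic-orientation PMI, log2 of (co-occurrence with positive seeds × count of negative seeds) over (co-occurrence with negative seeds × count of positive seeds); the assumed correction counts a seed toward the opposite polarity when a sentiment reversal separates it from the "feature-opinion" pair.

```python
import math

# Hypothetical seed words with unambiguous sentiment (illustrative only).
POS_SEEDS = {"good", "delicious"}
NEG_SEEDS = {"bad", "awful"}

# Toy corpus. Each record is (feature-opinion pairs, [(seed, contrast), ...]).
# `contrast` is True when a sentiment reversal (e.g. "but") separates the seed
# from the pair; here it is given by hand instead of derived from a parser.
CORPUS = [
    ({"price-high"}, [("bad", False)]),
    ({"price-high"}, [("delicious", True)]),   # "delicious, but the price is high"
    ({"price-high"}, [("good", True)]),
    ({"portion-large"}, [("good", False)]),
    ({"portion-large"}, [("awful", True)]),    # "awful service, but the portion is large"
    ({"service-slow"}, [("bad", False)]),
]

def so_pmi(target, corpus, use_reversal=True, smoothing=0.01):
    """Return a polarity score for `target`: > 0 positive, < 0 negative."""
    co_pos = co_neg = 0   # co-occurrences of target with each seed class
    n_pos = n_neg = 0     # overall occurrences of each seed class
    for pairs, seeds in corpus:
        for seed, contrast in seeds:
            if seed in POS_SEEDS:
                is_pos = True
                n_pos += 1
            elif seed in NEG_SEEDS:
                is_pos = False
                n_neg += 1
            else:
                continue  # not a seed word, ignore
            if target in pairs:
                # Assumed correction: a reversal flips the seed's polarity
                # as far as this target is concerned.
                if use_reversal and contrast:
                    is_pos = not is_pos
                if is_pos:
                    co_pos += 1
                else:
                    co_neg += 1
    numerator = (co_pos + smoothing) * (n_neg + smoothing)
    denominator = (co_neg + smoothing) * (n_pos + smoothing)
    return math.log2(numerator / denominator)

if __name__ == "__main__":
    for pair in ("price-high", "portion-large"):
        print(pair,
              "classic:", round(so_pmi(pair, CORPUS, use_reversal=False), 3),
              "with reversal:", round(so_pmi(pair, CORPUS), 3))
```

On this toy data, "price-high" scores positive without the correction, because it co-occurs with "delicious" and "good" across a contrast, and negative once reversals are taken into account. This is the kind of context-dependent case the abstract describes, although the paper's actual modification may differ in its details.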

CLC number: