[Purpose/significance] Corpus-based sentiment word discovery infers the polarity of an opinion word from its sentence context. It can significantly improve the accuracy of sentiment analysis and is therefore valuable for domain-oriented, feature-level sentiment analysis tasks. [Method/process] This paper investigates how to determine sentiment polarity at the feature level and proposes a Pointwise Mutual Information (PMI) based algorithm that automatically identifies the polarity of "feature-opinion" pairs. Drawing on a large-scale domain corpus, the algorithm exploits the co-occurrence of each "feature-opinion" pair with seed words whose sentiment is unambiguous. It also uses dependency parsing to detect sentiment reversals (adversative shifts) between clauses. By modifying the classical PMI algorithm in this way, it predicts whether a user's opinion expression is positive or negative under its contextual constraints. [Result/conclusion] Experiments show that the modified algorithm significantly outperforms both a lexicon-matching algorithm and the classical PMI-based sentiment identification algorithm. It can infer the sentiment orientation of opinion expressions that are not listed in the lexicon. It can also infer the polarity of sentiment words in context with relatively high accuracy. On two evaluation corpora, restaurant reviews and digital product reviews, the modified algorithm achieves macro-averaged F1 scores of 0.827 and 0.878, respectively. The algorithm relies on a large domain-specific corpus together with probabilistic statistics and syntactic analysis, and such data are easy to obtain. As a result it is efficient, portable and broadly applicable, making it particularly suitable for domain-oriented sentiment analysis tasks.
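The following is a minimal sketch of the classical PMI-based semantic orientation idea (Turney and Littman's SO-PMI) applied to a "feature-opinion" pair, with a crude sentiment-reversal rule added. The score is aggregated over whole seed sets:

SO(pair) = PMI(pair, POS) − PMI(pair, NEG) = log2[p(pair, POS)·p(NEG) / (p(pair, NEG)·p(POS))]

A positive score suggests positive polarity. Several pieces are illustrative assumptions, not details taken from the paper:

- the seed lists and contrast markers;
- the clause/sentence co-occurrence window;
- the add-smoothing constant;
- the helper names `so_pmi` and `crosses_contrast`.

The abstract does not specify how the authors modify PMI. A real system would read reversals from a dependency parse rather than from a marker list.

```python
import math

# Hypothetical seed words and contrast markers, for illustration only.
POS_SEEDS = {"好", "满意", "推荐"}    # good, satisfied, recommend
NEG_SEEDS = {"差", "失望", "难吃"}    # bad, disappointed, tastes bad
CONTRAST = {"但是", "可是", "不过"}   # but, however, though


def crosses_contrast(sentence, i, j):
    """True if some clause after min(i, j), up to and including max(i, j),
    opens with a contrast marker. A crude stand-in for dependency-based
    reversal detection (assumption, not the paper's method)."""
    lo, hi = min(i, j), max(i, j)
    return any(sentence[k] and sentence[k][0] in CONTRAST
               for k in range(lo + 1, hi + 1))


def so_pmi(feature, opinion, corpus, smoothing=0.01):
    """Set-level SO-PMI of a (feature, opinion) pair.

    corpus: list of sentences; a sentence is a list of clauses;
    a clause is a list of tokens. The pair "occurs" in a sentence when
    feature and opinion share a clause; seeds anywhere in that sentence
    count as co-occurring, with polarity flipped across a contrast marker.
    """
    n = len(corpus)
    n_pos = n_neg = 0          # sentences containing any pos / neg seed
    joint_pos = joint_neg = 0  # sentences where the pair meets a pos / neg seed
    for sentence in corpus:
        tokens = [t for clause in sentence for t in clause]
        n_pos += any(t in POS_SEEDS for t in tokens)
        n_neg += any(t in NEG_SEEDS for t in tokens)
        pair_idx = next((i for i, c in enumerate(sentence)
                         if feature in c and opinion in c), None)
        if pair_idx is None:
            continue
        pos_hit = neg_hit = False
        for i, clause in enumerate(sentence):
            has_pos = any(t in POS_SEEDS for t in clause)
            has_neg = any(t in NEG_SEEDS for t in clause)
            if crosses_contrast(sentence, i, pair_idx):
                # A seed on the other side of "但是" argues for the opposite polarity.
                has_pos, has_neg = has_neg, has_pos
            pos_hit = pos_hit or has_pos
            neg_hit = neg_hit or has_neg
        joint_pos += pos_hit
        joint_neg += neg_hit
    # Add-smoothing keeps the log finite when a pair never meets one seed set.
    p_pos, p_neg = (n_pos + smoothing) / n, (n_neg + smoothing) / n
    pj_pos, pj_neg = (joint_pos + smoothing) / n, (joint_neg + smoothing) / n
    return math.log2((pj_pos * p_neg) / (pj_neg * p_pos))


# Tiny hypothetical corpus of pre-segmented restaurant reviews.
demo_corpus = [
    [["环境", "好"], ["但是", "服务", "慢"]],  # nice setting, but slow service
    [["服务", "慢"], ["很", "失望"]],          # slow service, very disappointed
    [["服务", "热情"], ["很", "满意"]],        # warm service, very satisfied
    [["菜", "好"], ["推荐"]],                  # good food, recommended
]

if __name__ == "__main__":
    print(so_pmi("服务", "慢", demo_corpus))    # negative score
    print(so_pmi("服务", "热情", demo_corpus))  # positive score
```

On the demo corpus, ("服务", "慢") scores below zero even though its first sentence contains the positive seed "好". The contrast marker flips that co-occurrence to the negative side, which is the kind of context effect the abstract attributes to dependency analysis.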