Knowledge-Based Systems 71 (2014) 61–71


Data-driven integration of multiple sentiment dictionaries for lexicon-based sentiment classification of product reviews

Heeryon Cho, Songkuk Kim*, Jongseo Lee, Jong-Seok Lee*

Yonsei Institute of Convergence Technology, Yonsei University, Republic of Korea
School of Integrated Technology, Yonsei University, Republic of Korea


Article history: Available online 13 June 2014

Keywords: Sentiment dictionary; Domain adaptation; Integration; Lexicon-based review classification; Sentiment analysis

Abstract

In lexicon-based sentiment classification, the problem of contextual polarity must be explicitly handled since it is a major cause of classification error. One way to handle contextual polarity is to revise the prior polarity of the sentiment dictionary to fit the domain. This paper presents a data-driven method of adapting sentiment dictionaries to diverse domains. Our method first merges multiple sentiment dictionaries at the entry word level to expand the dictionary. Then, leveraging the ratio of the positive/negative training data, it selectively removes the entry words that do not contribute to the classification. Finally, it selectively switches the sentiment polarity of the entry words to adapt to the domain. In essence, our method compares the positive/negative review's dictionary word occurrence ratios with the positive/negative review ratio itself to determine which entry words should be removed and which entry words' sentiment polarity should be switched. We show that the integrated sentiment dictionary constructed using our 'merge', 'remove', and 'switch' operations robustly outperforms individual dictionaries in the sentiment classification of product reviews across different domains such as smartphones, movies, and books. © 2014 Elsevier B.V. All rights reserved.

1. Introduction

Reading online product reviews is now a routine part of the purchasing experience that provides prospective buyers with valuable information regarding a product. Product reviews can be divided into positive and negative reviews, in which the reviewer either recommends or does not recommend a product. Due to the ever increasing volume of online product reviews, efficiently dividing these reviews into positive and negative reviews is a fundamental step in utilizing product reviews. A growing body of research on automatically classifying the sentiment in product reviews exists in the field of ''opinion mining and sentiment analysis'', with supervised, unsupervised, semi-supervised, and concept-based approaches offering different solutions [1–4]. This study presents yet another lexicon-based approach to classifying product reviews using existing sentiment lexicons, but proposes a novel method of merging and revising multiple sentiment

* Corresponding authors. Address: 162-1 Songdo-dong, Yeonsu-gu, 406-840 Incheon, Republic of Korea. Tel.: +82 32 749 5842 (S. Kim). Tel.: +82 32 749 5846; fax: +82 32 818 5801 (J.-S. Lee). E-mail addresses: [email protected] (S. Kim), [email protected] (J.-S. Lee).
http://dx.doi.org/10.1016/j.knosys.2014.06.001
0950-7051/© 2014 Elsevier B.V. All rights reserved.

lexicons by incorporating labeled product reviews to enable domain adaptation of sentiment values. Numerous sentiment lexicons with varying format and size have been constructed to aid the classification of positive and negative sentiments in texts. Examples include SentiSense [5], SentiWordNet [6], Micro-WNOp [7], WordNet-Affect [8], which are based on the English lexical database WordNet [9], SO-CAL [10], AFINN [11], Opinion Lexicon [12], Subjectivity Lexicon [13], General Inquirer [14], which are manually or semi-automatically constructed, and SenticNet [15] that is constructed from the Open Mind common sense knowledge base. With diverse sentiment lexicons readily available, we are naturally inclined to ask the following questions: (1) How are they different? (2) Can we construct a better sentiment lexicon for sentiment classification by integrating multiple sentiment lexicons? We answer these questions by first comparing different sentiment lexicons and their positive/negative product review classification performances across different domains. We then present merge, remove, and switch operations that merge and revise the entry words of the multiple sentiment lexicons using labeled product reviews. Consequently, an integrated sentiment dictionary is constructed. We show that the integrated sentiment dictionary outperforms individual dictionaries in the sentiment classification tasks of the three kinds of product reviews robustly via domain adaptation.


While many existing works approach the acquisition of a domain-specific sentiment lexicon as a lexicon building problem, the proposed approach starts out with the existing sentiment lexicons, i.e., the final product of the lexicon building approaches, and combines these existing lexicons to produce improved sentiment lexicons. The main contribution of this study is the development of a novel method of removing and switching the content of the existing sentiment lexicons. While the sentiment lexicons provide a list of sentiment words, the sentiment values are defined a priori as general-purpose values without consideration of a specific domain. Therefore, it is necessary to adjust the sentiment lexicons to the target domain before utilizing them for sentiment classification. Our method first selectively removes the sentiment words from the existing lexicon to prevent erroneous matching of the sentiment words during lexicon-based sentiment classification. Next, it selectively switches the polarity of the sentiment words to adjust the sentiment values to a specific domain. The remove and switch operations are performed using the target domain's labeled data, online product reviews in this study, by comparing the positive and negative distribution of the labeled reviews with the positive and negative distribution of the sentiment words. We propose a data-driven approach to automatically determine the necessary parameter for removing less contributing sentiment words and switching the polarity of the selected sentiment words to update the sentiment lexicon to fit the given domain. The threshold for judging the positivity or the negativity of given product reviews is also set automatically. The data-driven nature of the proposed approach makes it feasible and beneficial in situations where labeled data are abundantly available but a human expert is absent or costly. Although the creation of a large amount of labeled data may be considered laborious in general, the explosion of labeled Web opinion data and openly available lexical resources (e.g., sentiment lexicons) and the progress in big data processing techniques in recent years have formed an advantageous environment for data-driven sentiment classification. We fully leverage such an environment in this study. We also conduct extensive experiments to compare the performances of different state-of-the-art sentiment dictionaries, examine the effectiveness of each of the remove and switch operations when revising the dictionary, analyze the effect of different thresholds and slider values (i.e., parameters used in sentiment classification and revising the dictionary, respectively), and examine the revised dictionaries' performance with respect to the size of the training data.

The remainder of the paper is structured as follows. We outline the existing approaches on sentiment lexicon construction, domain adaptation, and sentiment classification in Section 2, and describe the ten sentiment lexicons and the procedures for standardizing, merging, and revising multiple lexicons in Section 3. We then summarize the experimental setup for sentiment dictionary evaluation in Section 4 and report the sentiment classification performances of the ten individual sentiment dictionaries and one integrated dictionary on three product review classification tasks in Section 5. Finally, we conclude this paper in Section 6.

2. Related work

One of the primary lexicon-based approaches to classifying sentiments in texts focuses on detecting the polarity of a given word using a domain corpus. For instance, Turney and Littman proposed a method of inferring the sentiment of a given word using pointwise mutual information (PMI) by associating the word with a set of positive and negative seed words [16]. The sentiment of a word differs across domains, however, and this has led to extensive research on domain-specific sentiment lexicon extraction and construction. Fahrni and Klenner proposed a

domain-specific adaptation of sentiment-bearing adjectives [17]. Adjectives such as ''warm'' and ''cold'' possess prior polarity, but depending on the context, this polarity may change. For example, warm mittens may be desirable, but warm beer may not be. To tackle such a problem of contextual polarity, Fahrni and Klenner implemented a two-stage process that first identifies domain-specific targets using Wikipedia, and then determines target-specific polarity of adjectives using a tagged corpus. Unlike this method, our method does not limit the scope of sentiment analysis only to adjectives but can consider contextual polarity of nouns, verbs, and adverbs in addition to adjectives covered by the default sentiment dictionary, which would lead to more accurate analysis. Yu et al. measured the similarity between the seed words and the words in a news corpus by comparing their contextual distribution using an entropy measure to extract useful emotion words [18]. Whereas Yu et al. employed a human annotator to select seed words to ensure the quality of the seed words, our method does not require any human intervention, but allows a fully automatic identification of domain-relevant and domain-irrelevant emotion words. Huang et al. used a generic sentiment lexicon, ''and'' and ''but'' clues, and synonym and antonym relations to extract pairwise candidate sentiment terms and propagated this knowledge to other sentiment terms to obtain a domain-specific sentiment lexicon [19]. No lexical clues, but only distributional information of positive/negative reviews and review words is employed in our approach, which enables a much simpler processing of sentences. Qiu et al. proposed an approach to iteratively extract both the domain sentiment words and features [20]. In doing so, they parsed dependency relations and part-of-speech information in order to map those types of information to the rules for sentiment word and feature extraction.
We, on the other hand, parse only the part-of-speech information for looking up the matching entry words in the sentiment dictionary; our approach performs a simpler and faster processing of sentences. Kanayama and Nasukawa proposed an unsupervised lexicon building method that uses context coherency, i.e., the tendency for the same polarities to appear successively in contexts, in order to detect polar clauses that convey positive or negative aspects in a specific domain [21]. They treat Japanese polar clauses as the unit of sentiment analysis and use domain-independent polar clauses and conjunctions such as ''and'', ''but'', and ''because'' as clues for polarity detection. While their method is also fully automatic and leverages the distribution of the context coherency, their method also requires syntactic parsing of the sentences. Neviarouskaya et al. proposed methods for automatically expanding a sentiment dictionary by adding and scoring new words based on the sentiment-scored lemmas and types of affixes and by leveraging the direct synonym/antonym/hyponym relations, derivation, and compounding with known lexical units [22]. Their approach, however, focuses on the expansion of the prior-polarity sentiment words with no special concern for domain-specific polarity. Our approach, on the other hand, considers both the domain-specific polarity and the expansion of the sentiment dictionary by merging and revising multiple sentiment dictionaries. Note that in the process, we generate an improved sentiment dictionary by integrating and revising multiple sentiment dictionaries although our goal is not building a sentiment dictionary per se. Lu et al. proposed an optimization framework that provides a unified and principled way to combine a general-purpose sentiment lexicon, overall sentiment ratings, a thesaurus, and linguistic heuristics for learning a context-dependent sentiment lexicon [23].
We do not employ any lexical heuristics, but simply leverage statistical information, i.e., the positive and negative ratio of the product reviews and the review-matched dictionary entry words. Most of the approaches mentioned above focus on building and expanding the domain-specific sentiment lexicon using combinations of domain corpus,


seed words, and lexical heuristics. Our approach, on the other hand, focuses on the utilization of the existing sentiment lexicons and distributional information of the training data. We update the content of the existing sentiment lexicons using labeled data to adapt the lexicon to a given domain rather than extract or expand new domain-specific lexicons from a domain corpus. Positive/negative labeled data are used to perform polarity adaptation of the domain-specific sentiment words by comparing the positive and negative distribution of the labeled data with a positive and negative distribution of the sentiment words. Our approach is fully automatic, and the strength is in its simplicity. Since domain adaptation is an important issue in sentiment classification, a copious amount of research has tackled the problem of domain transfer of sentiment classifiers, where a statistical sentiment classifier learned from one domain is adapted to another domain. For example, Aue and Gamon compared the sentiment classification performances of four different approaches that employ support vector machines (SVMs) and naïve Bayes classifiers while adjusting the source domain’s training data and features [24]. Tan et al. tackled the domain transfer problem by first training a base transductive support vector machine classifier in the old domain, then using it to select informative unlabeled samples in the new domain, and finally retraining the old classifier over the selected samples [25]. Yoshida et al. proposed a Bayesian probabilistic generative model to associate each word with a domain label, domain dependence/independence, and a word polarity to handle multiple source domains and multiple target domains [26]. 
Other approaches used structural correspondence learning (SCL) [27], spectral feature alignment (SFA) [28], the joint sentiment topic (JST) model [29], and a part-of-speech (POS)-based model [30] to learn the sentiment classifier using a single domain corpus and then adjust the classifier to a different domain. Our method uses labeled data of a domain to update the sentiment dictionary for improved classification for the domain, instead of adapting a sentiment classifier trained with data of another domain. This has the advantage of transparency, meaning that our dictionary-based approach allows human intuition-based revision of dictionary entry words to realize domain adaptation without any target domain data. For example, the negative prior polarity possessed by the word ''conspiracy'' is appropriate in many domains, but in the mystery novel domain, the presence of ''conspiracy'' is a favorable factor requiring the sentiment polarity to be switched to positive. By manually switching the polarity of ''conspiracy'' in the sentiment dictionary from negative to positive based on human intuition, our method can achieve domain adaptation without any domain data. Dictionary-based approaches have the added advantage of ensuring a minimum baseline performance using the default (prior polarity) sentiment dictionary in the absence of domain data. Dang et al. presented a method that leverages both the machine learning approach and the dictionary-based approach to improve sentiment classification, adding new features generated using a sentiment lexicon to the content-free and content-specific features to build an SVM classifier [31].
Their sentiment lexicon feature-added machine learning method achieved an average binary sentiment classification accuracy of 82.1% across five different product reviews, whereas our purely dictionary-based method, which calculates the review sentiment scores by averaging the dictionary-matched sentiment values, achieved an average accuracy of 81.5% across three different product reviews. Although a direct comparison of the two methods is difficult, the results suggest that our method possibly achieves comparable performance to that of the machine learning technique even without any machine learning procedures. To summarize, many of the existing approaches build various statistical classifiers for sentiment classification, but we only calculate a simple average of the product reviews’ sentiment values and do not build any


statistical classifier. Note that we do use training data to update the existing sentiment dictionary to adapt to a given domain and to set the sentiment classification threshold. Our approach, however, is much simpler in that we only calculate the ratio of the positive and negative reviews and compare it to the ratio of a given sentiment word in positive and negative reviews. Assuming that the sentiment lexicons have domain-specific sentiment values, the sentiment classification performance of a given text may vary according to how the sentiment value of the overall text is calculated. Polanyi and Zaenen stressed the importance of handling negation (e.g., ''not good'') and intensification (e.g., ''very good''), which they termed contextual valence shifters [32]. Kennedy and Inkpen improved the performance of movie review sentiment classification by considering negation, intensifiers, and diminishers [33]. Taboada et al.'s lexicon-based sentiment analysis method also handled negation and intensification by coupling a hand-crafted, high-quality sentiment dictionary with detailed sentence analysis [10]. Although all of these sentiment calculation methods, which can handle negation and intensification, are more complex than ours, our method keeps the sentence processing to a minimum by simply calculating the arithmetic mean of the sentiment values. Instead, our method focuses on the effective domain adaptation of the sentiment dictionary in order to achieve satisfactory classification performance. Ding et al. proposed a holistic lexicon-based approach that integrates various sentiment constructs including words, clauses, or sentences within opinion texts [34]. To this end, they defined negation rules and ''but'' clause rules to handle negations, and intra-sentence and inter-sentence conjunction rules to calculate the sentiment orientation of sentential contexts.
In contrast, our approach focuses on word-level sentiment score aggregation to arrive at the overall sentiment score of the product review, and does not define or employ any predefined rules. Our main focus is on the revision and utilization of the domain-adapted sentiment dictionary to achieve better sentiment classification performance. We do this by removing the noisy dictionary entry words from the existing sentiment dictionary and switching the polarity of selected dictionary entry words based on the difference between the positive and negative review ratio and the ratio of matched sentiment words in positive and negative reviews.

3. Material and method

The proposed method improves the positive vs. negative classification performance of product reviews by merging, removing, and switching the entry words of multiple sentiment dictionaries. Here, ten state-of-the-art sentiment lexicons, AFINN [11], General Inquirer [14], Micro-WNOp [7], Opinion Lexicon [12], SenticNet [15], SentiSense [5], SentiWordNet [6], SO-CAL [10], Subjectivity Lexicon [13], and WordNet-Affect [8], are investigated, and the method of revising sentiment dictionaries is described. In the following, we explain the standardization of individual sentiment lexicons in Section 3.1, the method of merging and removing lexicons' entry words and switching the entry words' polarities in Section 3.2, and the procedure for setting the threshold and slider parameters in Section 3.3.

3.1. Standardization of sentiment lexicons

Table 1 compares the entry sizes and formats of the ten sentiment lexicons. They vary significantly, as some sentiment lexicons define sentiment scores with varying numerical ranges while others define one or more sentiment categories such as positive, negative, or neutral, or a variety of emotions such as joy and sadness. The different formats make a direct comparison between


Table 1
Comparison of ten sentiment lexicons.

Sentiment resource | Entry size | Sentiment value
AFINN [11] | 2477 words | No category is given. Each word has an integer score ranging between -5 (very negative) and 5 (very positive)
General Inquirer [14] | 11,789 words | Positive/Negative/Pstv/Ngtv/Pleasur/Pain/EMOT/etc. categories are given. No numerical scores are given
Micro-WNOp [7] | 1105 synsets/1960 words | Positive/negative/objective categories are given. Each category has a real-valued sentiment score within 0 and 1
Opinion Lexicon [12] | 6786 words | Positive/negative categories are given. No numerical scores are given
SenticNet [15] | 15,143 common sense concepts | Pleasantness/attention/sensitivity/aptitude/polarity categories are given. Each category has a real-valued sentiment score within -1 and 1
SentiSense [5] | 2190 synsets/4404 words | Joy/sadness/love/hate/despair/hope/etc. 14 emotion categories are given. No numerical scores are given
SentiWordNet [6] | 117,659 synsets/155,287 words | Positive/negative/objective categories are given. Each category has a real-valued sentiment score within 0 and 1
SO-CAL [10] | 6306 words | No category is given. Each word has an integer score ranging between -5 (very negative) and 5 (very positive)
Subjectivity Lexicon [13] | 8221 words | Positive/negative/both/neutral categories are given. No numerical scores are given
WordNet-Affect [8] | 2872 synsets/4552 words | Synsets are first categorized into emotion/mood/trait/behavior/etc., and these categories are further categorized into positive/negative/ambiguous/neutral categories. No numerical scores are given
the sentiment lexicons difficult, so each lexicon is standardized to allow each entry word to have common systematic values. In this study, we standardize the sentiment lexicons in two ways: each entry word is standardized (1) to have a single real-valued score ranging between -1.0 and 1.0 or (2) to have one of the three discrete values, -1, 0, or 1. We compare the effect of these two standardization approaches in Section 5. Note that, in the case of General Inquirer and WordNet-Affect, only those entries with sentiment-related categories are selected and standardized. Hereafter, we refer to these standardized sentiment lexicons as sentiment dictionaries. Below we explain in detail how each sentiment lexicon was standardized.

(1) AFINN: The sentiment scores were normalized from [-5, 5] to [-1, 1].
(2) General Inquirer (GI): GI contains 11,789 words that belong to one or more of the more than 180 GI categories. We considered the Positive, Pstv, PosAff, Pleasur, Virtue, Complet, and Yes categories as positive and assigned words in these categories a sentiment score of 1. We then considered the Negativ, Ngtv, NegAff, Pain, Vice, Fail, Negate, and No categories as negative and assigned words in these negative categories a sentiment score of -1. For the words belonging to both positive and negative categories, a sentiment score of 0 was assigned. As a result, 1738 words, 2113 words, and 71 words were respectively assigned 1, -1, and 0 as sentiment scores.
(3) Micro-WNOp: Positive-negative paired sentiment scores given by human judges were averaged to produce a single real-valued sentiment score. For example, the word harmonious#a#1 contains three paired sentiment scores, i.e., (1.0, 0), (1.0, 0), and (0.75, 0), so the standardized sentiment score was obtained as the average of these three scores, hence 0.917.
(4) Opinion Lexicon: Opinion Lexicon is composed of 2006 positive words and 4783 negative words. Positive words were assigned a sentiment score of 1 while negative words were assigned a score of -1. Three ambiguous words that were included in both the positive and negative lists were given a sentiment score of 0.
(5) SenticNet: A total of 5726 and 14,244 common sense knowledge concepts are defined in SenticNet 1.0 and SenticNet 2.0, respectively, where 4827 concepts exist in both versions. These two versions were merged to create a dictionary with 15,143 unique concepts. The two versions of SenticNet contain various sentiment-related scores such as pleasantness, attention, sensitivity, aptitude, and polarity, but only the polarity score was used.
(6) SentiSense: SentiSense contains synsets with emotional categories, and these categories were utilized to determine the sentiment score. The synsets with joy, love, hope, calmness, and like categories were assigned a sentiment score of 1, while those with fear, anger, disgust, despair, sadness, and hate categories were assigned a score of -1. Those with ambiguous, surprise, and anticipation categories were assigned a sentiment score of 0.
(7) SentiWordNet: The negative score was subtracted from the positive score to produce a single real-valued sentiment score within a range of [-1, 1].
(8) SO-CAL: The sentiment scores were normalized from [-5, 5] to [-1, 1].
(9) Subjectivity Lexicon: Words having positive, negative, both, and neutral prior polarity were converted to have sentiment scores of 1, -1, 0, and 0, respectively.
(10) WordNet-Affect: Synsets were divided into affective hierarchical categories such as positive-emotion, negative-emotion, ambiguous-emotion, and neutral-emotion and were assigned sentiment scores of 1, -1, 0, and 0, respectively.
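The standardization operations above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the linear rescaling formula are our own choices, and sense-level aggregation assumes the per-sense scores have already been extracted from the lexicon files.

```python
# Sketch of the standardization schemes of Section 3.1 (illustrative only).

def normalize(score, lo=-5.0, hi=5.0):
    """Linearly rescale a score from [lo, hi] to [-1, 1] (AFINN/SO-CAL case)."""
    return 2.0 * (score - lo) / (hi - lo) - 1.0

def discretize(score):
    """Map a real-valued score to one of the three discrete values -1, 0, 1."""
    return (score > 0) - (score < 0)

def aggregate_senses(sense_scores):
    """Average the scores of a word's senses (e.g., happy#1, happy#2, ...)."""
    return sum(sense_scores) / len(sense_scores)
```

With these helpers, an AFINN score of 3 maps to roughly 0.6, and Micro-WNOp's harmonious#a#1 with per-judge scores 1.0, 1.0, and 0.75 receives about 0.917, matching the example above.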

Note that in all WordNet-based resources, the different senses of a given word, e.g., happy#1, happy#2, . . ., were aggregated, and their sentiment scores were averaged.

3.2. Merging, removing, and switching dictionary entries

The general idea for improving the sentiment dictionary is to (1) increase the matching of the sentiment-bearing words, (2) remove those entry words that do not contribute to the classification, and (3) revise the polarity of the sentiment values that go against the correct classification. First, multiple sentiment dictionaries are merged by averaging the sentiment scores of the overlapping entry words to implement the first idea. Merging the overlapping entries results in a larger dictionary with more entry words, consequently increasing the chance of matching unseen sentiment-bearing words in future reviews. Since we standardized various sentiment lexicons in Section 3.1 to allow sentiment lexicons having different formats to be merged, the sentiment classification performance may be


affected by this standardization. For example, different sentiment value ranges (e.g., [1, 9] compared with [-1, 1]) lead to different threshold values for judging the positivity and the negativity of a given product review, and merging two lexicons having different formats will certainly alter the threshold value. Furthermore, popular entry words that appear across multiple dictionaries and unique words that appear only in a single dictionary will respectively have new (i.e., mean of all standardized sentiment values) and old sentiment values, which may be difficult to compare directly. Next, to implement the second idea, a dictionary entry word is removed if the ratio of its occurrence in positive and negative reviews is similar to the ratio of positive and negative reviews. The left panel of Fig. 1 illustrates this idea. In the figure, the four positive book reviews and the one negative book review all contain the word interested once. Therefore, the word occurrence ratio of interested in the positive vs. negative reviews is 4:1 because the word interested occurs four times in the positive reviews and once in the negative review. Consequently, the word occurrence ratio is the same as the ratio of the positive vs. negative reviews itself. If the positive vs. negative classification is determined based on the arithmetic mean of the review words' sentiment values, the word interested will not contribute to the classification since it is included in all of the reviews. Hence, we remove the word interested from the sentiment dictionary. By doing so, the other words that do contribute to determining the relevant class can be highlighted. Lastly, to implement the third idea, we switch the sentiment polarity of those dictionary entry words for which the difference between the positive vs. negative word occurrence ratio and the review polarity ratio has a different sign from the entry word's polarity. The right panel of Fig. 1 illustrates this idea. The word betrayal occurs four times in the positive reviews and once in the negative reviews (4:1). Meanwhile, there are four positive reviews and two negative reviews (4:2). Subtracting the review ratio (4/2) from the word occurrence ratio (4/1) outputs a positive value (4/1 - 4/2 = 2), but the initial polarity of betrayal defined in the sentiment dictionary is negative. Hence, the sentiment polarity is switched from negative to positive. Below, the pseudocode (Algorithm 1) and the detailed steps for performing the remove and switch operations are summarized.


Step 1. Prepare training data. Split the data into positive and negative reviews to obtain the positive vs. negative review ratio.

ReviewRatio = #ofPositiveReviews / #ofNegativeReviews

Step 2. For each dictionary entry word, count its frequency in the positive vs. negative reviews to obtain the word occurrence ratio.

WordRatio = WordFreqInPosReview / WordFreqInNegReview

Step 3. Calculate the absolute ratio difference between the word ratio and the review ratio.

AbsRatioDifference = |WordRatio - ReviewRatio|

Step 4. Divide the absolute ratio difference by the review ratio to calculate the proportion of the ratio difference.

ProportionOfRatioDifference = AbsRatioDifference / ReviewRatio

Step 5. Set a value (called the slider value) to determine which dictionary entries are to be removed and which sentiment polarities are to be switched.
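The merge operation together with the remove and switch steps above can be sketched as follows. The decision rules (remove when the proportion of the ratio difference falls at or below the slider value; switch when the sign of WordRatio - ReviewRatio disagrees with the entry's polarity) are our reading of the description in the text; this is a minimal sketch, not the authors' Algorithm 1.

```python
# Merge standardized dictionaries, then remove/switch entries using labeled
# reviews (lists of token lists). Decision rules follow our reading of
# Steps 1-5 and may differ in detail from the authors' implementation.

def merge(dicts):
    """Average the sentiment scores of overlapping entry words."""
    merged = {}
    for d in dicts:
        for word, score in d.items():
            merged.setdefault(word, []).append(score)
    return {w: sum(s) / len(s) for w, s in merged.items()}

def revise(dictionary, pos_reviews, neg_reviews, slider):
    review_ratio = len(pos_reviews) / len(neg_reviews)       # Step 1
    revised = {}
    for word, score in dictionary.items():
        pos_freq = sum(r.count(word) for r in pos_reviews)
        neg_freq = sum(r.count(word) for r in neg_reviews)
        if neg_freq == 0:                                    # avoid division by zero
            revised[word] = score
            continue
        word_ratio = pos_freq / neg_freq                     # Step 2
        diff = abs(word_ratio - review_ratio)                # Step 3
        proportion = diff / review_ratio                     # Step 4
        if proportion <= slider:                             # Step 5: remove
            continue
        # Switch: sign of (word_ratio - review_ratio) disagrees with polarity.
        if (word_ratio - review_ratio) * score < 0:
            score = -score
        revised[word] = score
    return revised
```

On the betrayal example from Fig. 1 (word ratio 4:1, review ratio 4:2, prior polarity negative), this sketch switches the polarity to positive; on the interested example (word ratio equal to the review ratio), it removes the entry.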

Fig. 1. Examples of remove (left) and switch (right) operations.


3.3. Setting threshold and slider value

calculation, and evaluation metric described in the following subsections.

Because each sentiment dictionary contains different entry words and sentiment values, the threshold for judging the positivity and negativity of a product review must be set differently for each dictionary. Based on the threshold, a product review is judged as positive if the sentiment score of the overall review is greater than or equal to the threshold, or as negative, otherwise. Moreover, the slider value for selectively removing and switching dictionary entries discussed in Step 5 of Section 3.2 also must be set during the training phase to improve the sentiment classification performance on test data. Here, the method of automatically setting the threshold and the slider value of each dictionary is presented. Fig. 2 shows a sentiment classification accuracy table for smartphone reviews according to different thresholds and slider values. Column one lists the threshold values ranging between 0.15 and 0.30 with an interval of 0.01. Columns two to seven display the sentiment classification accuracies of the revised sentiment dictionaries using six different slider values (0.0–0.5). Column eight lists the average accuracy across the six slider values for each threshold. In order to determine the optimal threshold and slider value, the following procedures are performed on training data. First, the classification accuracies of the entire threshold and slider combinations are calculated, as in Fig. 2. Next, the three consecutive average accuracies (Fig. 2 column eight, the curly braces) are summed and compared to find the greatest sum of the average accuracy triple (Fig. 2, the bold rectangle). The triple determines the three threshold candidates (Fig. 2 column one, 0.21, 0.22, and 0.23). Once the threshold candidates are determined, the sum of the triple accuracies of each slider value is compared with one another (Fig. 2, the columns two to seven inside the bold rectangle). Then the slider value with the greatest triple accuracy sum is selected (Fig. 
2, first column). Finally, among the three selected accuracies, the largest is selected (i.e., the bold and underlined accuracy of 0.761). The threshold and the slider value corresponding to 0.761 are selected as the final threshold and slider value for the test data (i.e., the numbers in the shaded cells: the threshold is 0.21 and the slider value is 0.0).

4. Experimental setup

The performance of the proposed method is evaluated using the dataset, sentiment dictionaries, review sentiment score calculation, and evaluation metric described in the following subsections.
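The parameter-selection procedure of Section 3.3 can be sketched in code. This is a minimal illustration with hypothetical names; `acc[t][s]` stands for a training-set accuracy table like Fig. 2 that has already been computed.

```python
def select_parameters(acc, thresholds, sliders):
    """Pick (threshold, slider) via the best triple of consecutive average accuracies.

    acc:        dict of dicts, acc[threshold][slider] -> training accuracy
    thresholds: sorted list of candidate thresholds (Fig. 2, column one)
    sliders:    list of candidate slider values (Fig. 2, columns two to seven)
    """
    # Average accuracy over all sliders for each threshold (Fig. 2, column eight).
    avg = {t: sum(acc[t][s] for s in sliders) / len(sliders) for t in thresholds}
    # Find the three consecutive thresholds with the greatest summed average accuracy.
    best_i = max(range(len(thresholds) - 2),
                 key=lambda i: sum(avg[t] for t in thresholds[i:i + 3]))
    triple = thresholds[best_i:best_i + 3]
    # Select the slider whose accuracies over the triple sum highest.
    best_s = max(sliders, key=lambda s: sum(acc[t][s] for t in triple))
    # Finally, take the threshold with the single largest accuracy for that slider.
    best_t = max(triple, key=lambda t: acc[t][best_s])
    return best_t, best_s
```

On the Fig. 2 data this procedure would return threshold 0.21 and slider 0.0.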

4.1. Dataset

Online product reviews on Amazon.com were collected for smartphones, movies, and books to build a positive/negative review dataset of 17,500, 35,000, and 90,000 reviews, respectively. Table 2 summarizes the dataset composition. Each Amazon product review contains a star rating given by the reviewer: the most recommended products receive 5 stars (positive reviews) whereas the least recommended products receive 1 star (negative reviews). For our dataset, the 5-star and 4-star reviews were labeled as positive reviews and the 1-star and 2-star reviews were labeled as negative reviews. The 3-star reviews were excluded from our dataset. 2500, 5000, and 10,000 reviews were randomly selected from the smartphone, movie, and book reviews, respectively, to construct the training data for setting the threshold and slider values. The remaining data were used as test data.

4.2. Sentiment dictionary

Ten sentiment dictionaries and one merged dictionary were used in the sentiment classification experiment (Table 3). Note that when constructing the merged dictionary, we excluded SentiWordNet and merged the remaining nine dictionaries; including SentiWordNet would simply create an expanded version of SentiWordNet because a significant number of its entry words overlap with those of the remaining dictionaries. Table 3 shows the number of dictionary entry words that match the words in each of the three product review domains in the dataset. Hereafter, the dictionary abbreviation given in the table will be used throughout the evaluation to indicate a given sentiment dictionary. Note that discrete-value sentiment dictionaries containing three discrete values, -1, 0, and 1, were additionally constructed from the eight dictionaries with real-number sentiment values (i.e., AFN, MWN, STN, STS, SWN, SCL, WNA, and MRG). Here, the negative real numbers were replaced with -1, the positive real numbers with 1, and zero was left as is. We constructed the discrete-value sentiment dictionaries in order to investigate the effect of discrete vs. real sentiment values in sentiment classification. We report the result of discrete vs. real sentiment values in Section 5.1.
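The discretization described above can be sketched as follows. This is a minimal illustration, assuming a sentiment dictionary represented as a word-to-score mapping; it simply applies the sign function to each real-valued score.

```python
def discretize(dictionary):
    """Map each real-valued sentiment score to -1 (negative), 0, or +1 (positive)."""
    return {word: (1 if score > 0 else -1 if score < 0 else 0)
            for word, score in dictionary.items()}

# Example: a tiny hypothetical real-valued dictionary and its discrete version.
real_dict = {"good": 0.8, "bad": -0.6, "table": 0.0}
disc_dict = discretize(real_dict)  # {"good": 1, "bad": -1, "table": 0}
```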

THRES    SLIDER                                              AVG
         0.0      0.1      0.2      0.3      0.4      0.5
0.15     0.749    0.744    0.740    0.720    0.702    0.659   0.719
0.16     0.750    0.744    0.742    0.724    0.705    0.662   0.721
0.17     0.753    0.744    0.740    0.724    0.706    0.666   0.722
0.18     0.754    0.742    0.739    0.726    0.709    0.669   0.723
0.19     0.751    0.742    0.740    0.728    0.711    0.670   0.724
0.20     0.750    0.741    0.739    0.728    0.710    0.670   0.723
0.21     0.761    0.744    0.748    0.742    0.728    0.698   0.737
0.22     0.759    0.744    0.746    0.741    0.728    0.698   0.736
0.23     0.756    0.741    0.743    0.740    0.728    0.699   0.735
0.24     0.752    0.738    0.741    0.739    0.728    0.699   0.733
0.25     0.746    0.735    0.738    0.738    0.728    0.700   0.731
0.26     0.740    0.728    0.733    0.738    0.728    0.701   0.728
0.27     0.730    0.720    0.726    0.737    0.728    0.703   0.724
0.28     0.724    0.716    0.722    0.734    0.726    0.703   0.721
0.29     0.715    0.708    0.714    0.729    0.724    0.701   0.715
0.30     0.711    0.703    0.711    0.726    0.722    0.701   0.712

Fig. 2. A sentiment classification accuracy table according to different thresholds and slider values is automatically generated using the training data in order to determine the threshold and slider value used in testing.


H. Cho et al. / Knowledge-Based Systems 71 (2014) 61–71

Table 2
Positive/negative dataset composition of customer product reviews.

Data composition                                             Smartphone              Movie                   Book
Total number of reviews                                      17,500                  35,000                  90,000
Number of positive/negative reviews (Ratio)                  12,500/5000 (2.5:1)     30,000/5000 (6:1)       81,000/9000 (9:1)
Number of training data for parameter setting [#Pos/#Neg]    2500 [1785/715]         5000 [4290/710]         10,000 [9000/1000]
Number of test data [#Pos/#Neg]                              15,000 [10,715/4285]    30,000 [25,710/4290]    80,000 [72,000/8000]

Table 3
Size of review word-matched dictionary entry words.

Dictionary            Abbr.   Smartphone   Movie     Book      Average
AFINN                 AFN     1117         1642      1732      1497
General Inquirer      GIQ     2426         3716      3869      3337
Micro-WNOp            MWN     812          1247      1403      1154
Opinion Lexicon       OPL     2742         4912      5534      4396
SenticNet             STN     3842         5834      6235      5304
SentiSense            STS     1896         3021      3389      2769
SentiWordNet          SWN     12,724       28,156    38,488    26,456
SO-CAL                SCL     2955         4937      5483      4458
Subjectivity Lexicon  SBL     3302         5913      6581      5265
WordNet-Affect        WNA     486          1022      1199      902
Merged                MRG     6931         12,045    13,747    10,908

4.3. Review sentiment score calculation

Each product review was tokenized, lemmatized, and part-of-speech tagged using the Stanford CoreNLP suite [35] to calculate the sentiment score of each review. Once the list of lemmatized review words was obtained, the sentiment score D_j(w_i) of each word w_i was looked up in the sentiment dictionary D_j. The scores of all review words matching the dictionary (w_i, i = 1, . . . , n, where n is the number of matched dictionary words) were averaged to yield the review sentiment score:

ReviewSentimentScore(D_j) = (1/n) * sum_{i=1}^{n} D_j(w_i)
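The averaging above can be sketched as a small function. This is a minimal illustration: tokenization, lemmatization, and POS tagging are omitted, and the dictionary is assumed to be a word-to-score mapping; the fallback score of 0.0 for reviews with no matched word is our assumption, not taken from the paper.

```python
def review_sentiment_score(review_lemmas, dictionary):
    """Average the dictionary scores of all matched (lemmatized) review words."""
    matched = [dictionary[w] for w in review_lemmas if w in dictionary]
    if not matched:
        return 0.0  # assumed fallback: no dictionary word matched
    return sum(matched) / len(matched)
```

For example, with a toy dictionary {"great": 0.8, "awful": -0.4}, the review ["great", "phone", "awful"] matches two words and scores (0.8 - 0.4) / 2 = 0.2.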

As described in Section 3.3, the threshold and the slider value of each dictionary were automatically set using the training data. The threshold ranged between 0.1 and 0.3 with an interval of 0.005. The slider value ranged between 0.0 and 0.5 with an interval of 0.1.

4.4. Evaluation metric

In order to measure the overall sentiment classification performance of each dictionary, balanced accuracy was used, since the product review dataset used in the experiment is imbalanced, containing more positive reviews than negative reviews. Recall_pos and Recall_neg measure the accuracy on the positive and negative reviews, respectively:

BalancedAccuracy = 0.5 * Recall_pos + 0.5 * Recall_neg
Recall_pos = true positives / (true positives + false negatives)
Recall_neg = true negatives / (true negatives + false positives)

5. Results and discussion

The performance of the sentiment dictionaries was evaluated under various conditions. All dictionaries discussed below were revised through the remove and switch operations; only those explicitly labeled ''original'' were used without any revision. The revised dictionaries were constructed using 2500, 5000, and 10,000 training data and tested on 15,000, 30,000, and 80,000 test data. The performances across the three domains (smartphone, movie, and book) were averaged for each dictionary unless a specific domain is explicitly stated. Refer to Table 3 for the dictionary abbreviations.

5.1. Real vs. discrete sentiment scores

We first compared the sentiment classification performance of discrete (three values, -1, 0, and 1) and real-number ([-1.0, 1.0]) sentiment scores. Fig. 3 displays a pair-wise comparison of the discrete and real-number versions of eight sentiment dictionaries (see Section 4.2 for the discrete dictionary construction). The vertical axis indicates the balanced sentiment classification accuracy of each sentiment dictionary (discrete vs. real) averaged across the three product review domains. While no significant performance difference is observed for six dictionaries (AFN, MWN, STS, SWN, SCL, WNA), the real-number representation of sentiment values substantially outperforms the discrete-value representation for STN and MRG. Based on this finding, the real-number sentiment value system was employed for the eight dictionaries throughout the remainder of the evaluation experiments.

5.2. Original vs. 'remove and switch' operations

How effective are the proposed remove and switch operations? The revised dictionaries' sentiment classification performances are compared to those of the corresponding original dictionaries in Fig. 4. We see that, in all cases, the revised dictionaries with the remove and switch operations applied outperform their original counterparts. The largest improvements are observed for STN (+14.1%) and MRG (+10.5%) whereas the smallest improvements are observed for MWN (+3.0%) and WNA (+2.3%). The average improvement across the eleven dictionaries is 7.4%.

5.3. Remove vs. switch operations

How much do the remove and switch operations individually contribute to the performance improvement? Fig. 5 compares the
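The balanced-accuracy metric of Section 4.4 can be sketched directly from its definition. This is a minimal illustration assuming 'pos'/'neg' string labels.

```python
def balanced_accuracy(y_true, y_pred):
    """Return 0.5 * Recall_pos + 0.5 * Recall_neg for 'pos'/'neg' labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == "pos" and p == "pos")
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == "pos" and p == "neg")
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == "neg" and p == "neg")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == "neg" and p == "pos")
    recall_pos = tp / (tp + fn)  # accuracy on the positive reviews
    recall_neg = tn / (tn + fp)  # accuracy on the negative reviews
    return 0.5 * recall_pos + 0.5 * recall_neg
```

Because the two recalls are weighted equally, a classifier that always predicts the majority class scores only 0.5 on this metric, which is why it suits the imbalanced review dataset.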

Fig. 3. Real vs. discrete sentiment values.



Fig. 4. Original vs. revised (with the remove and switch operations) dictionary.

Fig. 6. Performance and the size of review-matched dictionary words.

Fig. 5. Remove vs. switch operations.

individual contributions of the remove and switch operations on each dictionary. We see that the switch operation is a major contributor to the performance improvement. Nevertheless, the best performance is obtained when both operations are applied.
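The decision rule behind the two operations can be illustrated with a loose sketch; the helper below is hypothetical and does not reproduce the exact formula of Step 5 in Section 3.2. Following the idea of comparing each entry word's positive/negative occurrence ratio with the positive/negative review ratio itself, the slider acts as a margin around the review ratio.

```python
def revise_entry(word_pos_occ, word_neg_occ, review_pos_ratio, prior_polarity, slider):
    """Return 'remove', 'switch', or 'keep' for one dictionary entry word.

    word_pos_occ / word_neg_occ: occurrences of the word in positive / negative reviews
    review_pos_ratio:            fraction of reviews that are positive
    prior_polarity:              sign of the word's dictionary score (-1 or +1)
    slider:                      margin controlling how many words are revised
    """
    total = word_pos_occ + word_neg_occ
    if total == 0:
        return "keep"
    word_pos_ratio = word_pos_occ / total
    # Words occurring in positive/negative reviews at roughly the same rate as the
    # reviews themselves carry no discriminative signal: remove them.
    if abs(word_pos_ratio - review_pos_ratio) <= slider:
        return "remove"
    # Words skewed against their prior polarity get their polarity switched.
    if word_pos_ratio > review_pos_ratio and prior_polarity < 0:
        return "switch"
    if word_pos_ratio < review_pos_ratio and prior_polarity > 0:
        return "switch"
    return "keep"
```

For instance, with a 70% positive review ratio and slider 0.1, a prior-negative word appearing in 90 positive and 10 negative reviews would be switched to positive, matching the spirit of the book-domain examples such as "horror" and "murder".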

Fig. 7. Performance across different product reviews.

5.4. Size of review-matched dictionary words

Although bigger dictionaries are expected to match more review words, does a greater number of review-matched dictionary words lead to better classification performance? The relationship between the number of matched dictionary entry words and the performance is investigated in Fig. 6. The horizontal axis in Fig. 6 indicates the average number of review-matched dictionary words of each sentiment dictionary across the three product review domains. We see that the two smallest dictionaries, i.e., MWN (average review-matched word size: 1154; see Table 3) and WNA (word size: 902), perform poorly. There appears to be a general trend that larger is better. However, more matched words do not necessarily guarantee better performance. For example, SWN (26,456 words matched) performs worse than MRG (10,908 words matched). This confirms that not only the size but also the selectivity of the sentiment words matters in lexicon-based review classification.

5.5. Different domains

Fig. 7 compares the performance of each dictionary across the three different product review domains, i.e., smartphones, movies, and books. In addition, we compared the standard deviation of the three balanced accuracies for each dictionary. SCL (standard deviation of the three balanced accuracies: 0.351) and SBL (0.379) yield relatively stable performance while STS (2.214), STN (2.146), and SWN (1.986) show larger variance in cross-domain performance.

In all cases of smartphone, movie, and book reviews, MRG performs the best with balanced accuracies of 82.6%, 80.1%, and 81.8%, respectively, which demonstrates the applicability of the proposed merge, remove, and switch method across different domains.

5.6. Different threshold and training data size

How do the threshold and slider value interact? How much training data for setting the threshold and slider value are needed to improve the sentiment dictionary? Figs. 8–10 show the smartphone, movie, and book review classification performance, respectively, of the merged and revised dictionaries using different thresholds (horizontal axis, ranging from 0.10 to 0.30) and slider values (dict_0.00, dict_0.10, . . . , dict_0.50 curves). The dict_orig curve indicates the merged dictionary prior to the application of the remove and switch operations. As the slider value increases, the threshold at the peak also increases, as seen by the curves moving toward the right in two cases (Figs. 8 and 9). Since the slider value determines the number of dictionary entry words to be removed and switched, we conjecture that the dictionaries revised using larger slider values end up with a greater positive total sum than those revised with smaller slider values. Moreover, the curves in Fig. 10 being distinct from those in Figs. 8 and 9 is presumed to be the result of active switching of the


Fig. 8. Sentiment classification accuracy (%) of smartphone reviews using different thresholds (horizontal axis, 0.1 to 0.3) and slider values (dict_0.00, . . ., dict_0.50).

Fig. 9. Sentiment classification accuracy of movie reviews using different thresholds and slider values.


Fig. 10. Sentiment classification accuracy of book reviews using different thresholds and slider values.

Fig. 11. Sentiment classification performance of book reviews using the merged dictionary trained on different data sizes (dict_00100, . . ., dict_10000). Optimal slider value was set differently for each case.

sentiment polarity of many dictionary entry words, whose prior sentiment polarities are listed as positive in the original dictionary, to negative. Fig. 11 shows the book classification accuracy of merged, removed, and switched dictionaries built using training data of different sizes (dict_00100, dict_00250, . . . , dict_10000). Notice that the revised dictionaries start to outperform the original dictionary (68.1%) once the training data size reaches and exceeds 1000 (dict_01000). In the case of smartphone and movie reviews, the revised dictionaries outperform the original dictionary at dict_00250 and dict_00500, respectively. These findings indicate that the revised dictionary requires a certain amount of training data to outperform the original dictionary. Furthermore, we find that the performance continues to increase as more training data are fed into the remove and switch operations until the performance saturates. Note that the dict_0.30 curve in Fig. 10 and the dict_10000 curve in Fig. 11 are identical.

5.7. Discussion

Dictionary-based approaches usually have limited word coverage and may fail to identify domain-specific sentiment words not defined in the sentiment lexicon. This has prompted many existing approaches to build domain-specific sentiment lexicons using a domain corpus, lexical heuristics, seed words, etc. We, on the other hand, turned our attention to the variety of existing sentiment lexicons that are the final products of sentiment lexicon generation. Our initial motivation for combining multiple sentiment lexicons was to expand the sentiment lexicon's coverage in order to ameliorate the individual lexicons' limited word coverage. At the outset, we surmised that combining multiple dictionaries would have the following effects: (1) the word coverage would broaden and different dictionaries would complement each other; and (2) the numerical sentiment scores would be updated to incorporate diverse measurements, leading to fewer anomalous scores. However, broader



coverage did not necessarily guarantee better sentiment classification performance, since words irrelevant to sentiment analysis were matched and generated noise. By incorporating the remove operation, we reduced such noise by eliminating the words that did not contribute to the sentiment analysis. Examples of removed words in the book review domain include book, story, novel, mystery, communicate, and interested. The remove operation appears to eliminate words that are ever-present in both positive and negative product reviews. Finding such words manually would be tedious and time-consuming, and this is where the remove operation pays off. Regarding the leveling of the sentiment scores (assumption (2) above), we found that contextual adjustment of the sentiment scores was crucial for each product review domain. Consequently, we proposed the switch operation, which switches the sentiment polarity of selected dictionary entries based on the ratio difference of positive/negative word-review occurrences. Examples of initially negative words switched to positive in the book review domain include horror, murder, betrayal, dilemma, and conspiracy. Although these words generally carry negative prior polarity, they are subject matters that add excitement to a book and hence should be considered positive in the book domain. On the other hand, initially positive sentiment words switched to negative include beautify, conveniently, properly, substantially, and benevolence. These words were employed as negative words in some contexts, as in ''the hero is conveniently wealthy.'' One of the advantages of the dictionary-based approach is that it works even when no labeled data are available. In terms of lexical resources, only the sentiment dictionary is needed to calculate the overall sentiment of a product review.
In this sense, our method can be viewed as a hybrid that combines the supervised and unsupervised approaches: we use the target domain's training data to improve the sentiment dictionary (supervised), and we calculate the overall sentiment score of a product review using the improved dictionary for sentiment classification (unsupervised). We also gain the added advantage of obtaining a domain-specific sentiment dictionary and a deeper understanding of domain-specific sentiment values. The improved dictionary can be reused 'as is' in similar domains (e.g., hotel and restaurant reviews), since the domain-specific lexicons are similar (e.g., small, closed, loud, speedy, unknown, etc.) and the calculation of the overall sentiment is simple and straightforward. One requirement of our approach is that a minimum amount of labeled data is needed to revise the sentiment dictionary so that it outperforms the original dictionary. However, the same can be said of supervised approaches before they reach usable performance. Fortunately, the time and effort for acquiring labeled data have decreased in recent years, with various resources (e.g., labeled datasets, natural language processing tools, etc.) now publicly available. The advancement of technologies for processing large amounts of text (or big data) is also a plus for data-driven approaches such as ours. The generation of the accuracy table (see Fig. 2) for setting the threshold and slider value may seem to require a large amount of computation, but the actual calculation involved is a simple averaging of sentiment values; hence, generating the accuracy table is not expensive. The method of averaging the sentiment values over the whole review (i.e., the review sentiment score) may be simplistic, but we wanted to focus on the improvement of the sentiment dictionary while keeping other factors such as sentence-level calculation basic. The handling of more complex sentence-level phenomena such as negation and intensification using the revised dictionary will be the target of our future study.

Note that prior to applying the remove and switch operations to the merged dictionary, we experimented with averaging, weighted averaging, and majority voting of the multiple dictionaries' sentiment classification results. These approaches, however, did not yield enough improvement. This led us to dig deeper into the dictionaries at the entry word level, and the integration, i.e., merging, removing, and switching, of the multiple dictionaries' entry words proved better than the superficial integration of the individual dictionaries' classification results. Last but not least, in Section 5.4, we observed that the proposed remove and switch operations work better on dictionaries with large entry sizes. This may be natural since the proposed method includes the remove operation. We anticipate that existing sentiment dictionaries could be further improved by manually adding unlisted sentiment words relevant to the domain and then performing the remove and switch operations.

6. Conclusions

An effective data-driven method for revising sentiment dictionaries was proposed to improve lexicon-based sentiment classification of product reviews. The proposed method integrates multiple individual sentiment dictionaries by merging and removing the dictionary entries, and by switching the dictionary entries' sentiment polarities. The proposed method leverages the skewed distribution of the positive/negative training data to revise the original sentiment dictionary. The difference between the positive/negative word occurrence ratio and the positive/negative review ratio is calculated to determine whether a given dictionary entry word should be removed or the entry word's prior polarity should be switched. The proposed method removes what we believe to be words that contribute little to the sentiment analysis.
Moreover, domain adaptation of the dictionary entry words is realized by switching the sentiment polarity of selected entry words based on the ratio difference of the positive/negative word-review occurrences. With the remove and switch operations, the sentiment classification accuracy of the merged dictionary was successfully improved. We believe that the proposed method will be useful when ample training data are available but a dictionary expert is absent or costly. Our approach benefits from the data-driven trends of recent years to build quality sentiment dictionaries for sentiment classification.

Acknowledgment

This research was supported by the Ministry of Science, ICT and Future Planning (MSIP), Korea, under the ''IT Consilience Creative Program'' (NIPA-2014-H0201-14-1002) supervised by the National IT Industry Promotion Agency (NIPA). We would like to thank Julian Brooke and Maite Taboada for providing us with the SO-CAL sentiment dictionaries.

References

[1] B. Pang, L. Lee, Opinion mining and sentiment analysis, Found. Trends Inf. Retrieval, Now Publishers, 2008.
[2] B. Liu, Opinion mining and sentiment analysis, in: Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, 2012.
[3] B. Liu, Sentiment analysis and subjectivity, in: Handbook of Natural Language Processing, second ed., Taylor and Francis Group, Boca Raton, 2010.
[4] E. Cambria, B. Schuller, Y. Xia, C. Havasi, New avenues in opinion mining and sentiment analysis, IEEE Intell. Syst. 28 (2013) 15–21.
[5] J.C. de Albornoz, L. Plaza, P. Gervas, SentiSense: an easily scalable concept-based affective lexicon for sentiment analysis, in: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), 2012, pp. 3562–3567.
[6] S. Baccianella, A. Esuli, F. Sebastiani, SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining, in: Proceedings of the 7th

International Conference on Language Resources and Evaluation (LREC 2010), 2010, pp. 2200–2204.
[7] S. Cerini, V. Compagnoni, A. Demontis, M. Formentelli, C. Gandini, Micro-WNOp: a gold standard for the evaluation of automatically compiled lexical resources for opinion mining, in: A. Sanso (Ed.), Language Resources and Linguistic Theory, Franco Angeli, 2007, pp. 200–210.
[8] C. Strapparava, A. Valitutti, WordNet-Affect: an affective extension of WordNet, in: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), 2004, pp. 1083–1086.
[9] C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, MIT Press, 1998.
[10] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, M. Stede, Lexicon-based methods for sentiment analysis, Comput. Linguist. 37 (2011) 267–307.
[11] F.A. Nielsen, A new ANEW: evaluation of a word list for sentiment analysis in microblogs, CoRR abs/1103.2903 (2011).
[12] M. Hu, B. Liu, Mining and summarizing customer reviews, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), 2004, pp. 168–177.
[13] E. Riloff, J. Wiebe, Learning extraction patterns for subjective expressions, in: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP 2003), 2003, pp. 105–112.
[14] P.J. Stone, E.B. Hunt, A computer approach to content analysis: studies using the General Inquirer system, in: Proceedings of the Spring Joint Computer Conference (AFIPS 1963), 1963, pp. 241–256.
[15] E. Cambria, C. Havasi, A. Hussain, SenticNet 2: a semantic and affective resource for opinion mining and sentiment analysis, in: Proceedings of the 25th Florida Artificial Intelligence Research Society Conference (FLAIRS 2012), 2012, pp. 202–207.
[16] P.D. Turney, M.L. Littman, Measuring praise and criticism: inference of semantic orientation from association, ACM Trans. Inf. Syst. 21 (4) (2003) 315–346.
[17] A. Fahrni, M. Klenner, Old wine or warm beer: target-specific sentiment analysis of adjectives, in: Proceedings of the 2008 Affective Language in Human and Machine Symposium, AISB 2008 Convention, 2008.
[18] L. Yu, J. Wu, P. Chang, H. Chu, Using a contextual entropy model to expand emotion words and their intensity for the sentiment classification of stock market news, Knowl.-Based Syst. 41 (2013) 89–97.
[19] S. Huang, Z. Niu, C. Shi, Automatic construction of domain-specific sentiment lexicon based on constrained label propagation, Knowl.-Based Syst. 56 (2014) 191–200.
[20] G. Qiu, B. Liu, J. Bu, C. Chen, Expanding domain sentiment lexicon through double propagation, in: Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009, pp. 1199–1204.
[21] H. Kanayama, T. Nasukawa, Fully automatic lexicon expansion for domain-oriented sentiment analysis, in: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 2006, pp. 355–363.


[22] A. Neviarouskaya, H. Prendinger, M. Ishizuka, SentiFul: a lexicon for sentiment analysis, IEEE Trans. Affect. Comput. 2 (1) (2011) 22–36.
[23] Y. Lu, M. Castellanos, U. Dayal, C. Zhai, Automatic construction of a context-aware sentiment lexicon: an optimization approach, in: Proceedings of the 20th International Conference on World Wide Web, 2011, pp. 347–356.
[24] A. Aue, M. Gamon, Customizing sentiment classifiers to new domains: a case study, in: Proceedings of Recent Advances in Natural Language Processing (RANLP), 2005.
[25] S. Tan, G. Wu, H. Tang, X. Cheung, A novel scheme for domain-transfer problem in the context of sentiment analysis, in: Proceedings of the 16th ACM Conference on Information and Knowledge Management, 2007, pp. 979–982.
[26] Y. Yoshida, T. Hiro, T. Iwata, M. Nagata, Y. Matsumoto, Transfer learning for multiple-domain sentiment analysis: identifying domain dependent/independent word polarity, in: Proceedings of the 25th AAAI Conference on Artificial Intelligence, 2011, pp. 1286–1291.
[27] J. Blitzer, M. Dredze, F. Pereira, Biographies, Bollywood, boom-boxes and blenders: domain adaptation for sentiment classification, in: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 2007, pp. 440–447.
[28] S.J. Pan, X. Ni, J. Sun, Q. Yang, Z. Chen, Cross-domain sentiment classification via spectral feature alignment, in: Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 751–760.
[29] Y. He, C. Lin, H. Alani, Automatically extracting polarity-bearing topics for cross-domain sentiment classification, in: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, pp. 123–131.
[30] R. Xia, C. Zong, A POS-based ensemble model for cross-domain sentiment classification, in: Proceedings of the 5th International Joint Conference on Natural Language Processing, 2011, pp. 614–622.
[31] Y. Dang, Y. Zhang, H. Chen, A lexicon-enhanced method for sentiment classification: an experiment on online product reviews, IEEE Intell. Syst. 25 (2010) 46–53.
[32] L. Polanyi, A. Zaenen, Contextual valence shifters, in: Computing Attitude and Affect in Text: Theory and Applications, 2006, pp. 1–10.
[33] A. Kennedy, D. Inkpen, Sentiment classification of movie reviews using contextual valence shifters, Comput. Intell. 22 (2006) 110–125.
[34] X. Ding, B. Liu, P.S. Yu, A holistic lexicon-based approach to opinion mining, in: Proceedings of the 2008 International Conference on Web Search and Data Mining (WSDM 2008), 2008, pp. 231–240.
[35] K. Toutanova, D. Klein, C.D. Manning, Y. Singer, Feature-rich part-of-speech tagging with a cyclic dependency network, in: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (NAACL 2003), vol. 1, 2003, pp. 173–180.
