Rating Prediction using Feature Words Extracted from Customer Reviews

Masanao Ochi (1), Makoto Okabe (2), Rikio Onai (3)
(1, 2, 3) The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo, Japan
(2) JST PRESTO
(1, 3) {ochi, onai}@onailab.com
(2) [email protected]
ABSTRACT


We developed a simple method for improving the accuracy of rating prediction using feature words extracted from customer reviews. Many rating predictors work well on small, dense datasets of customer reviews. A practical dataset, however, tends to be large and sparse, because it often includes far more products than each customer can buy and evaluate, and this sparseness reduces prediction accuracy. To improve accuracy, we reduced the dimension of the feature vector by using feature words, extracted by analyzing the relationship between ratings and their accompanying review comments, instead of using the raw ratings themselves as features. We applied our method to the Pranking algorithm and evaluated it on a corpus of golf course reviews supplied by a Japanese e-commerce company. We found that by successfully reducing data sparseness, our method improves prediction accuracy as measured using RankLoss.


Categories and Subject Descriptors


H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—information filtering


Figure 1: An illustration of our approach in the case of restaurant reviews. Left: The original dataset has ratings only in the blue cells and the whole matrix is sparse. Right: Instead of using customers, we use the feature vector of feature words extracted from the review comments. This reduces the dimension of the feature vector from 6 to 2, so the whole matrix becomes dense. The score of each cell is an average of the scores rated with each word.

General Terms
Experimentation

Keywords
sentiment analysis, review mining, rating prediction

1. INTRODUCTION

Rating prediction is a practically important problem, because it enables the market to estimate how satisfied a customer will be with a service [1, 2, 5]. Given a dataset of customer reviews, rating predictors such as the Pranking algorithm [1] can learn and predict the rating of a target customer. These algorithms work well for a small, dense dataset. In practice, however, a dataset is large and sparse, because it often includes far more products than each customer can buy and evaluate (Figure 1). The dimension of the feature vector grows with the number of products; for example, the matrix of the EachMovie dataset is reportedly only 5% full [5]. To address this problem, we developed a way to reduce the dimension of the feature vector using feature words extracted by analyzing the relationship between ratings and their accompanying review comments.

Figure 1 illustrates our approach. We reduced the dimension of the feature vector and increased the density of the matrix by extracting "delicious" and "terrible" as feature words. We extracted a word as a feature word when it appeared frequently and was accompanied by a polarized score. The words "delicious" and "terrible" appeared many times with high and low scores, respectively, and were therefore selected as feature words. "Very" was not selected as a feature word; although it also appeared many times, its accompanying score varied. Our method of analyzing ratings and their accompanying review comments is a simple but powerful way of extracting feature words that capture a characteristic of each rating aspect well, which is often difficult to achieve by relying only on natural-language-processing techniques or other corpora such as WordNet [3, 4].
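The extraction rule can be summarized in a short sketch. The Python snippet below is illustrative only: the function name, the thresholds, and the whitespace tokenizer are assumptions made here (the actual corpus is Japanese and would require a morphological analyzer), but it captures the idea of keeping frequent words whose average accompanying score is strongly polarized.

```python
from collections import defaultdict

def extract_feature_words(reviews, min_count=10, margin=1.0):
    """Select words that occur frequently and carry a polarized average score.

    reviews   : iterable of (rating, comment) pairs, rating on a 1-5 scale
    min_count : minimum number of reviews a word must appear in
    margin    : how far a word's average must lie from the global average
    """
    totals = defaultdict(float)   # sum of ratings of reviews containing the word
    counts = defaultdict(int)     # number of reviews containing the word

    for rating, comment in reviews:
        # Naive whitespace tokenizer; Japanese text would need morphological analysis.
        for word in set(comment.lower().split()):
            totals[word] += rating
            counts[word] += 1

    global_avg = sum(totals.values()) / sum(counts.values())

    feature_words = {}
    for word, count in counts.items():
        if count < min_count:
            continue                      # too rare: skip
        avg = totals[word] / count
        if abs(avg - global_avg) >= margin:
            feature_words[word] = avg     # polarized: keep ("delicious", "terrible")
        # frequent but non-polarized words ("very") fall through and are dropped
    return feature_words

# Toy usage mirroring Figure 1:
reviews = [(5, "It is very delicious"), (4, "I enjoyed delicious cuisine"),
           (2, "The food is very terrible"), (3, "It is not so terrible")]
print(extract_feature_words(reviews, min_count=2, margin=0.7))
# -> {'delicious': 4.5, 'terrible': 2.5}
```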

2. EFFECT OF DATA DENSITY

As shown in Figure 1, the original dataset is often sparse. Crammer et al. proposed filling in the blank cells with an intermediate score [1], which results in low prediction accuracy. We therefore evaluated the relationship between data sparseness and the accuracy of the Pranking algorithm. Given a target customer and a product, the Pranking algorithm predicts the rating that the customer is likely to give the product by learning from the dataset how the customer has rated other products in the past. We used the Book-Crossing Dataset(1), which is relatively dense because of the nature of the service; the original matrix of this dataset is 15% full. We generated sparser datasets by removing data from randomly selected cells and applied the Pranking algorithm to them. We evaluated prediction accuracy using RankLoss [1]:

    RankLoss = \frac{1}{T} \sum_{t=1}^{T} | \hat{y}_t - y_t |,

where T is the number of products, \hat{y}_t is the t-th predicted output score, and y_t is the t-th desired output score. As shown in Figure 3 (a), the sparser the dataset becomes, the lower the prediction accuracy becomes. This result motivated us to solve the problem of data sparseness by reducing the dimension of the feature vector.
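For reference, the following is a minimal sketch of a PRank-style predictor together with the RankLoss measure defined above. It is not the authors' implementation: the class structure, the fixed five ranks, and the training loop are assumptions made here for illustration.

```python
class PRank:
    """Minimal sketch of the PRank online ordinal regressor (Crammer & Singer)."""

    def __init__(self, n_features, n_ranks=5):
        self.n_ranks = n_ranks
        self.w = [0.0] * n_features
        # Thresholds b_1 <= ... <= b_{k-1}; the k-th threshold is fixed at +infinity.
        self.b = [0.0] * (n_ranks - 1) + [float("inf")]

    def _score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x):
        score = self._score(x)
        # The predicted rank is the smallest r (1..k) with score < b_r.
        for r, b_r in enumerate(self.b, start=1):
            if score < b_r:
                return r

    def update(self, x, y):
        """One online step: adjust w and the thresholds only on a mistake."""
        if self.predict(x) == y:
            return
        score = self._score(x)
        taus = []
        for r in range(1, self.n_ranks):                 # r = 1 .. k-1
            y_r = -1 if y <= r else 1
            taus.append(y_r if (score - self.b[r - 1]) * y_r <= 0 else 0)
        total_tau = sum(taus)
        self.w = [wi + total_tau * xi for wi, xi in zip(self.w, x)]
        for r, tau in enumerate(taus):
            self.b[r] -= tau


def rank_loss(predicted, desired):
    """RankLoss = (1/T) * sum_t |yhat_t - y_t| over T test products."""
    return sum(abs(p - d) for p, d in zip(predicted, desired)) / len(predicted)


def train_and_evaluate(train, test, n_features, epochs=16):
    """train, test: lists of (feature_vector, rating) pairs with ratings in 1..5."""
    model = PRank(n_features)
    for _ in range(epochs):
        for x, y in train:
            model.update(x, y)
    return rank_loss([model.predict(x) for x, _ in test], [y for _, y in test])
```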


(1) http://www.informatik.uni-freiburg.de/~cziegler/BX/

Overall
  word                   average   word               average
  great                  4.6       nasty              2.5
  Rope (golf course)     4.5       cheap              2.6
  praise                 4.5       bad                2.7
  splendid               4.4       asking             2.8
  mindful                4.4       apology            2.8
  beautiful              4.4       anger              2.8
  thoughtful             4.4       finally            2.8
  perfect                4.4       embarrassingly     2.8
  rich                   4.4       ass                2.9
  thrilling              4.4       dirty              2.9

Course
  word                   average   word               average
  Inoue (name)           4.5       lack               3.0
  Seiichi (name)         4.5       nasty              3.0
  great                  4.4       cheap              3.0
  skillful               4.4       weed               3.0
  Ishioka (golf course)  4.4       enforced           3.0
  agitate                4.4       river              3.1
  motivate               4.3       sand pit           3.1
  rarity                 4.3       monotone           3.1
  variety                4.3       gully              3.1
  championship           4.3       patchy             3.1

Cost_performance
  word                   average   word               average
  killer price           4.6       relative expense   3.0
  real cheap             4.6       penny-wise         3.1
  performance            4.5       asking             3.2
  better                 4.4       terrible           3.2
  cost                   4.4       apology            3.2
  CP                     4.4       embarrassingly     3.3
  cheap                  4.4       arrogant           3.3
  fire-sale              4.3       messy              3.3
  great                  4.3       ass                3.3
  outclassing            4.3       complaint          3.3

Figure 2: Feature words extracted from customer reviews. All reviews were written in Japanese, and we translated each extracted word into English.

[Figure 3 appears here: (a) RankLoss vs. DataDensity; (b) overall, (c) course, and (d) cost_performance show RankLoss vs. iteration for the base, customer, and word_avg feature vectors.]

Figure 3: (a) The performance of a rating predictor depends on the data density. (b)-(d) The RankLoss for each aspect. Base means prediction of an intermediate score, 3 in this case. Customer means customer ratings are used as the feature vector. Word_avg means feature words are used as the feature vector.

3. EXPERIMENT AND RESULTS

We applied our technique to the Pranking algorithm and evaluated it on a corpus of golf course reviews supplied by a Japanese e-commerce company(2). Each review was accompanied by ratings for three aspects, each on a scale of 1-5: overall experience, golf course, and cost performance. To extract feature words related to each aspect, we computed the score of each word as the average of all rated scores accompanied by a review comment containing the word; this score is shown as "average" in Figure 2. As a result, the score of a word irrelevant to the aspect was close to the total average score, whereas a word that captured the characteristic of the aspect had a high or low average score. Figure 2 shows that our method successfully extracts interesting feature words. Note that it extracts not only positive and negative words but also words that explain the semantic context of the aspect. For example, "Inoue" and "Seiichi," shown in the Course table, refer to a famous golf course designer who has designed many golf courses in Japan. The negative side of the Course table includes words such as "weed," "river," and "sand pit," from which one learns that a customer's low rating is often caused by complaints about the condition of a golf course.

We used the extracted feature words as the feature vector of the Pranking algorithm, reducing its dimension to increase the density of the data. The original dataset had 85,000 customers and was sparse (0.5% full). We selected 100 feature words with high and low average scores, thereby reducing the dimension of the feature vector from 85,000 to 100 and increasing the data density to 30% full. We evaluated the RankLoss (Figure 3 (b)-(d)); the results outperformed those obtained on the original sparse dataset.
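The sketch below illustrates how such a reduced representation can be built. The function name, the way items are indexed, and the density computation are illustrative assumptions made here; the point is only that each item is described by the average scores of a small set of selected feature words rather than by 85,000 customer columns.

```python
def build_feature_word_matrix(reviews_by_item, word_scores, n_words=100):
    """Represent each item (e.g. golf course) by average scores of selected feature words.

    reviews_by_item : dict mapping item id -> list of (rating, comment) pairs
    word_scores     : dict mapping word -> corpus-wide average score
                      (e.g. the output of extract_feature_words above)
    n_words         : number of feature words to keep (lowest and highest averages)
    """
    ranked = sorted(word_scores, key=word_scores.get)           # ascending by average
    features = ranked[: n_words // 2] + ranked[-(n_words // 2):]

    matrix = {}
    for item, reviews in reviews_by_item.items():
        row = {}
        for word in features:
            scores = [r for r, comment in reviews if word in comment.lower().split()]
            if scores:                    # leave the cell empty if the word never occurs
                row[word] = sum(scores) / len(scores)
        matrix[item] = row

    filled = sum(len(row) for row in matrix.values())
    density = filled / (len(matrix) * len(features)) if matrix else 0.0
    return matrix, features, density
```

The rows of this matrix range over items rather than over 85,000 customers, which is where the density gain described above comes from; the resulting rows can then be fed to a Pranking-style predictor.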

4. CONCLUSIONS

We have demonstrated that data density is important for prediction accuracy. We have developed a novel feature vector using feature words that successfully improves prediction accuracy as measured using RankLoss.

5. REFERENCES

[1] K. Crammer and Y. Singer. Pranking with ranking. NIPS'01, pages 641–647, 2001.
[2] E. F. Harrington. Online ranking/collaborative filtering using the perceptron algorithm. ICML'03, pages 250–257, 2003.
[3] M. Hu and B. Liu. Mining and summarizing customer reviews. KDD'04, pages 168–177, 2004.
[4] J. Kamps, M. Marx, R. J. Mokken, and M. de Rijke. Using WordNet to measure semantic orientations of adjectives. LREC'04, pages 1115–1118, 2004.
[5] A. Shashua and A. Levin. Ranking with large margin principle: Two approaches. NIPS'02, pages 937–944, 2002.

(2) http://rit.rakuten.co.jp/rdr/index.html

