Rating Prediction Using Feature Words Extracted from Customer Reviews
Masanao Ochi1, Makoto Okabe2, Rikio Onai3
The University of Electro-Communications1,2,3, JST PRESTO2
Our objective is to predict the target customer's review rate more accurately.
Changing the Feature Vector

[Diagram: Restaurants A, B, and C with example reviews and rates, e.g. "It is very delicious" (5), "The food is terrible" (2), "I enjoyed delicious cuisine" (4), "Not so terrible food" (3), "It is not so terrible" (3), "dishes looked delicious" (3). The existing feature vector uses the customers' review rates: "Too many customers" (about 85,000 customers in the experimental dataset) and "Sparse reviews" (0.5% full in the experimental dataset). Our contribution uses the polarized-words feature vector: "Reducing a dimension" (100 extracted words in the experimental dataset) and "Dense data" (30% full in the experimental dataset), with word averages such as 4.5 for "delicious" and 2.5 for "terrible".]

[Graph: increasing the data density reduces the RankLoss, i.e., the prediction accuracy improves.]
We applied the word feature vector to the Pranking algorithm and evaluated it on a corpus of Japanese golf course reviews. Our results outperformed those of the original sparse dataset for all rating aspects.
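As a rough illustration, the PRank update rule of the Pranking algorithm (Crammer and Singer's online ordinal regression) can be sketched as follows. This is a minimal sketch, not the poster's actual implementation; the rank count, epochs, and feature vectors here are assumptions for illustration.

```python
import numpy as np

def prank_train(X, y, n_ranks=5, epochs=10):
    """Online PRank: learn a weight vector w and ordered thresholds b
    so that rank(x) = smallest r with w.x < b[r], else the top rank."""
    w = np.zeros(X.shape[1])
    b = np.zeros(n_ranks - 1)                 # thresholds b_1 .. b_{k-1}
    for _ in range(epochs):
        for x, rank in zip(X, y):
            score = w @ x
            # target sign per threshold: +1 if the true rank lies above it
            t = np.where(np.arange(1, n_ranks) < rank, 1.0, -1.0)
            # thresholds violated by the current score trigger an update
            tau = np.where(t * (score - b) <= 0, t, 0.0)
            w += tau.sum() * x
            b -= tau
    return w, b

def prank_predict(w, b, x):
    """Predict the rank of x as the first threshold the score falls below."""
    score = w @ x
    below = np.nonzero(score < b)[0]
    return (below[0] + 1) if below.size else len(b) + 1
```

The thresholds partition the real line into k ordered buckets, which is what lets a single linear score predict an ordinal 1-to-5 rating.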
Experimental Result

(negative side):
word            average
nasty           2.5
cheap           2.6
bad             2.7
asking          2.8
apology         2.8
anger           2.8
finally         2.8
embarrassingly  2.8
ass             2.9
dirty           2.9

Course (positive side):
word                   average
Inoue (name)           4.5
Seiichi (name)         4.5
great                  4.4
skillful               4.4
Ishioka (golf course)  4.4
agitate                4.4
motivate               4.3
rarity                 4.3
variety                4.3
championship           4.3

Course (negative side):
word       average
lack       3.0
nasty      3.0
cheap      3.0
weed       3.0
enforced   3.0
river      3.1
sand pit   3.1
monotone   3.1
gully      3.1
patchy     3.1

Cost_performance (positive side):
word          average
killer price  4.6
real cheap    4.6
performance   4.5
better        4.4
cost          4.4
CP            4.4
cheap         4.4
fire-sale     4.3
great         4.3
outclassing   4.3

Cost_performance (negative side):
word              average
relative expense  3.0
penny-wise        3.1
asking            3.2
terrible          3.2
apology           3.2
embarrassingly    3.3
arrogant          3.3
messy             3.3
ass               3.3
complaint         3.3

Extracted words example
We select only 100 words per aspect (the 50 most positively and the 50 most negatively polarized words) among words used more than 100 times in the review corpus, and we adopt these word lists as the feature-vector elements for each rating aspect.
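The selection step above can be sketched as follows. This is a minimal sketch assuming reviews arrive as (word list, rate) pairs; whether a word's count is per occurrence or per review is an assumption, since the poster does not specify it.

```python
from collections import defaultdict

def polarized_words(reviews, min_count=100, top_k=50):
    """Pick the top_k lowest- and top_k highest-average-rate words among
    words occurring at least min_count times in the corpus.
    `reviews` is an iterable of (word_list, rate) pairs."""
    total = defaultdict(float)
    count = defaultdict(int)
    for words, rate in reviews:
        for w in words:                       # distribute the rate to each word
            total[w] += rate
            count[w] += 1
    avg = {w: total[w] / count[w] for w in count if count[w] >= min_count}
    ranked = sorted(avg, key=avg.get)         # ascending by average rate
    return ranked[:top_k] + ranked[-top_k:]   # negative words + positive words
```

With the poster's settings (min_count=100, top_k=50) this yields the 100-word list per rating aspect.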
Accuracy Measurement

The RankLoss accuracy improved by 15.6% on average.

[Graphs: RankLoss over training iterations for the overall, course, and cost_performance aspects.]
These word lists show that our method successfully extracts interesting feature words. Our method extracts not only positive and negative words but also words that explain the semantic context of the aspect. For example, "Inoue" and "Seiichi," shown in the Course table, name a famous golf course designer who has designed many golf courses in Japan. The negative side of the Course table includes words such as "weed," "river," and "sand pit," because customers' low ratings are caused by complaints about the condition of a golf course.
Let T be the number of products, ŷ_t the t-th predicted output score, and y_t the t-th desired output score. RankLoss averages the absolute differences between them:

RankLoss = (1/T) Σ_{t=1..T} |ŷ_t − y_t|

and is computed at each iteration (in this case, for each restaurant).
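The definition above amounts to a mean absolute error over the T products; a one-function sketch:

```python
def rank_loss(predicted, desired):
    """RankLoss = (1/T) * sum over t of |predicted_t - desired_t|,
    where T is the number of products."""
    T = len(desired)
    return sum(abs(p - d) for p, d in zip(predicted, desired)) / T
```

For example, predictions [4, 3, 5] against desired scores [5, 3, 3] give a RankLoss of (1 + 0 + 2) / 3 = 1.0.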
The relationship between the data density and the RankLoss (prediction accuracy) using the Book-Crossing Dataset.
The difference between the existing feature vector and our feature vector.
Overall (positive side):
word                average
great               4.6
Rope (golf course)  4.5
praise              4.5
splendid            4.4
mindful             4.4
beautiful           4.4
thoughtful          4.4
perfect             4.4
rich                4.4
thrilling           4.4
[Each graph plots RankLoss against iterations 1 to 16, comparing three feature vectors: base, customer, and word_avg.]
Result graphs for each review aspect
Extracting Feature Words
We extract feature words to construct the feature vector. We define feature words as words whose average rates are strongly polarized.
1. Score Distribution

We distribute a review's rate to each word in its comment, and we do this for all review & rate pairs (about 320,000 reviews). For example, on the food aspect:

Food Review 1 (rate 1): "… foods are terrible, too." The rate "1" is distributed to all words equally.
Food Review 2 (rate 2): "… monotone taste, and terrible foods". The rate "2" is distributed to all words equally.
Food Review 3 (rate 5): "… tastes are also better than these other terrible foods". The rate "5" is distributed to all words equally.

2. Extracting Polarized Words

[Histograms: rate distribution (frequency vs. rate 1-5) for "delicious" and for "terrible" on the food aspect.]

It is easy to find negative rates when the word "terrible" is used in review comments on the food aspect, and positive rates when the word "delicious" is used.
Rate distributions for each word. If you have any questions, please e-mail me at [email protected]. If you are recruiting new Ph.D. students, please consider me!
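The score-distribution step above can be sketched as follows; a minimal sketch assuming reviews are (word list, rate) pairs, with each word of a comment receiving that review's rate.

```python
from collections import Counter, defaultdict

def rate_distributions(reviews):
    """Distribute each review's rate to every word in its comment,
    accumulating a per-word histogram over the rates 1-5."""
    hist = defaultdict(Counter)
    for words, rate in reviews:
        for w in words:
            hist[w][rate] += 1
    return hist
```

Plotting hist["terrible"] and hist["delicious"] for the food aspect would reproduce the two histograms above: "terrible" skews toward low rates, "delicious" toward high ones.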
Rating Prediction: Background

Existing Task 1 (Task A)
Finding users who have similar preferences & predicting a "5".

[Diagram: Tokyo Steakhouse's all reviews. User1 rates it 2/5, User2 5/5, User3 3/5; the target user's rate is ?/5.]

Problem: there are no users who have the same buying history!
Many rating predictors use other users' rating scores as the feature vector.
Rating prediction, which informs customers in advance of their likely evaluation of things they want, is a very practical task. Users can decide quickly what they want, because they only check the predicted rate instead of seeking the few useful reviews among too many long ones. Because existing rating predictors basically use other users' review rates, they share a common problem: the feature vector is too large and too sparse. Review sites typically list far more items than any one customer can buy and evaluate, and the dimension of the feature vector grows as fast as the number of products. On the other hand, the task of transforming a review comment into a review rate for each review aspect uses words as the feature vector. To attack the large, sparse feature vector in rating prediction, we propose a new task that integrates other users' review rates with their review comments. We developed a simple method that improves the accuracy of rating prediction and reduces the dimension of the feature vector, using feature words extracted from customer reviews.
Existing Task 2 (Task B)

The target user's review of Tokyo Steakhouse: "... Everything was perfect! The food is mostly delicious twists on classics. ..."

[Diagram: the review for the item is transformed into aspect rates: ?/5, Price ?/5, Overall ?/5, ....]

Problem: no practical use for rating prediction!
This task transforms a review comment into a review rate for each review aspect, using words as the feature vector.
Proposing Task (Task C)

Finding the target user's favorite words & predicting a "5".

[Diagram: from Tokyo Steakhouse's all reviews, each word receives an average rate, e.g. Word1 "delicious" 4.5/5, Word2 "terrible" 2.8/5, Word3 "crazy" 4.4/5; the target user's rate is ?/5. Task A uses the review values written by other users who bought the item; Task B transforms a review sentence into a numerical score; Task C uses both the review sentences and the review values written by other users.]

An illustration of the differences among the tasks.

We propose a new task that predicts a review rate using both other users' review rates and comments.
We use the Rakuten golf review dataset in this experiment. This dataset is provided by a Japanese e-commerce company, and all reviews are written in Japanese.
Setting of the Experiment
This review dataset has about 320,000 reviews of about 1,700 golf courses, written by 85,000 customers. Customers rate a golf course on a 5-point scale for 8 rating aspects. The review comments contain a total of 15.6 million words with a vocabulary of 43,000 words. We extract feature words using the simple score-distribution method: we select the 100 feature words with the highest and lowest average scores among words used more than 100 times in this dataset, i.e., we reduce the dimension of the feature vector from 85,000 to 100. We experiment on 520 customers who wrote comments on more than 20 golf courses.
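Once the 100 feature words are fixed, a review comment maps onto a 100-dimensional vector. A minimal sketch, assuming binary occurrence features (the poster does not specify binary vs. count features, so this encoding is an assumption):

```python
def to_feature_vector(comment_words, feature_words):
    """Map a review comment onto the extracted feature-word list.
    Each dimension is 1.0 if that feature word occurs in the comment,
    else 0.0 (binary occurrence encoding; counts would also be plausible)."""
    present = set(comment_words)
    return [1.0 if w in present else 0.0 for w in feature_words]
```

This is the step that replaces the 85,000-dimensional customer-rate vector with a dense 100-dimensional word vector fed to the ranker.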
The example of a review (written in Japanese; translated into English).

Rate        1      2      3      4      5     Avg.  Var.
Overall    0.9%   3.2%  26.0%  50.2%  19.7%  3.85  0.64
Staff      1.5%   4.4%  32.7%  42.8%  18.6%  3.73  0.75
Equipment  1.1%   8.3%  44.4%  34.2%  11.9%  3.47  0.72
Food       1.2%   5.9%  44.1%  35.8%  13.1%  3.54  0.70
Course     0.5%   4.1%  34.1%  45.8%  15.4%  3.71  0.63
Cost       0.7%   4.0%  30.6%  37.7%  27.0%  3.86  0.78
Length     2.2%  14.9%  52.3%  24.9%   5.8%  3.17  0.69
Width      1.7%  13.4%  46.9%  29.0%   9.1%  3.30  0.76

The summary of review rates.
If you are interested in my research and want more information, please visit my website at https://sites.google.com/site/masanaoochi/ !