Automatic Digest Generation for Mobile Phone Online Reviews

School of Science and Technology 科技學院
Computing Programmes 電腦學系

YEUNG Wing Hong


Programme	Bachelor of Science with Honours in Computing
Supervisor	Dr. Andrew Lui
Areas	Text Mining for Intelligent Applications
Year of Completion	2011

Objectives

The aim of the project is to design a system that uses opinion mining techniques to help buyers reorganize chaotic review information into a tidy form. The system focuses on phone reviews, and the main objective of the project is to generate a phone summary from an available online phone review site. As our framework assumes, a feature and its corresponding opinion are the information readers are most interested in. Hence, the summary categorizes positive and negative opinions by feature. By reorganizing the information this way, the system lets readers who are interested only in the positive or negative opinions on a specific feature save search time, and it helps them make sense of conflicting opinions.

Background and Methodology

With the rapid growth of the web, people have become more comfortable with the internet and increasingly write online reviews. Online reviews have become one of the most valuable and easily reached resources for buyers making a purchasing decision. A consumer who wants to purchase a phone will probably go to one of the online review sites, such as cNet or Amazon, and read the reviews there. To avoid bias, people prefer to read as many reviews as possible. However, when digesting many reviews, a consumer may suffer from information overload because the information is scattered across the reviews, and the differing perspectives of the reviewers can lead to misunderstanding. The consumer may be confused by this chaotic information.

Our system focuses on opinion mining techniques to generate the summary from the reviews. Opinion mining is a main research direction in text sentiment analysis (Zhang 2008). It refers to deriving opinion information, such as sentiment, from sentences. Besides opinion mining, our work is related to synonym grouping and sentiment classification.

The system first downloads the reviews from cNet. The reviews are pre-processed and split into sentences. Then the feature classifier extracts and classifies the feature in each sentence. After that, the opinion classifier extracts the corresponding opinion, and the semantic orientation of the sentence is identified. Finally, the summary is produced. The system overview is shown below:

The main tasks of the feature classifier are feature identification and feature classification. For feature identification, the system extracts the feature together with the sentence that contains it. For example, the sentence “There are two more keys placed under the display” is extracted and labeled with the feature “display”. For feature classification, the system groups the sentences into categories. In our application, features are classified into six categories: general, display and control, camera, sound, connectivity, and battery life. The general category covers attributes of the phone such as size, speed, and appearance. For example, the features “clock” and “processor” have similar meanings in this domain, so they are classified together.

The main task of the opinion classifier is to identify and classify opinions. For opinion identification, the system extracts the opinion words of each sentence according to the extracted feature. For example, “large” is the opinion word for the feature “display” in the sentence “The handset has a comfortable keypad and a large display”. For opinion classification, the system groups sentences into two categories, positive and negative. After the opinion words are extracted, the system determines their orientation. The semantic orientation of a sentence depends heavily on its opinion words: if most of the opinion words in the sentence are positive, the semantic orientation of the sentence is positive too.

The system methodology is shown in detail below:

An example of LSR pattern matching, which is responsible for automatic rule pattern generation, is shown below:

When LSR mining is complete, we set the support and confidence thresholds for feature word mining by experiment. In our system, a rule is feasible if its support is larger than 0.02% and its confidence is larger than 60%. When a new review is mined, the keyword list and the rule patterns are matched against each sentence, and matching sentences are extracted as candidate segments for further opinion grouping.
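The thresholding step can be sketched as follows. This is a minimal illustration, not the project's code: the `Rule` record, its field names, the example patterns, and the counts are all hypothetical, and only the two thresholds (support above 0.02%, confidence above 60%) come from the text. The sketch assumes the usual association-rule definitions: support is the fraction of all training sentences in which the rule fires correctly, and confidence is the fraction of pattern matches that are correct.

```python
from dataclasses import dataclass

MIN_SUPPORT = 0.0002    # 0.02%, the threshold stated in the text
MIN_CONFIDENCE = 0.60   # 60%, the threshold stated in the text


@dataclass
class Rule:
    pattern: str   # hypothetical textual form of a rule pattern
    matches: int   # training sentences that match the pattern
    correct: int   # matching sentences that really contain the feature


def support(rule, total_sentences):
    """Fraction of all sentences in which the rule fires correctly."""
    return rule.correct / total_sentences


def confidence(rule):
    """Fraction of pattern matches that are correct."""
    return rule.correct / rule.matches if rule.matches else 0.0


def feasible_rules(rules, total_sentences):
    """Keep only rules that pass both thresholds."""
    return [
        r for r in rules
        if support(r, total_sentences) > MIN_SUPPORT
        and confidence(r) > MIN_CONFIDENCE
    ]


# Made-up rules and counts, for illustration only.
rules = [
    Rule("the <feature> is <adjective>", matches=40, correct=30),
    Rule("<noun> of the <feature>", matches=50, correct=20),   # confidence too low
    Rule("<feature> has <noun>", matches=1, correct=1),        # support too low
]
kept = feasible_rules(rules, total_sentences=10000)
```

With these made-up counts, only the first rule passes both thresholds.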

After the features in the sentences have been found, we group the discrete features into categories so that the summary is clear. A summary is more meaningful to buyers if highly related features are grouped together. Using WordNet to group synonyms is one commonly used method. The WordNet ontology, which is built from the isA relationships among nouns and has 8 paths from image to size, is detailed below:
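As a separate, simplified illustration of grouping by isA distance, the sketch below measures how many isA links separate two feature words. The hierarchy here is a small hand-made stand-in, not data taken from WordNet, and the function name `path_length` is hypothetical; the project's actual grouping procedure may differ.

```python
from collections import deque

# Hypothetical isA links (child -> parent); NOT taken from WordNet.
IS_A = {
    "screen": "display",
    "display": "device",
    "keypad": "device",
    "photo": "image",
    "image": "representation",
}


def path_length(a, b, is_a=IS_A):
    """Fewest isA links between a and b, or None if they are unconnected."""
    # Treat each isA link as an undirected edge so that two words can be
    # connected through a shared ancestor.
    neighbours = {}
    for child, parent in is_a.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)

    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

In this toy hierarchy "screen" and "keypad" are three links apart (through "display" and "device"), while "screen" and "photo" are not connected at all. Features whose distance falls under a chosen cut-off could then be placed in the same category.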

For opinion identification, we assume that when an author mentions a feature, there is an opinion word near it (Popescu and Etzioni, 2005). Under this assumption, we can use the dependency grammar in the Stanford NLP parser to extract the opinion words. An example is shown below:
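The idea of reading opinion words off dependency relations can be sketched without running a parser. In the sketch below the dependency triples and part-of-speech tags are written by hand in the style of Stanford typed dependencies; they are an assumption for illustration, not actual parser output, and the function `opinion_words` and its two matching rules are hypothetical simplifications of whatever rules the project used.

```python
# Hand-written (relation, head, dependent) triples for the sentence
# "The handset has a comfortable keypad and a large display".
# They imitate Stanford typed dependencies but were not produced by the parser.
DEPENDENCIES = [
    ("nsubj", "has", "handset"),
    ("dobj", "has", "keypad"),
    ("amod", "keypad", "comfortable"),
    ("conj", "keypad", "display"),
    ("amod", "display", "large"),
]

# Hand-assigned part-of-speech tags (Penn Treebank style).
POS_TAGS = {
    "handset": "NN", "has": "VBZ", "comfortable": "JJ",
    "keypad": "NN", "large": "JJ", "display": "NN",
}


def opinion_words(feature, dependencies, pos_tags):
    """Collect adjectives that are directly attached to the feature word."""
    found = []
    for relation, head, dependent in dependencies:
        # Adjective modifying the feature, e.g. amod(display, large).
        if relation == "amod" and head == feature:
            candidate = dependent
        # Feature as subject of an adjective, e.g. nsubj(small, phone).
        elif relation == "nsubj" and dependent == feature:
            candidate = head
        else:
            continue
        if pos_tags.get(candidate, "").startswith("JJ"):
            found.append(candidate)
    return found
```

For the feature "display" this returns "large", and for "keypad" it returns "comfortable". The adjective check keeps the verb "has" from being reported as an opinion on "handset".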

For opinion classification, an opinion lexicon is built: we iteratively search WordNet for the synonyms and antonyms of words and group them into the same set or the opposite set, stopping after 3 iterations. Because some words in WordNet are polysemous, we find that 3 iterations are suitable for producing the lexicon. An example is shown below:
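The iterative expansion can be sketched as a bounded propagation of polarity from seed words. The sketch below is an assumption-laden illustration rather than the project's implementation: the `SYNONYMS` and `ANTONYMS` tables are tiny hand-made stand-ins for WordNet lookups, the seed word is made up, and `expand_lexicon` is a hypothetical name. Only the limit of 3 iterations comes from the text.

```python
# Hypothetical stand-ins for WordNet synonym and antonym lookups.
SYNONYMS = {
    "good": ["great", "nice"],
    "great": ["excellent"],
    "excellent": ["superb"],
    "superb": ["outstanding"],
    "bad": ["poor"],
    "poor": ["weak"],
}
ANTONYMS = {
    "good": ["bad"],
    "nice": ["nasty"],
}


def expand_lexicon(seeds, max_iterations=3):
    """Spread polarity (+1 or -1) from seed words for a fixed number of rounds."""
    lexicon = dict(seeds)
    frontier = list(seeds)
    for _ in range(max_iterations):
        next_frontier = []
        for word in frontier:
            polarity = lexicon[word]
            # Synonyms join the same set as the current word.
            for syn in SYNONYMS.get(word, []):
                if syn not in lexicon:
                    lexicon[syn] = polarity
                    next_frontier.append(syn)
            # Antonyms join the opposite set.
            for ant in ANTONYMS.get(word, []):
                if ant not in lexicon:
                    lexicon[ant] = -polarity
                    next_frontier.append(ant)
        if not next_frontier:
            break
        frontier = next_frontier
    return lexicon


lexicon = expand_lexicon({"good": 1})
```

Starting from the single seed "good", three rounds reach "superb" and "weak" but stop before "outstanding", which is four steps away. Capping the number of rounds limits how far a polysemous word can carry polarity into unrelated senses.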

After the opinion lexicon is compiled, the system can identify the opinion orientation of each sentence. Orientation is determined at the sentence level: each sentence is classified as positive or negative. A positive word is assigned a score of +1 and a negative word a score of -1, and all the scores are summed. If the sentence has a positive score, it is a positive sentence; otherwise, it is a negative sentence. The summary is shown below:
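The scoring rule described above can be sketched in a few lines. This is an illustrative sketch rather than the project's implementation: the `LEXICON` entries are made up and `sentence_orientation` is a hypothetical name. Words missing from the lexicon are assumed to contribute 0.

```python
# Hypothetical opinion lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "comfortable": 1,
    "large": 1,
    "good": 1,
    "slow": -1,
    "poor": -1,
}


def sentence_orientation(opinion_words, lexicon=LEXICON):
    """Sum the word scores; a positive total means a positive sentence."""
    score = sum(lexicon.get(word.lower(), 0) for word in opinion_words)
    # As in the text, anything that is not strictly positive is negative,
    # so a total of 0 is also classified as negative.
    return "positive" if score > 0 else "negative"
```

For the opinion words "comfortable" and "large" the total is +2 and the sentence is positive. A sentence with one positive and one negative word totals 0 and falls into the negative class, which follows directly from the two-class rule.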

Evaluation

After 150 sentences are collected, we manually read and tag every sentence with one of 6 feature categories and one of 2 opinion categories. To increase reliability, the sentences are tagged by two people instead of one, and we evaluate the agreement between their annotations using Cohen's Kappa coefficient. Cohen's Kappa, the proportion of agreement corrected for chance between two judges assigning cases to a set of k categories, serves as a measure of reliability. It gives the reader a quantitative measure of the magnitude of agreement between observers. The calculation is based on the difference between how much agreement is actually present and how much agreement would be expected by chance alone. The following two tables show the inter-annotator agreement for the feature classifier and the opinion classifier respectively:

A/B	General	D&C	Camera	Sound	Network	Battery	Null
General	22	6	0	0	0	0	4
D&C	8	37	0	0	0	0	2
Camera	1	0	15	1	1	0	1
Sound	1	0	0	16	1	0	5
Network	0	1	0	0	9	0	0
Battery	1	0	0	0	0	8	0
Null	3	6	0	1	2	0	10

A/B	Pos	Neg
Pos	69	8
Neg	9	23
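The two agreement tables above can be checked with a short calculation. The sketch below uses the standard definition of Cohen's Kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement (the diagonal) and p_e is the agreement expected by chance from the row and column totals. The function and variable names are illustrative; the counts are copied from the tables.

```python
def cohens_kappa(matrix):
    """Cohen's Kappa for a square table of counts (rows: A, columns: B)."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of items on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of matching row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Order: General, D&C, Camera, Sound, Network, Battery, Null
FEATURE_AGREEMENT = [
    [22, 6, 0, 0, 0, 0, 4],
    [8, 37, 0, 0, 0, 0, 2],
    [1, 0, 15, 1, 1, 0, 1],
    [1, 0, 0, 16, 1, 0, 5],
    [0, 1, 0, 0, 9, 0, 0],
    [1, 0, 0, 0, 0, 8, 0],
    [3, 6, 0, 1, 2, 0, 10],
]

# Order: Pos, Neg
OPINION_AGREEMENT = [
    [69, 8],
    [9, 23],
]

feature_kappa = cohens_kappa(FEATURE_AGREEMENT)
opinion_kappa = cohens_kappa(OPINION_AGREEMENT)
print(round(feature_kappa, 4))  # 0.6587
print(round(opinion_kappa, 4))  # 0.6205
```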

Cohen's Kappa coefficient is 0.6587 for the feature classifier and 0.6205 for the opinion classifier. Both are above 0.6, which indicates substantial agreement, so the annotations can be used for evaluation. The following table gives the matching results as well as the precision and recall for the feature classifier:

Test/standard	General	D&C	Camera	Sound	Network	Battery	Null	Precision
General	10	4	0	0	1	0	2	0.588
D&C	0	17	0	0	0	0	1	0.944
Camera	0	0	13	0	0	0	1	0.928
Sound	0	0	1	18	0	0	0	0.947
Network	0	1	0	0	5	0	0	0.833
Battery	0	0	0	0	0	5	1	0.833
Null	16	11	1	4	1	0	13
Recall	0.385	0.515	0.867	0.818	0.714	1
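The precision and recall figures in the table can be recomputed from the counts. In the sketch below, rows are the system output and columns are the manual standard, as in the table; precision divides each diagonal cell by its row total and recall divides it by its column total. The names are illustrative, and leaving the Null category out of the averages is an assumption based on the table, which reports no precision or recall for Null.

```python
LABELS = ["General", "D&C", "Camera", "Sound", "Network", "Battery", "Null"]

# Rows: system output (test); columns: manual standard. Counts from the table.
MATRIX = [
    [10, 4, 0, 0, 1, 0, 2],
    [0, 17, 0, 0, 0, 0, 1],
    [0, 0, 13, 0, 0, 0, 1],
    [0, 0, 1, 18, 0, 0, 0],
    [0, 1, 0, 0, 5, 0, 0],
    [0, 0, 0, 0, 0, 5, 1],
    [16, 11, 1, 4, 1, 0, 13],
]


def precision_recall(matrix, labels, skip=("Null",)):
    """Per-category (precision, recall), ignoring the categories in `skip`."""
    results = {}
    for i, label in enumerate(labels):
        if label in skip:
            continue
        true_positive = matrix[i][i]
        predicted = sum(matrix[i])               # row total
        actual = sum(row[i] for row in matrix)   # column total
        results[label] = (true_positive / predicted, true_positive / actual)
    return results


scores = precision_recall(MATRIX, LABELS)
# Unweighted (macro) averages over the six feature categories.
avg_precision = sum(p for p, _ in scores.values()) / len(scores)
avg_recall = sum(r for _, r in scores.values()) / len(scores)
```

For the General category this gives a precision of 10/17 and a recall of 10/26, matching the 0.588 and 0.385 in the table. The large Null row, where the system found no feature, is what pulls the General and D&C recall down.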

The average precision is 0.845 and the average recall is 0.716. Both are acceptable. However, we observed that some categories, such as category 1 (general), have a relatively low recall. The low recall may be caused by implicit features. An implicit feature is a feature that does not appear explicitly in the sentence, and it is more difficult to identify than an explicit feature. An implicit feature is usually indicated by an adjective. For example, in “The phone is small”, the word “small” indicates the size of the phone. The features in category 1 are mostly attributes of the phone, so this category is more likely to contain implicit features, which lowers its recall. The following table gives the matching results as well as the precision and recall for the opinion classifier:

Test/Standard	Pos	Neg	Precision
Pos	64	14	0.821
Neg	9	10	0.526
Recall	0.877	0.417

The average precision is 0.6735 and the average recall is 0.647. Although the result is not excellent, it is still valid for our application. However, our application has no algorithm for dealing with neutral opinions and objective facts. Thus, if our application classified opinion orientation into 3 categories, the result would drop significantly. In summary, the classifiers we built are valid and acceptable for our application. Let us use the LG Optimus 2X as the example for the qualitative evaluation. We use the LG Optimus 2X because it is the first phone with an embedded dual-core processor. There are three types of summary: radar chart, bar chart, and sentence summary. These three types present the same information in different forms. Examples of each are shown below:

Conclusion and Future Development

This paper presents an application that addresses two common problems people face when digesting reviews: information overload and information misunderstanding. Our project aims to reorganize chaotic information into a tidy form by using data mining and natural language processing methods. The main objective of the project is to produce a phone summary that helps the buying decision, with two sub-objectives: building a feature classifier and an opinion classifier to categorize the information. Our project combines existing methods to produce the feature classifier and the opinion classifier, and our tests show that they are valid for producing a summary based on reviews from cNet.

In future work, we will mainly focus on implicit features and pronoun resolution. A sentence does not always contain feature words; the feature may be implicit or referred to by a pronoun. Handling implicit features and resolving pronouns could substantially increase the recall of the feature classifier. Furthermore, we will improve our algorithm by using machine learning methods for text, such as SVM, kNN, and naïve Bayes, and build a neutral opinion lexicon to extract neutral and objective opinions.