Big Data Analysis To Features Opinions Extraction of Customer Big Data Analysis To Features Opinions Extraction of Customer

Available online at www.sciencedirect.
com
Available
Available online
online at www.sciencedirect.com
at www.sciencedirect.com
ScienceDirect
Procedia
Procedia Computer
Computer Science
Science 11200 (2017)
(2017) 000–000
906–916
Procedia Computer Science 00 (2017) 000–000 www.elsevier.com/locate/procedia
www.elsevier.com/locate/procedia
International
International Conference
Conference on
on Knowledge
Knowledge Based
Based and
and Intelligent
Intelligent Information
Information and
and Engineering
Engineering
Systems, KES2017, 6-8 September 2017, Marseille, France
Systems, KES2017, 6-8 September 2017, Marseille, France
Big
Big data
data analysis
analysis to
to Features
Features Opinions
Opinions Extraction
Extraction of
of customer
customer
a* a,b
Ammar
Ammar Mars
Marsa*,, Mohamed
Mohamed Salah
Salah Gouider
Gouidera,b
a Université de Tunis, ISG , SMART Lab , 2000, Cité Bouchoucha, Le Bardo, Tunisia
a Université de Tunis, ISG , SMART Lab , 2000, Cité Bouchoucha, Le Bardo, Tunisia
b Université
b Université de Tunis, ESSECT, 1089 , Montfleury ,Tunisia
de Tunis, ESSECT, 1089 , Montfleury ,Tunisia
Abstract
Abstract
Opinion mining refers to extract subjective information from text data using tools such as natural language processing (NLP), text
Opinion mining refers to extract subjective information from text data using tools such as natural language processing (NLP), text
analysis and computational linguistics. Micro-blogging and social network are the most popular Web 2.0 applications , like Twitter
analysis and computational linguistics. Micro-blogging and social network are the most popular Web 2.0 applications , like Twitter
and Facebook which are developed for sharing opinions about different topics. This kind of application becomes a rich data source
and Facebook which are developed for sharing opinions about different topics. This kind of application becomes a rich data source
for opinion mining and sentiment analysis. This information is crucial for managers, who should improve the quality of a product
for opinion mining and sentiment analysis. This information is crucial for managers, who should improve the quality of a product
based on customers’ opinions. Concerning the characteristic of a product as mobile phone, it is particularly difficult to identify the
based on customers’ opinions. Concerning the characteristic of a product as mobile phone, it is particularly difficult to identify the
features being commented on (e.g., camera quality, battery life, price, etc).
features being commented on (e.g., camera quality, battery life, price, etc).
In our work, we present a new method that able to extract product features opinions of customer from social networks using text
In our work, we present a new method that able to extract product features opinions of customer from social networks using text
analysis techniques. This task identifies customers opinions regarding product features. We develop a system for retrieving tweets
analysis techniques. This task identifies customers opinions regarding product features. We develop a system for retrieving tweets
about a product from Twitter and detect product features opinions and their polarity. To validate the effectiveness of this approach
about a product from Twitter and detect product features opinions and their polarity. To validate the effectiveness of this approach
, we used a dataset published by Bing Lius group in our approach experimentation. This dataset contains many notated customer
, we used a dataset published by Bing Lius group in our approach experimentation. This dataset contains many notated customer
reviews of five products such as Canon G3 and Nokia 6610. Next, we test this method with tweets retrieved from Twitter about
reviews of five products such as Canon G3 and Nokia 6610. Next, we test this method with tweets retrieved from Twitter about
Nokia,Samsung and Iphone features products.
Nokia,Samsung and Iphone features products.
c 2017 The Authors. Published by Elsevier B.V.

©c 2017
2017 The
TheAuthors.
Authors.Published
Publishedby byElsevier
ElsevierB.V.
B.V.
Peer-review under responsibility of KES International.
International
Keywords: Big data; Sentiment Analysis; Opinion mining ; Feature mining ;
Keywords: Big data; Sentiment Analysis; Opinion mining ; Feature mining ;
1. Introduction
1. Introduction
Big data mining is considered as the core component of the modern decision support systems related to social
Big data mining is considered as the core component of the modern decision support systems related to social
networks and other data sources. ”What other people think” has been an important piece of information decision-
networks and other data sources. ”What other people think” has been an important piece of information decision-
making processes. People are accustomed to frequently give their opinions about a subject available in different social
making processes. People are accustomed to frequently give their opinions about a subject available in different social
media like Facebook and Twitter. As well as other Web sources such as: product reviews, forums, discussion groups,
media like Facebook and Twitter. As well as other Web sources such as: product reviews, forums, discussion groups,
and blogs.
and blogs.
It is important for the decision maker of a company to study the opinion of customers about a product. However,
It is important for the decision maker of a company to study the opinion of customers about a product. However,
due to the large amount of information collected from these data sources, it is essentially impossible for a supplier or
due to the large amount of information collected from these data sources, it is essentially impossible for a supplier or
∗ Corresponding author.
∗ Corresponding author.
E-mail address: ammar.mars@gmail.com
E-mail address: ammar.mars@gmail.com
1877-0509 c 2017 The Authors. Published by Elsevier B.V.

1877-0509 c 2017 The Authors. Published by Elsevier B.V.
Peer-review©under
1877-0509 2017responsibility
The Authors. Published
of KES by Elsevier B.V.
International.
Peer-review under responsibility of KES International
10.1016/j.procs.2017.08.114
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916 907
decision maker to read all this data collected and make a decision. For this reason, the ”big data opinion mining” has
become an important issue for research in Web information. It aims to extract user opinions from a huge volume of
data texts, automatically or semi-automatically.
It is a challenge for us to provide effective information extraction systems able to process a large mass of data and
extract customer opinions from data available on the Web. It is important to study users’ opinions about a merchandise,
a product or a brand. e.g. studying users’ opinions about a Samsung brand.
It is necessary to develop a system that helps to exploit users opinions from different data sources like open data.
In this work, we aim to propose a big data architecture for analyzing data and extracting customers opinions about
product features using a lot of search domain such as Machine Learning(ML), Natural Processing Language(NLP)
and big data. This architecture is used to make decision. We consider Twitter as a data source: It is a microblogging
service addressed in our work, a communication means and a collaboration system that allows users to share short
text messages, which do not exceed 140 characters with a defined group of users called followers.
In this work, we aim to design a new approach able to identify features costumers opinions and their polarity about
a specific product from Twitter. We present the related work in section 1. In section 2, we discuss our proposed
approach for features opinions extraction from social networks. Next, We present the found experimentation and
results founded in section 3. Finally, we give some conclusions in section 4.
2. Related Works
Feature Extraction (FE) is a delicate task in sentiment analysis .In this work, we will propose big data framework
able to extract features opinions of customer about a product from opened data.In this work we use three main concepts
:”Opinion Mining” , ”Big Data” and ”Opinion feature mining”. Several researchers have worked and many work has
been published about these concepts . We present in this section the related work.
2.1. Big Data
To face the explosion data volume , a new field of technology was born that called ”Big Data” 7 . It was iInvented
by the Wweb giants like Google, Facebook and Twitter with solutions designed to provide a real-time access to large
volumes of data., We can mention that Google processes hundreds of pPetabytes (PB), Facebook generates data of
more than 10 petabytes per month, and Twitter generates 10 terabytes daily.
It is used to describe a large mass of data. The difference between traditional data and this enormous mass of data is
that the latter contains structured and unstructured data. Big data give us an opportunity and a challenge ,at the same
time, to process a very large mass of data which are unstructured. Also, these data could not be acquired, managed
and treated by traditional information technology.
The concept of big data has been characterized by ”3V”: volume, velocity and variety 8 . The value chain of Big data
can be generally divided into four phases: the generation, acquisition, storage and data analysis 9 .Data analysis is the
latest and most important phase in the chain of Big Data 8 . It is used to extract useful to provide fatherly suggestions
or decisions. The challenge of this phase is to quickly extract information from massive data. There are several treat-
ment methods, and the most used method is the ”Parallel Computing”.
We cite MapReduce 25 and Dryad 26 as parallel programming model.MapReduce 25 is a simple but powerful program-
ming model for large-scale computing using a large number of clusters of commercial PCs to achieve automatic
parallel processing and distribution. In MapReduce, the computational workload are caused by inputting key-value
pair sets and generating key-value pair sets. The computing model only has two functions, i.e., Map and Reduce, both
of which are programmed by users.
2.2. Opinion Mining
An important part of our information-gathering behavior is always to find out ”what other people think ?”. With
the emergence of Web 2.0, a lot of opinion resources are available such as online review sites and personal blogs.
They are a new opportunities and challenges to use information technologies to seek out and understand the opinions
908 Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
of costumers.
Sentiment analysis , also known as opinion mining, is the process aiming to determine whether the polarity of a textual
corpus (document, sentence, paragraph etc.) tends to be positive, negative or neutral.
Binali et al 11 defined a sentiment classification that determine the subjectivity, polarity (positive/negative) and polarity
strength (weakly positive, mildly positive or strong positive) of an opinion text.It was a sub discipline of computational
linguistics . Some literature work uses the ontology concept 5 10 .Zhou et al 10 proposed an ontology-supported polarity
mining (OSPM) approach that aimed to enhance the opinion mining process with ontology by providing detailed
topic-specific information. Kontopoulos et al put forward a method to analyse sentiment based on an ontology. The
paper used the reviews of users from Amazon web site and E-bay 16 . Kim 17 presented a system that, given a topic,
automatically would finds the people who hold opinions about that topic and the sentiment of each opinion. This
system contains a module for determining word sentiments from a sentence utilizing various classification models.
In 13 12 the authors have suggested a classifier based approach for discovering opinions so as improve the marketing
strategies of firms.In addition,other work proposed by Cardie et al 18 : It was an approach with multi-perspective
question answering, which viewed the task as one of opinion-oriented information extraction.
Tong et al 14 presented an approach able to extract sentiments from online discussions about movies by tracking and
displaying the number of positive and negative messages using a timeline. In the same field, Pang et al examined
movie reviews utilizing supervised learning methods, and they concluded that the unsupervised learning techniques
outperformed the supervised learning techniques. Other work using Twitter as data source for the task of sentiment
analysis 3 ,it perform linguistic analysis of the collected corpus.
2.3. Mining Opinion feature
Feature Extraction (FE) is a delicate task in sentiment analysis. Agrawal 15 presented a method to examine senti-
ment analysis on Twitter data.It proposed a POS-tagging specific to analyze the role of linguistic features. To extract
product features opinions of product , Hu et al have presented a new method 19 . A system named OPINE proposed by
Popescu et al, mined customer reviews in order to extract an opinions list of customer about features of a particular
product 20 . In the same field , a work published by Siqueira and al.It present a feature extraction process from opinions
on services using WhatMatter system 21 .
Eirinaki et al put forward an algorithm that analyze the sentiment of a document/review and also identifies the se-
mantic orientation of specific components. The algorithm is integrated in an opinion search engine which presents a
summary of the sentiments of the most important features 22 . Yaakub et al have also proposed an architecture that uses
a multidimensional model to integrate customers characteristics and their comments about products. This approach
identifies the entities and then sentiments present in the customers reviews are transformed into an attribute table by
using a seven point polarity system (-3 to 3). After that a model is then produced on the basis of customers’ comments
for an entity and its features 23 . Other work to mention 6 ,It is named Red Opal system ,which aims to identify the best
evaluated products with respect to a set of automatically extracted features.This work consider a word as a feature
when the probability of finding it in texts containing opinions is very different from the probability of finding it in any
other kind of text.Many work based on ontology , we can cite 5 , It present a new method employing a hierarchical do-
main ontology . The proposed use a text classification algorithms and feature selection algorithms to extract features.
Finally , Htay et al 2 proposed a new method to find opinion words or phrases for each feature from customer reviews
in an efficient way. In financial field , Salas et al propose a new sentiment analysis method for feature and financial
news polarity classification.
3. Contribution
The sentiment Analysis is composed by four tasks: opinion identification, feature extraction (identifying the aspects
being commented on - e.g., the price of notebook), sentiment classification (the opinion polarity is positive, negative
or neutral), and result visualization and summarization 21 .
We notice that researchers tend to study the features opinions extraction. Knowing that , we treat the Web data. It is
huger, more variable in space and time, heterogeneous as they come from different sources.
In this work, we aim to extracting features opinions of customer about a product in social networks. We present a new
and effective approach for extracting product features opinions from enormous data using the combine of big data and
Text mining technologies.
Our approach composed essentially of four phases:
1. Product presentation and lexical ontology : This step is also formed by two small steps. The first is to design
a tree of positive and negative words, and the other is to design an ontology representing the product.
2. Data collection and storage : It aims to collect and integrate data published by customers in the web and more
precisely from Twitter . It is a microblogging platforms .
3. Feature opinion mining: In this step , we analyze data collected using Big data and text mining techniques for
customers features opinions extraction.
4. Result visualization and summarization : we present a panel representing the various characteristics of a
product as well as customer opinions.
3.1. Product presentation and lexical ontology
3.1.1. Product presentation

The customers usually write comments by using product features, for example: the battery life is short, screen
colors are bright, phone has a FM radio, and the accessories are very bad. The People buy mobiles according to
their ease of use and according to the mobile features. We have thought to create manually an ontology that presents
a product . We cite the ontology proposed by Yaakub et al 23 which covered features and characteristics of mobile
phones in general and in other technical terms.
A product is presented by ontology of concepts and sub-concepts as following and an example of an ontology that
present the mobile features:
It is in essence a simple taxonomy of concepts and attributes. The customer can use in her expression the word
”display” or ”monitor” instead of ”screen”. The ontology is enriched with synonyms and hyponyms of the attributes.
3.1.2. Lexical ontology

Minqing Hu and Bing Liu 19 presented a list of negative and positive word used by the customer to express their
opinion 19 . In this work, we use a lexical ontologie which is built using this list of words. We can cite some examples
of words in the table below:
Words Polarity
Afraid, aggressive, agony, angry, batty, bad, bait, boil, buggy ... Negative
Accommodative, accomplish, achievement, admire, affable, alluring, amazed, amusing ... Positive
We propose this ontology in Fig.1. to present the negative and positive word to use in next step. We used this
ontology in a previous work 4
3.2. Data collection and storage
Big Data is handled at different stages namely, collection, storing, organizing, analyzing etc. with a motive of
deciphering useful insights useful to make business decisions.This stage involves the collection of data from several
Fig. 1. Words Opinions Ontology
types of data sources.

In this step, we aim to conceive a model able to store the collected data. We have a program that collects and stocks
Tweets from Twitter in our proposed structure. In the world of Web applications, like Faceboo or Twitter, the data are
not described in the same way. In fact it would lead to store data with zero values. Thus the use of relational databases
is not always appropriate.
This step is the first step of a big data mining process. Specifically, it is large-scale, diverse and complex integrated
datasets that coming from different data sources. Such data sources include flat Files, databases, streams or other
available data sources.
This operation also called Data Acquisition. During this acquisition, once the raw data is collected, an efficient trans-
formation mechanism should be used to store it in storage system to be used in the analytic process.
3.3. Feature opinion mining
Generally, Data analysis research can be classified into six fields:a structured data analysis, a text data analysis, a
website data analysis, a multimedia data analysis,a network data analysis, and a mobile data analysis.
This phase is the most important phase of big data process. As we mentioned that this approach aims to extract feature
from tweets collected. We need to use the text analysis in this step because the format of tweets stored is text. Text
analysis, also named text mining, is a process to extract useful information and knowledge from unstructured data.
Most text mining systems are based on text expressions and natural language processing (NLP). The NLP can enable
computers to analyze, interpret, and even generate text. Some common NLP methods are: lexical acquisition, word
sense disambiguation, part of-speech tagging, and probabilistic context free grammar 24 .
Some NLP-based technologies are applied to text mining, including information extraction, topic models, text summa-
rization, classification, clustering, question answering, and opinion mining. Information mining should automatically
extract specific structured information from texts. We illustrate this phase in this figure below in Fig. 2
we will detail following this task :
3.3.1. feature opinions extraction based on MapReduce

MapReduce 25 26 is a programming model designed for processing large volumes of data in parallel by dividing the
work into a set of independent tasks. MapReduce programs are written in a particular style influenced by functional
programming constructs, specifically idioms for processing lists of data. MapReduce can take advantage of locality
of data, processing it on or near the storage assets in order to reduce the distance over which it must be transmitted.
In fact MapReduce is divided to two tasks: The first is colled ”MAP” that take a series of key/value pairs, processes
each, and generates zero or more output key/value pairs.The second colled ”REDUCE” that Reduce function once for
each unique key in the sorted order. The Reduce can iterate through the values that are associated with that key and
produce zero or more outputs.
In our work, we adapt the MapReduce programming model to extract the feature opinion polarity from a huge amount
of data. We illustrate algorithm tasks in fig.3.
Fig. 2. Data Analysis and features extraction
Fig. 3. MapReduce for feature opinion polarity extraction
Our system aims to find the features and detect their polarity (negative or positive ) of tweets collected from Twitter
about a product.
In the first task (MAP) our system iterate the set of tweets and determines the feature and their its polarity of each
tweet. We illustrate this task in the algorithm bellow:
Algorithm 1 MAP Task

1: Function Map (String name, String file):
2: // name: file name
3: // File : HDFS file
4: for each Tweet in file do
5: P=Polarity (tweet)
6: F=Feature (tweet)
7: emit (tweet, F , P)
8: end for
Conventionally the polarity of a Tweet is defined as follows:



 1 if it is a positive opinion

Polarity(T ) = 
 −1 if it is a negative opinion

 0 if it is a neutral opinion
For the next step (REDUCE), our system groups the tweets according to their polarity and their feature .
Algorithm 2 Reduce Task

Function Reduce ( int Polarity , Iterator TweetList):
2: // Polarity: Opinion Polarity +,- or Neutral
// TweetList : List of tweets
4: for each feature F in TweetList do
Sum+=1
6: emit (F , Polarity, Sum )
end for
3.3.2. POS-tagging of Tweet

Before discussing the application of part-of-speech tagging from natural language processing, we first give some
example sentences.
Example 1
Input The battery life is short

Output The/DT battery/NN life/NN is/VBP short/JJ
Example 2
Input screen colors are bright

In corpus linguistics, part-
Output the/DT screen/NN colors/NNS are/VBP bright/JJ
of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is
the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both
its definition, as well as its context i.e. the relationship with adjacent and related words in a phrase, sentence, or
paragraph.
3.3.3. Feature Detection

Using the ontology of product presentation defined previously, the Feature Detection task able to determine the
feature identified in a tweet. For each noun found in a tweet, our systems try to verify whether it is a product features
from the previously ontology constructed . We present bellow the method of polarity determination.
Algorithm 3 Feature Detection

1: Function Polarity ( String Tweet):
2: for each W AS [noun] in Tweet Do TweetList do
3: if W Exist in ProductOntologie then
4: Return Feature
5: else
6: Return null
7: end if
8: end for
3.3.4. Polarity identification

Using the ontology defined previously, the POLARITY task able to determine the polarity of a tweet. We illustrate
this method in algorithm 4. It is a method used by Map task to identify if a tweet is a negative or positive opinion.For
each word (verb, adverb or adjective) found in a tweet, our systems try to verify if its a negative/positive words from
the ontology constructed previously. We present bellow the method of polarity determination.
Algorithm 4 Polarity identification

1: Function Polarity ( String Tweet):
2: for each W AS [verb/adjective/adverb] in Tweet Do TweetList do
3: if W Exist in NegativeOntology then
4: Return -1
5: else if W Exist in PositiveOntology then
6: Return 1
7: else
8: Return 0
9: end if
10: end for
3.4. Result visualization and summarization
Result presentation is the final and most important phase of the value chain of big data. Big data opinion mining
can provide useful values and statics for decision making. It is a summarisation and result of data processing. In our
work ,It is summary representation of the features product opinions collected from Twitter.
4. Experimentation
We extract data from platform of microblogging Twitter. To validate our approach described above.
4.1. Experimental data and result
We conceived a tree of negative and positive lexicon from a list of words published by Minqing Hu and Bing
Liu 19 27 . We uses three datasets ,from the ones available on the web ,to validate our approach in the first step. Then
we are collected a set of tweets from twitter and tested their with our approach proposed.
4.1.1. Minqing Hu and Bing Liu, 2004

This dataset is a subset of the opinin mining datasets released by Minqing Hu and Bing Liu from University of
Illinois at Chicago. Their dataset is available in this web site 1 . This subset contains annotated customer reviews of
fives products:
• Canon G3
• Nikon coolpix 4300
• Nokia 6610
• Creative Labs Nomad Jukebox Zen Xtra 40GB
• Apex AD2600 Progressive-scan DVD player
The symbols used in the annotated reviews:[+n] is Positive opinion, n is the opinion strength: 3 strongest, and 1
weakest. Note that the strength is quite subjective. [-n]: Negative opinion . This is an example of a review in Fig.4.
We measured the effectiveness of our system using this Dataset. The followings table illustrate the experimentation
results (Fig.5):
4.1.2. Tweets collected from Twitter

To retrieve tweets from twitter, you must have a subject determined by a hash-tag. For this case, we used some
hash-tag to find tweets about some products from twitter; we chose the name of products to find and retrieve tweets
1 http://www.cs.uic.edu/ liub/FBS/sentiment-analysis.html
Product Feature Positive Negative Not classified

Size 70% 21% 9%
battery 60% 21% 19%
Nokia 6610
Weight 88% 0% 22%
Screen 55% 39% 6%
Picture 49% 23% 28%
Canon G3 Photo 64% 34% 2%
Battery 41% 59% 0%
weight 95% 0% 5%
Nikon coolpix 4300
Size 77% 13% 10%
Fig. 4. Review example of dataset
Fig. 5. Experimentation results of Minqing Hu and Bing Liu dataset
about these products : Samsung and Nokia. We have collected 100000 tweets around this . This next is table give the
data analysis results:
Product Feature Positive Negative Cost(s)

Size 53% 12%
battery 90% 4%
Nokia 27
Weight 63% 22%
Screen 66% 24%
Size 79% 5%
battery 73% 10%
Samsung 29
Weight 80% 9%
Screen 33% 40%
Size 92% 3%
battery 89% 5%
Iphone 26
Weight 53% 33%
Screen 93% 5%
Size 44% 28%
Weight 50% 22%
HTC 26
Screen 59% 13%
battery 73% 10%
Weight 61% 9%
Blackberry 24
Screen 33% 40%
Fig. 6. Experimentation results of tweets collected
4.2. Tools
Hadoop : Hadoop is an open source Apache project that aims to provide ”reliable and scalable distributed com-
puting” . The Hadoop package aims to provide all the tools one could need for large scale computing: Fault Tolerant
Distributed Storage in the form of the HDFS.
Twitter4j: Twitter4J is a Twitter API binding library for the Java language licensed under Apache License 2.0 .It is
an open source library .It is able to integrate the Twitter service on a program.
5. Conclusion
In this work , we have proposed a framework for extracting features of customer opinions using Big Data tech-
nologies combined with machine learning and text mining tools. First,we have proposed an algorithm of opinion
polarity determination based on MapReduce. Next, we test and validate the effectiveness of this framework with
datasets available in the web in first time and next we use data cames from platform of microblogging Twitter. Our
experimental results show that it is necessary to test this system with other benchmarks. Another line of research will
be to treat tweet not classified. It is also very complex operation.
References
1. M. d. P. Salas-Zárate, R. Valencia-Garcı́a, A. Ruiz-Martı́nez, and R. Colomo-Palacios, “Feature-based opinion mining in financial news: an
ontology-driven approach,” Journal of Information Science, p. 0165551516645528, 2016.
2. S. S. Htay and K. T. Lynn, “Extracting product features and opinion words using pattern knowledge in customer reviews,” The Scientific World
Journal, vol. 2013, 2013.
3. X. Ding, B. Liu, and P. S. Yu, “A holistic lexicon-based approach to opinion mining,” in Proceedings of the 2008 international conference on
web search and data mining. ACM, 2008, pp. 231–240.
4. A. Mars, M. S. Gouider, and L. B. Saı̈d, “A new big data framework for customer opinions polarity extraction,” in International Conference:
Beyond Databases, Architectures and Structures. Springer, 2015, pp. 518–531.
5. B. B. Wang, R. I. Mckay, H. A. Abbass, and M. Barlow, “A comparative study for domain ontology guided feature extraction,” in Proceedings
of the 26th Australasian computer science conference-Volume 16. Australian Computer Society, Inc., 2003, pp. 69–78.
6. C. Scaffidi, K. Bierhoff, E. Chang, M. Felker, H. Ng, and C. Jin, “Red opal: product-feature scoring from reviews,” in Proceedings of the 8th
ACM conference on Electronic commerce. ACM, 2007, pp. 182–191.
7. C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, and T. Rabl, “Big data benchmarking,” in Proceedings of the 2012 Workshop on Management
of Big Data Systems, ser. MBDS ’12. New York, NY, USA: ACM, 2012, pp. 39–40. [Online].
8. H. F. M. Gali Halevi, “Special issue on big data,” Research Trends, 2012.
9. Y. L. Min Chen, Shiwen Mao, “Big data: A survey,” Springer Science+Business Media, 2014.
10. L. Zhou and P. Chaovalit, “Ontology-supported polarity mining,” Journal of the American Society for Information Science and technology,
vol. 59, no. 1, pp. 98–110, 2008.
11. H. Binali, V. Potdar, and C. Wu, “A state of the art opinion mining and its application domains,” in Industrial Technology, 2009. ICIT 2009.
IEEE International Conference on. IEEE, 2009, pp. 1–6.
12. K. Dave, S. Lawrence, and D. M. Pennock, “Mining the peanut gallery: Opinion extraction and semantic classification of product reviews,” in
Proceedings of the 12th international conference on World Wide Web. ACM, 2003, pp. 519–528.
13. L.-W. Ku, L.-Y. Lee, T.-H. Wu, and H.-H. Chen, “Major topic detection and its application to opinion summarization,” in Proceedings of the
28th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2005, pp. 627–628.
14. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: sentiment classification using machine learning techniques,” in Proceedings of the
ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002, pp.
79–86.
15. A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau, “Sentiment analysis of twitter data,” in Proceedings of the Workshop on
Languages in Social Media. Association for Computational Linguistics, 2011, pp. 30–38.
16. E. Kontopoulos, C. Berberidis, T. Dergiades, and N. Bassiliades, “Ontology-based sentiment analysis of twitter posts,” Expert systems with
applications, vol. 40, no. 10, pp. 4065–4074, 2013.
17. S.-M. Kim and E. Hovy, “Determining the sentiment of opinions,” in Proceedings of the 20th international conference on Computational
Linguistics. Association for Computational Linguistics, 2004, p. 1367.
18. C. Cardie, J. Wiebe, T. Wilson, and D. J. Litman, “Combining low-level and summary representations of opinions for multi-perspective
question answering.” in New directions in question answering, 2003, pp. 20–27.
19. M. Hu and B. Liu, “Mining opinion features in customer reviews,” in AAAI, vol. 4, no. 4, 2004, pp. 755–760.
20. A.-M. Popescu and O. Etzioni, “Extracting product features and opinions from reviews,” in Natural language processing and text mining.
Springer, 2007, pp. 9–28.
21. H. Siqueira and F. Barros, “A feature extraction process for sentiment analysis of opinions on services,” in Proceedings of International
Workshop on Web and Text Intelligence, 2010, pp. 404–413.
22. M. Eirinaki, S. Pisal, and J. Singh, “Feature-based opinion mining and ranking,” Journal of Computer and System Sciences, vol. 78, no. 4, pp.
1175–1184, 2012.
23. M. R. Yaakub, Y. Li, A. Algarni, and B. Peng, “Integration of opinion into customer analysis model,” in Proceedings of the The 2012
IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology-Volume 03. IEEE Computer Society,
2012, pp. 164–168.
24. W. Paik, S. Yilmazel, E. Brown, M. Poulin, S. Dubon, and C. Amice, “Applying natural language processing (nlp) based metadata extraction to
automatically acquire user preferences,” in Proceedings of the 1st international conference on Knowledge capture. ACM, 2001, pp. 116–122.
25. J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113,
2008.
26. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: distributed data-parallel programs from sequential building blocks,” in ACM
SIGOPS Operating Systems Review, vol. 41, no. 3. ACM, 2007, pp. 59–72.
27. B. Liu, M. Hu, and J. Cheng, “Opinion observer: analyzing and comparing opinions on the web,” in Proceedings of the 14th international
conference on World Wide Web. ACM, 2005, pp. 342–351.

Big Data Analysis To Features Opinions Extraction of Customer Big Data Analysis To Features Opinions Extraction of Customer

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Big Data Analysis To Features Opinions Extraction of Customer Big Data Analysis To Features Opinions Extraction of Customer

Uploaded by

Copyright:

Available Formats

Available online at www.sciencedirect.

1877-0509 c 2017 The Authors. Published by Elsevier B.V.

2.1. Big Data

2.2. Opinion Mining

2.3. Mining Opinion feature

3.1. Product presentation and lexical ontology

3.1.1. Product presentation

3.1.2. Lexical ontology

3.2. Data collection and storage

Fig. 1. Words Opinions Ontology

types of data sources.

3.3. Feature opinion mining

3.3.1. feature opinions extraction based on MapReduce

Fig. 2. Data Analysis and features extraction

Fig. 3. MapReduce for feature opinion polarity extraction

Algorithm 1 MAP Task

Conventionally the polarity of a Tweet is defined as follows:

Algorithm 2 Reduce Task

3.3.2. POS-tagging of Tweet

Input The battery life is short

Input screen colors are bright

3.3.3. Feature Detection

Algorithm 3 Feature Detection

3.3.4. Polarity identification

Algorithm 4 Polarity identification

3.4. Result visualization and summarization

4.1. Experimental data and result

4.1.1. Minqing Hu and Bing Liu, 2004

4.1.2. Tweets collected from Twitter

Product Feature Positive Negative Not classified

Fig. 5. Experimentation results of Minqing Hu and Bing Liu dataset

Product Feature Positive Negative Cost(s)

You might also like

1877-0509 c 2017 The Authors. Published by Elsevier B.V.