In addition to classifying comments by mood, we also used the Reasoning Through Search system to characterize each comment's valence and intensity (on a scale from -2 to +2). Over the entire corpus of comments, we found that 54.5% of the comments were negative, 2.7% were neutral, and 42.8% were positive. The average valence of comments in the corpus was -0.211. This automated analysis indicates that the comments tend to be slightly negative overall.

An interesting trend emerged as we examined valence/intensity changes within individual threads. We found that the most commonly occurring trend (occurring in 54% of the threads) was for a thread to start out positive and end negative (as seen in the example thread in Figure 5), while only 43% made a negative-to-positive change and 3% stayed roughly the same throughout the course of the thread. This emphasizes our need to detect negativity, as it seems that these comments bring the discussion thread to an end, perhaps creating a boundary to participation.

Figure 5: Graphing comments in a thread by valence. Conversations in general start positive and end negative; this is true of the comments on the story in this illustration.

As these examples illustrate, the sentiment classification algorithms fared well in classifying the comments. We conducted a sample test to compare our own performance as human classifiers with that of the classification algorithms. We took 20 comments selected at random from our whole dataset of 168,095 threads and 782,934 comments, and conducted our own manual classification of these comments. For each comment, we (each of the authors) answered two questions: 1) Is this comment positive or negative? 2) Is this comment happy, sad, or angry? We worked alone and then compared our classifications with those of the two automated sentiment classifiers (the RTS valence and mood classification systems). For question #1, our answers agreed for 95% of the documents (disagreeing on only 1 comment out of 20). When compared to the RTS-classified valence of each comment, each of us agreed with the system output for 75% of the comments.

5. DISCUSSION AND FUTURE WORK
In this paper we have outlined our approach to surfacing negative comments in online comment forums. We were impressed with the relative success of the relevance-plus-sentiment detection system.

Our approach reveals not just negative and positive valence, but offers a richer palette of affect, classifying comments as happy, sad, or angry. We believe that introducing this additional nuance offers greater insight into the overall sentiment felt by an aggregate group, and can also help us identify inappropriate affect levels (by establishing an anger scale). We conjecture that a post which is off-topic and very angry is clearly salient for further investigation by community and site managers. Once conversational and trial posts (e.g., "hi") are taken into account, we also find that highly off-topic neutral posts are often spam. Our technique will also allow us to identify people who are consistently posters of angry content across different stories.

Given that our mood classification system was trained on LiveJournal blog posts, it is not surprising that there were some detection errors. In future work we will train our sentiment classifier with a larger dataset of comment threads taken from the site we are studying, utilizing this unlabeled data in an Expectation Maximization approach. This will also address the challenge of how well a sentiment detection system can perform when trained on short snippets or comments rather than blocks of prose.

We have also begun a hand-coded classification of comments according to an expanded notion of relevance: on-topic relevance is where a comment relates to the original text, whereas conversational relevance denotes a comment that refers to a previously posted comment by another community member. In the latter case, we have found through our analyses that conversationally relevant comments can be statements directed at something someone said, or at the person themselves. A negative, off-topic, conversationally relevant comment, where there is no relevance to the original text, is usually an insult directed at a person: the content refers to a community member (e.g., by name, or using "You are") and is combined with known insulting terms.
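As a rough illustration of that pattern, a keyword heuristic could flag such comments. This is only a sketch, not the hand-coded classification used in the study; the `INSULT_TERMS` lexicon, the `is_directed_insult` helper, and the example inputs are all hypothetical:

```python
import re

# Hypothetical insult lexicon; a deployed system would use a curated list.
INSULT_TERMS = {"idiot", "stupid", "loser"}

def is_directed_insult(comment, member_names, off_topic=True):
    """Flag a comment as a likely directed insult: it is off-topic,
    addresses a community member (by name or with "you are"),
    and contains a known insulting term."""
    text = comment.lower()
    addresses_member = any(name.lower() in text for name in member_names)
    second_person = re.search(r"\byou\s+are\b|\byou're\b", text) is not None
    insulting = any(term in text for term in INSULT_TERMS)
    return bool(off_topic and (addresses_member or second_person) and insulting)

print(is_directed_insult("You are such an idiot.", ["alice"]))      # True
print(is_directed_insult("Great story, thanks alice!", ["alice"]))  # False
```

Plain substring matching like this will of course over- and under-trigger; it only illustrates the combination of address detection plus an insult lexicon described above.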
We are also interested in whether the emotional valence of the original text is correlated with the emotional valence of the overall comment thread. We are validating our model by hand before investigating what kinds of linguistic disambiguation will need to be used in combination with our existing sentiment detection model.
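That correlation test could be run, for instance, as a plain Pearson coefficient over per-story pairs (story valence vs. mean comment-thread valence). The data below is purely illustrative:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: valence of each original story, and the mean
# valence of the comments in its thread (scale -2 to +2).
story_valence  = [1.5, 0.5, -1.0, -2.0, 2.0]
thread_valence = [0.3, -0.2, -1.2, -1.5, 0.8]
r = pearson(story_valence, thread_valence)
```

A value of `r` near +1 would indicate that thread sentiment tracks story sentiment; values near 0 would suggest thread dynamics dominate.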
In addition, we intend to use our method to model emotional trajectories through comment threads. Specifically, we wish to address whether negative comments (crossed by whether they are on or off topic) have an impact on the comments that follow: does negativity beget more negativity? Do conversations go south? And if they do, what are the characteristics of a conversation that escalates versus one that does not? Through this means we intend to address the issue of whether undesirable behavior does or does not model undesirable behavior in others, and if so, what effective in-thread remediation strategies may be better than simple deletion. We believe these local strategies over content are the way to effective community management. With tools that help surface where non-socio-normative behaviors are occurring, we can support human community managers' work more effectively by automatically finding and filtering potential violations.
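A first cut at the "does negativity beget negativity?" question is to compare the rate at which negative comments follow negative comments against the corpus base rate. A minimal sketch, over hypothetical valence-scored threads:

```python
def follows_negative_rate(threads):
    """P(comment negative | previous comment negative), pooled over
    all threads. Each thread is a list of valence scores on the
    -2..+2 scale; "negative" means valence < 0."""
    after_neg = neg_after_neg = 0
    for thread in threads:
        for prev, cur in zip(thread, thread[1:]):
            if prev < 0:
                after_neg += 1
                if cur < 0:
                    neg_after_neg += 1
    return neg_after_neg / after_neg if after_neg else 0.0

# Hypothetical threads of per-comment valence scores.
threads = [[1.0, 0.5, -0.5, -1.0, -2.0],
           [0.5, -1.0, 0.5],
           [-0.5, -1.5, -2.0]]
rate = follows_negative_rate(threads)
print(rate)  # 0.8
```

Comparing `rate` against the overall fraction of negative comments would indicate whether negativity is self-reinforcing; splitting the counts by on-/off-topic status would give the crossed analysis described above.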
There are, of course, also open questions, especially when dealing with people who are regular community visitors. For example, an open question is how a single person behaves over time: that is, what is the history of an author's sentiment about a topic over time? What is their authority in the social group, and from that, what is their influence? Clearly, some people's negative comments may carry more weight than others'. Do we see the emergence of groups who all share the same sentiment on certain topics? These questions represent a valuable, but fine-grained and socially oriented, research program. We believe our approach of combining relevance, affect/sentiment, and descriptions of posting patterns is a good starting point.
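One way to begin answering the question of how a single person behaves over time is to aggregate each author's comment valences chronologically, per topic. The `sentiment_history` helper and the records below are hypothetical, a minimal sketch of that bookkeeping:

```python
from collections import defaultdict

def sentiment_history(records):
    """Map (author, topic) -> time-ordered list of (timestamp, valence).
    records: iterable of (author, topic, timestamp, valence) tuples."""
    history = defaultdict(list)
    for author, topic, ts, valence in records:
        history[(author, topic)].append((ts, valence))
    for series in history.values():
        series.sort()  # chronological order
    return dict(history)

# Hypothetical comment records.
records = [("ann", "politics", 2, -1.5),
           ("ann", "politics", 1, -0.5),
           ("bob", "sports",   1,  1.0)]
hist = sentiment_history(records)
print(hist[("ann", "politics")])  # [(1, -0.5), (2, -1.5)]
```

From such series one could look for authors whose valence on a topic drifts or stays consistently angry, the "consistent posters of angry content" mentioned earlier.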
6. ACKNOWLEDGMENTS
We thank our colleagues at Yahoo! for their help with these analyses.