Corpus (SARC)1, a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection. The corpus has 1.3 million sarcastic statements, 10 times more than any previous dataset, and many times more instances of non-sarcastic statements, allowing for learning in regimes of both balanced and unbalanced labels. Each statement is furthermore self-annotated (sarcasm is labeled by the author and not an independent annotator) and provided with user, topic, and conversation context. We evaluate the corpus for accuracy, compare it to previous related corpora, and provide baselines for the task of sarcasm detection.

1 Introduction

Sarcasm detection is an important component of many natural language processing (NLP) systems, with direct relevance to natural language understanding, dialogue systems, and text mining. However, detecting sarcasm is difficult because it occurs infrequently and is difficult for even human annotators to discern (Wallace et al., 2014). Despite these properties, existing datasets either have balanced labels (data with approximately the same number of examples for each label) (Gonzalez-Ibanez et al., 2011; Bamman and Smith, 2015; Joshi et al., 2015; Amir et al., 2016; Oraby et al., 2016) or use human annotators to label sarcastic statements (Riloff et al., 2013; Swanson et al., 2014; Wallace et al., 2015).

In this work, we make available the first corpus for sarcasm detection that has both unbalanced and balanced labels. With each statement provided with author, topic, and context information, the dataset also exceeds all previous sarcasm corpora by an order of magnitude. This dataset is possible due to the comment structure of the social media site Reddit3, as well as its frequently-used and standardized annotation for sarcasm.

Following a discussion of corpus construction and relevant statistics, in Section 4 we present results of a manual evaluation on a subsample of the data as well as a direct comparison with alternative sources. Then in Section 5 we examine simple methods of detecting sarcasm on both a balanced and an unbalanced version of our dataset.

2 Related Work

Since our main contribution is a corpus and not a method for sarcasm detection, we point the reader to a recent survey by Joshi et al. (2016) that discusses many interesting efforts in this area. Note that many of the works the authors mention will be discussed by us in this section, with many papers using their own datasets; this illustrates the need for common baselines for evaluation.

Sarcasm datasets can largely be distinguished by the sources used to get sarcastic and non-sarcastic statements, the amount of human annotation, and whether the dataset is balanced or unbalanced. Reddit has been used before, notably by Wallace et al. (2015); while the authors allow unbalanced labeling, they do not exploit the possibility of using self-annotation and generate around 10,000 human-labeled sentences. Twitter2 is a frequent source due to the self-annotation provided by hashtags such as #sarcasm, #notsarcasm, and #irony (Reyes et al., 2013; Bamman and Smith, 2015; Joshi et al., 2015). As discussed in Section 4.2, its low-quality language and other properties make Twitter a less attractive source for annotation.

1 https://nlp.cs.princeton.edu/SARC/
2 https://www.twitter.com
3 https://www.reddit.com
5 Baselines for Sarcasm Detection

A direct application of our corpus is for training and evaluating sarcasm detection systems; we thus provide baselines for the task of classifying a given statement as sarcastic or not-sarcastic. We consider non-contextual representations of the comments as feature vectors to be classified by a regularized linear support vector machine (SVM) trained by stochastic gradient descent (SGD). In the unbalanced regime we use validation to tune the class-weight, assigning more weight to the sarcasm class to force its consideration in the loss.

... method (Arora et al., 2017). Given a word w's frequency f_w, SIF-weights assign a weight a / (a + f_w), where a is a hyperparameter often set to a = 10^-4.

[Figure 3: Two example sarcastic sentences from SARC-main, shown in a screenshot of a Reddit /r/politics thread: "speculation is so much more fun than all those pesky facts , amirite ?" and "yeah , totally nothing wrong with bullying people with the threat of meritless litigation". Darker shading corresponds to non-stopwords with a higher SNIF weight.]