\phi_t = p(w \mid t) (topic-word distribution for topic t)
\theta_j = p(t \mid j) (document-topic distribution for document j)
Each token is drawn as w_i \sim \phi(z_i), with z_i \sim \theta
To infer z, \phi, and \theta, run Markov Chain Monte Carlo (Gibbs sampling) and estimate:

\phi_t(w) \propto n_{tw} + \beta
\theta_j(t) \propto n_{jt} + \alpha
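As a concrete illustration, these point estimates follow directly from the Gibbs sample counts. A minimal sketch, assuming topic-word and document-topic count matrices from a finished sampling run (function and argument names are illustrative, not from the original):

```python
import numpy as np

def point_estimates(n_tw, n_jt, beta=0.01, alpha=0.1):
    """Smoothed point estimates from Gibbs counts:
    phi_t(w) ~ n_tw + beta,  theta_j(t) ~ n_jt + alpha."""
    phi = n_tw + beta                       # (T, V) topic-word counts plus smoothing
    phi /= phi.sum(axis=1, keepdims=True)   # normalize each topic over the vocabulary
    theta = n_jt + alpha                    # (D, T) document-topic counts plus smoothing
    theta /= theta.sum(axis=1, keepdims=True)
    return phi, theta
```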
Juan Gabriel Romero (Universidad Nacional de Colombia) Latent Topic Feedback for Information Retrieval May 31, 2013 6 / 14
Topic representation
First, with k = 10, represent each topic t by its top words:

W_t = k\text{-}\operatorname{argmax}_w \phi_t(w)
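Selecting W_t is a plain top-k over the topic's word distribution. A minimal sketch, assuming phi_t is a probability vector over the vocabulary and vocab maps indices to word strings (names are illustrative):

```python
import numpy as np

def top_words(phi_t, vocab, k=10):
    """W_t = k-argmax_w phi_t(w): the k most probable words of topic t."""
    idx = np.argsort(phi_t)[::-1][:k]   # indices sorted by descending probability
    return [vocab[i] for i in idx]
```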
Label generation (best topic word)
Description       Score
Word probability  f_1(w) = P(w \mid z = t)
Topic posterior   f_2(w) = P(z = t \mid w)
PMI               f_3(w) = \sum_{w' \in W_t \setminus w} \mathrm{PMI}(w, w')
Conditional 1     f_4(w) = \sum_{w' \in W_t \setminus w} P(w \mid w')
Conditional 2     f_5(w) = \sum_{w' \in W_t \setminus w} P(w' \mid w)
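The PMI-based score f_3, for example, can be sketched as follows, assuming word co-occurrence probabilities are available as a joint matrix p_joint and marginal vector p_marg over vocabulary indices (hypothetical names, not from the original):

```python
import numpy as np

def pmi(p_joint, p_marg, i, j):
    """Pointwise mutual information: log p(w_i, w_j) / (p(w_i) p(w_j))."""
    return np.log(p_joint[i, j] / (p_marg[i] * p_marg[j]))

def f3_pmi_score(w, topic_words, p_joint, p_marg):
    """f_3(w): sum of PMI(w, w') over the other top words w' of the topic."""
    return sum(pmi(p_joint, p_marg, w, v) for v in topic_words if v != w)
```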
Topic representation
n-gram identification (Turbo Topics)
Candidate topics, from the result documents D_q of a query q:

E = \bigcup_{d \in D_q} k\text{-}\operatorname{argmax}_t \theta_d(t)
Related topics:

R = \bigcup_{t \in E} k\text{-}\operatorname{argmax}_{t' \notin E} \sigma(t, t')

where \sigma(t, t') scores the similarity between topics t and t'.
Filter topics:

\mathrm{PMI}(t) = \frac{1}{k(k-1)} \sum_{(w, w') \in W_t} \mathrm{PMI}(w, w')
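The coherence filter averages PMI over the ordered pairs of a topic's top words. A minimal sketch under the same assumption of precomputed joint and marginal word probabilities (names are illustrative):

```python
import numpy as np
from itertools import permutations

def topic_pmi(topic_words, p_joint, p_marg):
    """PMI(t): average PMI over all k(k-1) ordered word pairs in W_t."""
    k = len(topic_words)
    total = sum(np.log(p_joint[w, v] / (p_marg[w] * p_marg[v]))
                for w, v in permutations(topic_words, 2))
    return total / (k * (k - 1))
```

Topics whose average PMI falls below a chosen threshold would be dropped before being shown to the user.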
Query expansion
Add the 10 most probable words of the selected topic W_t to the query, with \lambda \in [0, 1] as weight parameter.

For N_q words in the original query, each original query word gets weight

(1 - \lambda) / N_q

Each word w from the selected topic then gets weight \lambda \hat{\phi}_t(w), with \hat{\phi}_t representing the re-normalized topic-word probability:

\hat{\phi}_t(w) = \frac{\phi_t(w)}{\sum_{w' \in W_t} \phi_t(w')}
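The weighting scheme above can be sketched as follows; query_terms, topic_words, and the phi_t mapping are hypothetical names, and lam stands for the weight parameter λ:

```python
def expansion_weights(query_terms, topic_words, phi_t, lam=0.25):
    """Weight original query terms by (1-lam)/N_q each, and each topic
    word by lam times the re-normalized topic-word probability."""
    n_q = len(query_terms)
    weights = {w: (1 - lam) / n_q for w in query_terms}
    z = sum(phi_t[w] for w in topic_words)        # renormalizer over W_t
    for w in topic_words:
        weights[w] = weights.get(w, 0.0) + lam * phi_t[w] / z
    return weights
```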
Experiments
Questions:
Can query expansion with latent topic feedback improve the results of
actual queries?
Assuming helpful latent topics exist, will the topic selection described
present them to the user?
If presented with a helpful topic, will a user actually select it?
(Outside the scope)
Experimental setup
Data set from TREC
Topic modeling with MALLET
Preparation: downcasing; removal of numbers and punctuation marks;
stop-word removal; filtering of rarely occurring words
Vocabularies between 10,000 and 20,000 words
Gibbs sampler run for 1,000 iterations, re-estimating every 25 samples
500 topics
\lambda = 0.25
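The preparation steps can be sketched as a small pipeline. This is a hedged illustration only: the original work used MALLET's own preprocessing, and the regex tokenizer and min_count threshold here are assumptions.

```python
import re
from collections import Counter

def preprocess(docs, stop_words, min_count=5):
    """Downcase, drop numbers and punctuation, remove stop words,
    and filter rarely occurring words."""
    tokenized = [[t for t in re.findall(r"[a-z]+", d.lower())
                  if t not in stop_words]
                 for d in docs]
    counts = Counter(t for doc in tokenized for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in tokenized]
```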
Results
Metrics: Mean Average Precision (MAP), Normalized Discounted Cumulative
Gain (NDCG), and NDCG@15
Results
For 40% of queries there exists a latent topic that can enhance results
For 40% of these queries the approach finds the relevant topics
Variations on the technique give worse results