I wrote a paper this past fall that I hope you will read and enjoy. I welcome feedback.
Tens of thousands of questions are asked and answered every day on social question and answer (Q&A) Web sites such as Yahoo Answers. While these sites generate an enormous volume of searchable data, the problem of determining which questions and answers are archival quality has grown. One major component of this problem is the prevalence of conversational questions, identified both by Q&A sites and academic literature as questions that are intended simply to start discussion. For example, a conversational question such as “do you believe in evolution?” might successfully engage users in discussion, but probably will not yield a useful web page for users searching for information about evolution. Using data from three popular Q&A sites, we confirm that humans can reliably distinguish between these conversational questions and other informational questions, and present evidence that conversational questions typically have much lower potential archival value than informational questions. Further, we explore the use of machine learning techniques to automatically classify questions as conversational or informational, learning in the process about categorical, linguistic, and social differences between different question types. Our algorithms approach human performance, attaining 89.7% classification accuracy in our experiments.
I wrote this paper with Daniel Moy, an undergraduate at the University of Minnesota, and Joe Konstan, my advisor. Also, many of you (my friends and colleagues) helped this research by coding data – thanks!
This paper will be published in the proceedings of CHI 2009 and is currently nominated for a best paper award, which is a real honor. Also of note: our Q&A paper at last year’s CHI conference was wedged into a session with a medley of completely unrelated work (e.g., public interactive displays). This year, Q&A has its own session with some outstanding researchers.