I Have Subjected Myself to the Mercy of Pagerank

Last week, I wrote an introduction to the talk I’m going to give at CHI. It’s about the creation of archival quality knowledge in the world through social discourse in Q&A sites.

The intro that I wrote depends on Google search results. Before an interaction, there were no good search results; after the interaction the results had greatly improved. In particular, I’m using an example from Ask Metafilter where I want to learn how to fix a hydraulic chair, using the search string “fix hydraulic chair”. When I wrote the intro, Google search results looked like this (the relevant link appears third in the list):


Original Google search results, from last week. The third hit (indented) is the link of interest for the purposes of my talk.

Today, Google doesn’t have any results for the queries “fix hydraulic chair”, “fix hydraulic chair site:metafilter.com”, or even “fix hydraulic chair pneumatic cylinder site:metafilter.com”. Where have you gone, my precious example? (I note that Yahoo lists it as the top hit, but nobody I know uses Yahoo search.)

I think I’m going to persist with my example – it’s too much work to revisit the entire intro simply because some quirk of PageRank has determined that this post is in fact not worth indexing. Perhaps my link in this post will restore it to its former glory?

Facts or Friends? at CHI 2009

I wrote a paper this past fall that I hope you will read and enjoy.  I welcome feedback.

Facts or Friends? Distinguishing Informational and Conversational Questions in Social Q&A Sites (pdf)

The abstract:

Tens of thousands of questions are asked and answered every day on social question and answer (Q&A) Web sites such as Yahoo Answers. While these sites generate an enormous volume of searchable data, the problem of determining which questions and answers are archival quality has grown. One major component of this problem is the prevalence of conversational questions, identified both by Q&A sites and academic literature as questions that are intended simply to start discussion. For example, a conversational question such as “do you believe in evolution?” might successfully engage users in discussion, but probably will not yield a useful web page for users searching for information about evolution. Using data from three popular Q&A sites, we confirm that humans can reliably distinguish between these conversational questions and other informational questions, and present evidence that conversational questions typically have much lower potential archival value than informational questions. Further, we explore the use of machine learning techniques to automatically classify questions as conversational or informational, learning in the process about categorical, linguistic, and social differences between different question types. Our algorithms approach human performance, attaining 89.7% classification accuracy in our experiments.

I wrote this paper with Daniel Moy, an undergraduate at the University of Minnesota, and Joe Konstan, my advisor.  Also, many of you (my friends and colleagues) helped this research by coding data – thanks!

This paper will be published in the proceedings of CHI 2009, and is currently nominated for a best paper award, which is a real honor. Also of note is the fact that our Q&A paper at last year’s CHI conference was wedged in a session with a medley of completely unrelated work (e.g., public interactive displays). This year, Q&A has its own session with some outstanding researchers.

Voting Systems Research

I just saw a piece over at Stack Overflow’s blog called “Vote Fraud and You”. Stack Overflow is a Q&A site geared towards programming. I think it’s a high-quality site, with many wiki-like features. It’s designed more for the creation of archival-quality information than for ephemeral Q&A (see Yahoo Answers).

This post reminds me that voting systems are still poorly understood, and a rich research area. For example, the article discusses the development of technology that automatically detects “revenge voting patterns”. Very cool. However, I’m sad that most of this work appears to be happening (for the time being) in industry, where it’s harder for the rest of us to learn from these innovations.

Researchers: let’s see more work on voting systems!  If I build a points system into my community site, how should it work?

Don’t Use Endnote

Do you use Endnote for your references? I appeal to you to switch to a different system. Endnote (owned by Thomson Reuters) is suing the academic researchers at George Mason University who wrote Zotero, a free and open source bibliography manager plugin for Firefox.

The act of a large corporation suing academics who are creating high quality free software is just plain evil.  Don’t use Endnote.  Tell your friends to stop using Endnote.

Proactive Displays

Interactive public displays are cool. I spent last summer (2007) building software to support a network of touchscreen computers that could sense who was nearby (using Bluetooth phone proximity) and react by displaying a collage of social media. Someone made a video of an early prototype.

The three of us wrote a paper about our experiences, called “The Context, Content & Community Collage: Sharing Personal Digital Media in the Physical Workplace”.  Joe McCarthy (my summer mentor) will be presenting this work at CSCW in 2008.  Read the paper if you’re interested!

My Experience with KDE 4.1: Not Worth It Yet

I recently bought a new Dell laptop (an M1530). The first task was reinstalling. I went with Kubuntu + KDE 4.1, having just read some buzz (I think from Slashdot). Long story short: I’m back to Gnome. I write this post to counterbalance what I interpret as undeserved hype.

First, the good. KDE 4.1 is pretty cool/professional looking. It has a very intuitive menu interface, and nice integration with desktop widgets. KDE in general has a nice design philosophy. Some KDE features (e.g., the wireless connection applet) are better than their Gnome counterparts.

However, I had a bunch of problems:

  • A bunch of stuff just didn’t work out of the box, including important things such as suspend/hibernate and sound. Nvidia drivers screwed up the fonts (a common problem), which I had to fix manually. When I log into Gnome, all these problems go away by default!
  • The window manager constantly consumed 20-30% of my fast/new dual-core CPU until I tweaked some of the defaults. It’s not capable of using 3D graphics, so everything’s sluggish. One could enable Beryl effects, but the integration between the KDE window manager and Beryl is terrible and buggy. Gnome has good Beryl support and integration.
  • Tons of bugs.  This will probably improve.  However, I found a whole lot of problems in two days of futzing (note: nothing that crashed the system).  Some features (such as the “download more widgets from the web”) appeared to be non-implemented stubs!
  • Important features were missing. E.g., there was no applet to monitor system load. C’mon, Windows 2000 had this. There was ksysmon, but that is a full window, and consumed 10-20% CPU.

Some of this is especially worrisome. One of the tenets of Linux windowing systems is that they are merely interchangeable front-ends. However, this was very much not the case. On my box, suspend/hibernate, sound, and shortcut keys work out of the box using Gnome, but not using KDE 4.1. What’s the deal?