I wrote an archive rotation script


I just released an archive rotation script on GitHub. It’s here: https://github.com/maxharp3r/archive-rotator. I use this script to rotate a number of backup archives that are the result of nightly db dumps and directory tar | gzips.

There are a few things that motivated me to write this script:

  • It’s stateless. The script uses a naming convention to track its state, so it doesn’t generate extra junk in my directories.
  • It doesn’t use configuration files. Again, no extra junk.
  • It doesn’t rely on dates or times. It can be used as a daily job just as easily as it can be used weekly, monthly, or every minute.
  • It supports three useful algorithms for managing the disk space/history tradeoff. These algorithms are: FIFO (simple rotation), Tower of Hanoi, and a tiered algorithm that is a more configurable version of grandfather-father-son.
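To make the disk space/history tradeoff concrete, here is a minimal sketch of how FIFO and Tower of Hanoi rotation choose which slot to overwrite on each run. This is not the code from archive-rotator; the function names and the idea of numbering runs from 1 are my own assumptions for illustration.

```python
def fifo_slot(run_number: int, num_slots: int) -> int:
    """FIFO: cycle through the slots in order, overwriting the oldest."""
    if run_number < 1 or num_slots < 1:
        raise ValueError("run_number and num_slots must be >= 1")
    return (run_number - 1) % num_slots


def hanoi_slot(run_number: int, num_slots: int) -> int:
    """Tower of Hanoi: slot k is reused every 2**(k+1) runs.

    The slot is the number of trailing zero bits in the (1-based) run
    number, capped at the last slot. Both names are hypothetical.
    """
    if run_number < 1 or num_slots < 1:
        raise ValueError("run_number and num_slots must be >= 1")
    # (n & -n) isolates the lowest set bit; bit_length() - 1 is its index.
    trailing_zeros = (run_number & -run_number).bit_length() - 1
    return min(trailing_zeros, num_slots - 1)


if __name__ == "__main__":
    runs = range(1, 9)
    print("fifo :", [fifo_slot(n, 3) for n in runs])
    print("hanoi:", [hanoi_slot(n, 3) for n in runs])
```

With three slots, FIFO only ever keeps the last three runs. The Hanoi schedule overwrites slot 0 every other run and the higher slots progressively less often, so the same three slots hold archives that reach further back in time.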

My design is opinionated. If you’re interested in alternatives, here’s one that has different opinions, but is also cool: https://github.com/adamfeuer/rotate-backups

Photo credit: http://www.flickr.com/photos/benoit_d/4136716652/

New Q&A Dataset Available

Some researchers (from Singapore!) recently requested access to the data from our Facts or Friends paper.  I went ahead and bundled up a dataset that’s suitable for distribution.  If anyone out there is interested in checking out this small-ish dataset (~1100 codings across ~500 questions from 3 Q&A sites) just send me an email.  I’d be happy to share.

The Largely Ignored Topic of User Intent


I have a new paper up on First Monday. Maybe you should read it, if you’re interested in Q&A sites and the types of things people ask about.

Question types in social Q&A sites
by F. Maxwell Harper, Joseph Weinberg, John Logie, and Joseph A. Konstan

This paper is a collaboration between me and my advisor (HCI researchers) and two cool dudes from the humanities, in the field of writing studies (formerly: rhetoric). As it turns out, computer scientists and rhetoricians tend to think very differently about ways of contributing research to the world! This paper represents an early attempt to fuse some old skool Aristotelian rhetorical theory with some new school data mining.

Really, this paper reflects my personal frustration with much of the literature investigating user behavior on Q&A sites (like Yahoo Answers and Ask Metafilter). Most of the lit conveniently ignores the fact that there are a bunch of different types of questions that people are asking in online forums, and this fundamentally changes how we should view users’ intentions. It is indeed easier to assume that all questions are factual, and that “best answers” go to the best-written and most correct response. But these assumptions will lead to system designs that ignore lots of users.

This paper takes a shot at a more formal taxonomy of question types. I hope you find it useful or interesting.

Photo by matthileo

Pizza Lucé + Laptop Stand

I use a laptop as my primary development machine, with a nice external monitor to keep things ergonomically feasible. It was bugging me that the two monitors weren’t horizontally aligned, so I started shopping for a laptop stand. However, the coolest thing that I found was this post:


This inspired me to build the Pizza Luce laptop stand. I just winged it, but it only took about an hour.

pizza luce computer stand

A couple of “improvements” on the original:

  • Triple up on the structural cardboard
  • Add a V-shaped cross-brace for additional lateral stability
  • Use Pizza Luce pizza boxes to represent the ‘hood


Make Better Decisions: Thinkmeter

My friend Dan F and I have been working to start a new business these days (Blue Shift Lab). One of our shared interests – we met through working in the GroupLens research lab – is in the area of decision-making.  How can software help us to make better decisions, or to be happier with the decisions that we do make?  How can we facilitate decision-making where everyone (even the quiet people) has a voice?

We’ve started a blog to discuss the topics of decision-making and brainstorming. It’s at blog.thinkmeter.com. If you read this blog, please subscribe! We’re going to do our best to start a lively discussion that crosses academic thought, business insights, and fun.

One more time: please subscribe to the thinkmeter blog!

Also, you can try the preview release of our new decision-making tool, Thinkmeter.  Let me know what you think!


Screenshot of thinkmeter.com



Facts or Friends in Live QnA

I was just poking around Live QnA (Microsoft’s Q&A site) and saw that they are asking users to classify questions as conversational or informational now. This is the split that we investigate in our Facts or Friends paper. Cool! I would love to chat with the product team to get their impressions of how this is working out.

The screen for posting a question in Live QnA


I Have Subjected Myself to the Mercy of Pagerank

Last week, I wrote an introduction to the talk I’m going to give at CHI. It’s about the creation of archival quality knowledge in the world through social discourse in Q&A sites.

The intro that I wrote depends on Google search results. Before an interaction, there were no good search results; after the interaction the results had greatly improved. In particular, I’m using an example from Ask Metafilter where I wish to learn about the search string “fix hydraulic chair”. When I wrote the intro, Google search results looked like this (the relevant link appears third in the list):


Original Google search results, from last week. The third hit (indented) is the link of interest for the purposes of my talk.

Today, Google doesn’t have any results for the queries “fix hydraulic chair”, “fix hydraulic chair site:metafilter.com”, or even “fix hydraulic chair pneumatic cylinder site:metafilter.com”.  Where have you gone, my precious example? (I note that Yahoo lists it as the top hit, but nobody I know uses Yahoo search.)

I think I’m going to persist with my example – it’s too much work to revisit the entire intro simply because some quirk of PageRank has determined that this post is in fact not worth indexing. Perhaps my link in this post will restore it to its former glory?

Facts or Friends? at CHI 2009

I wrote a paper this past fall that I hope you will read and enjoy.  I welcome feedback.

Facts or Friends? Distinguishing Informational and Conversational Questions in Social Q&A Sites (pdf)

The abstract:

Tens of thousands of questions are asked and answered every day on social question and answer (Q&A) Web sites such as Yahoo Answers. While these sites generate an enormous volume of searchable data, the problem of determining which questions and answers are archival quality has grown. One major component of this problem is the prevalence of conversational questions, identified both by Q&A sites and academic literature as questions that are intended simply to start discussion. For example, a conversational question such as “do you believe in evolution?” might successfully engage users in discussion, but probably will not yield a useful web page for users searching for information about evolution. Using data from three popular Q&A sites, we confirm that humans can reliably distinguish between these conversational questions and other informational questions, and present evidence that conversational questions typically have much lower potential archival value than informational questions. Further, we explore the use of machine learning techniques to automatically classify questions as conversational or informational, learning in the process about categorical, linguistic, and social differences between different question types. Our algorithms approach human performance, attaining 89.7% classification accuracy in our experiments.

I wrote this paper with Daniel Moy, an undergraduate at the University of Minnesota, and Joe Konstan, my advisor.  Also, many of you (my friends and colleagues) helped this research by coding data – thanks!

This paper will be published in the proceedings of CHI 2009, and is currently nominated for a best paper award, which is a real honor.  Also of note is the fact that our Q&A paper at last year’s CHI conference was wedged in a session with a medley of completely unrelated work (e.g., public interactive displays).  This year, Q&A has its own session with some outstanding researchers.

Voting Systems Research

I just saw a piece over at Stack Overflow’s blog, called Vote Fraud and You.  Stack Overflow is a Q&A site geared towards programming.  I think it’s a high-quality site, with many wiki-like features.  It’s designed more for the creation of archival quality information than for ephemeral Q&A (see Yahoo Answers).

This post reminds me that voting systems are still poorly understood, and a rich research area.  For example, the article discusses the development of technology that automatically detects “revenge voting patterns”.  Very cool.  However, I’m sad that most of this work appears to be happening (for the time being) in industry, where it’s harder for the rest of us to learn from these innovations.

Researchers: let’s see more work on voting systems!  If I build a points system into my community site, how should it work?