Activity-Balanced Clustering at RecSys 2007

Shilad, Dan, and I wrote a short paper called “Supporting Social Recommendations with Activity-Balanced Clustering“.  I presented the paper at Recommender Systems 2007 here in Minneapolis.  The idea is that standard clustering algorithms such as k-means simply don’t work well if you’re trying to create groups of users that manifest similar levels of activity.  In MovieLens, k-means placed 84% of the active users in a single cluster.  Thus, we mashed up some techniques from the clustering literature to create an algorithm that improved on k-means in terms of cluster quality and computational complexity, while giving us the balanced properties we sought.  One of the key insights is that stable matching algorithms are a clever way to evenly divide sets of things based on similarity metrics.

I received lots of feedback from conference attendees.  One of the most interesting questions (that I have not investigated) is whether there are some unique properties of the users that I clustered that led to the severe lack of balance with k-means.  I don’t think so – the 20,000 or so MovieLens users that we used as our data have two characteristics that seem particularly regular:

  • Power law activity
  • Normal distribution of ratings (1-5 scale)

Another person suggested that I simply didn’t tweak k-means well enough (and the other algorithms I tried).  Well, I turned plenty of knobs, many times.  In the process of the knob turning, I realized that clustering metrics are a fairly bad way of gauging the “feel” of a cluster of users.  I’d like to dig more into clustering metrics that reflect the different ways that groups of people might come together.



One thought on “Activity-Balanced Clustering at RecSys 2007

  1. Max, can you send your paper to me on the clustering?
    Many thanks–! i have the feeling as well that the k-clustering is not capturing the full dynamic non-linearity of the social web

Comments are closed.