The Ascent of Ranking Algorithms
Algorithmic ranking is on the rise. Everywhere I turn, something or the other is being ranked analytically.
Ranking web pages by relevance, pioneered by Google’s PageRank, may be the best-known example of algorithmic ranking.
Also ubiquitous are ranking algorithms inside recommender systems. Given an individual’s behavior (browsing history, rating history, purchase history and so on), the idea is to rank the huge universe of things out there (e.g., books, movies, music) by their likely appeal to that individual and show the top-ranked items. If you are an Amazon or Netflix customer, you have doubtless been on the receiving end of these ranked recommendations for books and movies you may find interesting. Plenty of complex and occasionally elegant math goes into quantifying and predicting “likely appeal” (see, for example, the Netflix Prize-winning approach).
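Once a model has produced a predicted-appeal score for every item, the final ranking step is simple. The sketch below assumes those scores already exist (the dictionary, item names and the top_n_recommendations function are made up for illustration; producing good scores is the hard part the Netflix Prize was about). It filters out items the user has already seen and keeps the n highest-scoring ones.

```python
import heapq

def top_n_recommendations(predicted_appeal, already_seen, n=3):
    """Rank unseen items by predicted appeal and return the n best.

    predicted_appeal: dict item -> score from some (hypothetical) model
    already_seen: set of items the user has already bought, rated or watched
    """
    candidates = {item: s for item, s in predicted_appeal.items() if item not in already_seen}
    # heapq.nlargest avoids sorting the whole catalog when n is small
    return heapq.nlargest(n, candidates, key=candidates.get)

# Invented scores for a single user
scores = {"Book A": 0.91, "Book B": 0.35, "Movie C": 0.78, "Album D": 0.66, "Movie E": 0.88}
print(top_n_recommendations(scores, already_seen={"Book A"}, n=3))
# ['Movie E', 'Movie C', 'Album D']
```

Using nlargest rather than a full sort is a small design choice that matters when the catalog has millions of items but only a handful are shown.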
Despite its age, recommendation ranking is far from mature, and different flavors of recommender systems are popping up every day. Just last week, BusinessWeek had a story on The Filter, a new recommendation ranking system that is allegedly leaving the other approaches in the dust (aside: one of the founders of The Filter is Peter Gabriel, legendary musician and member of Genesis, one of my favorite rock bands).
So far, I have listed “old” examples of ranking: web pages, books, movies, and music. But recently, I came across something new: SpotRank.
Skyhook Wireless, the company that provides location information to Apple devices, announced SpotRank a few months ago. (When you fire up Google Maps on your iPhone, your exact location is pinpointed using a combination of GPS information and Skyhook’s Wi-Fi database.)
By tracking the number of “location hits” their servers receive from Apple devices, Skyhook can determine which spots are popular and when they are popular. They capture this in the form of a popularity score and, as the name suggests, SpotRank ranks locations by their popularity score.
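Skyhook hasn’t published how the popularity score is computed, so here is a toy sketch that assumes the score is simply the raw hit count per location, optionally restricted to one hour of the day to capture when a spot is popular. The hit log, the spot names and the spot_rank function are all hypothetical.

```python
from collections import Counter

# Hypothetical log of location hits: (spot, hour_of_day) pairs
hits = [
    ("Union Square", 18), ("Union Square", 19), ("Union Square", 18),
    ("Bryant Park", 12), ("Bryant Park", 13),
    ("Chelsea Market", 12), ("Chelsea Market", 12),
    ("Chelsea Market", 19), ("Chelsea Market", 18),
]

def spot_rank(hits, hour=None):
    """Rank spots by hit count (assumed popularity score), optionally for one hour only."""
    counts = Counter(spot for spot, h in hits if hour is None or h == hour)
    # most_common() returns (spot, count) pairs in decreasing order of count
    return counts.most_common()

print(spot_rank(hits))           # all day
print(spot_rank(hits, hour=12))  # lunchtime only
```

A real system would presumably normalize for things like device density in each area, but the ranking step itself is just a sort on the score.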
Next time you are in a strange part of town, have time to kill and are looking for popular spots, maybe SpotRank can help you (at least if you like hanging out with Apple fans).
Now that places are being ranked, what’s next? Ranking people?
It is already being done. Heard of UserRank?
UserRank was created by Next Jump, a NYC-based company that runs employee discount and reward programs for 90,000 corporations, organizations and affinity groups. Next Jump connects 28,000 retailers and manufacturers to the more than 100 million consumers who work in the companies in its network, typically getting the merchants to offer deep discounts.
NextJump calculates a UserRank for every one of the 100m consumers in its database.
The more a user shops on our network, the higher their UserRank™ will be. Users with high UserRank™ are more likely to spend and are typically your best customers.
NextJump creates value by allowing retailers/merchants to use UserRank in offer targeting. For instance, an offer can be targeted only to consumers with a minimum UserRank.
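NextJump doesn’t disclose the UserRank formula beyond “the more a user shops on our network, the higher their UserRank.” A minimal sketch, assuming UserRank is a 1 to 100 percentile of total spend (the formula, the user_ranks and target_offer functions, the user names and the spend figures are all hypothetical), shows how a minimum-UserRank offer filter could work:

```python
def user_ranks(spend_by_user):
    """Assign each user a 1-100 score by spending percentile (hypothetical formula)."""
    ordered = sorted(spend_by_user, key=spend_by_user.get)  # lightest spender first
    n = len(ordered)
    # The heaviest spender gets 100; the lightest gets roughly 100/n
    return {user: round(100 * (i + 1) / n) for i, user in enumerate(ordered)}

def target_offer(ranks, min_rank):
    """Return the users eligible for an offer restricted to a minimum UserRank."""
    return sorted(u for u, r in ranks.items() if r >= min_rank)

spend = {"ana": 1200.0, "bo": 80.0, "cy": 450.0, "dee": 3100.0}
ranks = user_ranks(spend)
print(ranks)                             # {'bo': 25, 'cy': 50, 'ana': 75, 'dee': 100}
print(target_offer(ranks, min_rank=70))  # ['ana', 'dee']
```

In this sketch, users with identical spend would still receive different ranks depending on sort order; a real scheme would need a tie-breaking rule.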
I wonder what my UserRank is?
My final example is from the field of drug discovery. In a recent article, MIT News describes fascinating work done by researchers at MIT and Harvard on applying ranking algorithms to this area.
The drug development process typically starts with identifying a molecule that’s associated with a disease. Depending on the disease, this “target” molecule either needs to be suppressed or promoted. A drug that’s successful in treating the disease is a chemical (which, of course, is just another molecule) that suppresses or promotes the target molecule without causing bad side-effects.
How is such a drug found? Over the years, researchers have amassed a large catalog of chemicals that can help suppress or promote target molecules. From this library, drug developers pick the most promising ones to use as drug candidates for further testing and clinical trials. Unfortunately, the majority of drug candidates fail in clinical trials, proving to be either toxic or ineffective, sometimes after hundreds of millions of dollars have been spent on them. (For every new drug that gets approved by the U.S. Food and Drug Administration, pharmaceutical companies have spent about $1 billion on research and development.) So selecting a good group of candidates at the outset is critical.
This sounds like a ranking problem: given a target molecule, rank the chemicals in the database according to their likely effectiveness in being a viable drug for the chosen target.
The drug companies weren’t slow to recognize this, of course. They have been using machine-learning algorithms since the 1990s with some success. However, the MIT-Harvard researchers showed that a rudimentary ranking algorithm can predict drugs’ success more reliably than the algorithms currently in use.
What was the key idea?
At a general level, the new algorithm and its predecessors work in the same way. First, they’re fed data about successful and unsuccessful drug candidates. Then they try out a large variety of mathematical functions, each of which produces a numerical score for each drug candidate. Finally, they select the function whose scores most accurately predict the candidates’ actual success and failure.
The difference lies in how the algorithms measure accuracy of prediction. When older algorithms evaluate functions, they look at each score separately and ask whether it reflects the drug candidate’s success or failure. The MIT researchers’ algorithm, however, looks at scores in pairs, and asks whether the function got their order right.
(italics mine)
Rather than scoring each drug candidate in isolation and then ranking them all, the key idea was to build pairwise ranking into the way the scoring function is chosen in the first place.
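To make the pointwise-versus-pairwise distinction concrete, here is a toy comparison. It is not the MIT-Harvard algorithm (the article doesn’t give its details); it only contrasts the two ways of grading a candidate scoring function. Five made-up drug candidates are labeled success (1) or failure (0), and two hypothetical scoring functions, f and g, have already produced a score for each. The pairwise measure, the fraction of (success, failure) pairs placed in the right order, is the same quantity as the area under the ROC curve.

```python
from itertools import product

labels = [1, 1, 0, 0, 0]  # 1 = candidate succeeded, 0 = failed (toy data)

# Scores from two hypothetical candidate functions on the same five drugs
scores_f = [0.9, 0.4, 0.6, 0.3, 0.2]
scores_g = [0.55, 0.2, 0.4, 0.45, 0.3]

def pointwise_accuracy(scores, labels, threshold=0.5):
    """Grade each score on its own: is it on the correct side of the threshold?"""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)

def pairwise_accuracy(scores, labels):
    """Grade scores in pairs: for each (success, failure) pair, is the success
    scored higher? Ties earn half credit."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    credit = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in product(pos, neg))
    return credit / (len(pos) * len(neg))

candidates = {"f": scores_f, "g": scores_g}
best_pointwise = max(candidates, key=lambda k: pointwise_accuracy(candidates[k], labels))
best_pairwise = max(candidates, key=lambda k: pairwise_accuracy(candidates[k], labels))
print(best_pointwise, best_pairwise)  # g f
```

Graded one score at a time, g looks better (4 of 5 on the right side of the threshold versus 3 of 5 for f), yet g scores one of the two successes below every failure. Graded in pairs, f orders 5 of 6 pairs correctly against 3 of 6 for g, and ordering is what matters when the goal is to pick the top few candidates for trials.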
As the data deluge gets larger and larger, finding information most relevant to one’s needs (be they mundane needs like in shopping or profound needs like in drug discovery) gets harder and harder. Perhaps this is why we are seeing ranking algorithms everywhere.
Have you seen any interesting examples of algorithmic ranking at work? Please share in the comments.
(HT to Karan Singh and Florent De Gantes for making me aware of the MIT News article and NextJump, respectively)



April 20th, 2010 at 1:04 am
now if only you could rank ranking algorithms …
seriously, isn’t there a danger that ranking algorithms skew their data? take the “MOST POPULAR” list at nytimes.com. the very fact that this list displays the top 10 stories attracts people to read those stories (i know i do), thereby amplifying the popularity of the stories that make the list relative to those that barely miss the cut.
is this where the pandora approach to establishing conceptual links rather than links discovered by analyzing behaviour comes in?
April 20th, 2010 at 1:56 am
One can quite comfortably posit the equivalence between “scoring” and “ranking”, with the former clearly being more granular and the latter trivially derived from the former. If you accept this, then one of the more ubiquitous applications of ranking comes from the credit scoring applications pioneered by FICO (www.fico.com) in the 1980s. In a very McDonald’s-ish statement, the company states that to date “more than 100 billion FICO scores have been generated and delivered” to rank – yes, people. I have personally been involved in developing and deploying a very powerful and currently industry-dominant credit card fraud risk scoring (ranking) application.
As you say, there are probably hundreds (if not thousands) of such examples of algorithms out there for ranking everything from wine to women (and men). Of course, some are cleverer than others.
April 20th, 2010 at 11:24 am
@narayan: I agree with your take on the skewing potential of the “most popular” lists (as well as the benefit of non-behavioral approaches like Pandora). In fact, the effect is so pronounced that it is being exploited.
My favorite example of this is the Apple App Store: if an app makes it to the top 10 list (and the list is generated based on recent download volume), it gets a huge boost in the ensuing days. As a result, publishers of new apps take out ads on other popular apps as soon as they launch; if the ads drive enough volume for the new app, it may make the top 10 list, and from that point on the publisher can sit back and watch sales go through the roof without incremental spending on ads.
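The rich-get-richer effect shows up even in a deliberately tiny simulation. Everything in it is an assumption for illustration: three apps with nearly identical organic demand, a top-2 chart computed from the previous day’s downloads, and a made-up 3x boost for charted apps. A launch-day ad push is modeled simply as extra launch-day downloads; simulate_chart is a hypothetical helper, not any real store’s logic.

```python
def simulate_chart(base_demand, launch_day_downloads, rounds, k=2, boost=3.0):
    """Toy feedback loop: each day the top-k apps by the previous day's
    downloads are charted, and charted apps get `boost` times their organic demand."""
    downloads = list(launch_day_downloads)
    chart = set()
    for _ in range(rounds):
        ranked = sorted(range(len(downloads)), key=lambda i: downloads[i], reverse=True)
        chart = set(ranked[:k])
        downloads = [base * (boost if i in chart else 1.0) for i, base in enumerate(base_demand)]
    return downloads, sorted(chart)

base = [100, 98, 97]  # organic daily demand for apps 0, 1 and 2
print(simulate_chart(base, launch_day_downloads=base, rounds=5))
# ([300.0, 294.0, 97.0], [0, 1])
print(simulate_chart(base, launch_day_downloads=[100, 98, 200], rounds=5))
# ([300.0, 98.0, 291.0], [0, 2])
```

Without ads, app 2 misses the chart by one download a day and ends up with about a third of app 1’s downloads. With a one-time launch push, app 2 displaces app 1 and stays on the chart with no further spending.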
@badri: Fully agree that FICO is a great example of ranking. I omitted it since it is very well-known and “old” at this point. And, yes, there are numerous examples of ranking out there (“top B-schools”, “top cities to live in”, …) but I was looking for non-trivially algorithmic and/or new examples.
April 24th, 2010 at 2:44 pm
PageRank is interesting. But probably more interesting is the algorithm that ranks the ads to display in Google’s search results: going from the inventory they must have to finding the best 5 to show you. Must be interesting. Nice job by them in keeping the focus on the other rank.
Met the founder of Aardvark a few weeks back. He built a product focused on subjective search, e.g. “what’s the best restaurant in Boston”, and used the social graph to help rank the results. Google and others are not particularly good at subjective search. Google bought Aardvark.
April 25th, 2010 at 9:16 am
@Al: A fair amount is known about how Google ranks the ads. As you probably know, the ads are chosen based on a real-time auction for the keyword(s) used in that particular search. But they don’t just show the ads in decreasing order of bid amount; they also take into account the historical click-through rate of each ad, the quality of the landing page the ad leads to, etc. As a result, even if your bid is way above everyone else’s, you may not make it to the first slot (or even the first page) if your ad or landing page is deemed to be of low quality.
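A simplified sketch of the idea, assuming ads are ordered by bid times a quality score. This roughly mirrors the publicly described “bid times Quality Score” notion of Ad Rank, but Google’s actual formula has more inputs and is not public; the Ad class, rank_ads function, advertisers, bids and quality values below are invented.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    bid: float      # maximum cost-per-click bid, in dollars
    quality: float  # hypothetical quality score in [0, 1] (CTR history, landing page, ...)

def rank_ads(ads, slots):
    """Order ads by bid * quality (a simplified 'ad rank') and keep the top `slots`."""
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.quality, reverse=True)
    return [ad.advertiser for ad in ranked[:slots]]

ads = [
    Ad("big-spender", bid=5.00, quality=0.05),
    Ad("relevant-shop", bid=1.50, quality=0.40),
    Ad("mid", bid=2.00, quality=0.20),
]
print(rank_ads(ads, slots=2))  # ['relevant-shop', 'mid']
```

The highest bidder misses both slots because its low quality score drags its product down to 0.25, below 0.60 and 0.40.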
April 25th, 2010 at 10:27 am
Behind ranking is actually just scoring. Potent scoring techniques and technology have been in use in corporations for at least 30 years. The appeal of the progress described in the original blog post stems largely from its real-time, ‘all-coverage’ nature. Some sites got to a level of credibility by compromising on predictive accuracy. For example, collaborative filtering and link analysis can effectively recommend a book you may like or somebody you may want to link to, but quick recommendations often cannot afford to be built on ‘deep drivers’. Often, therefore, the hit comes fast and goes just as fast.
April 26th, 2010 at 8:00 am
Good points, Jason. Thanks for your comment.
April 29th, 2010 at 1:41 pm
Great post. Are quantitative models used as inputs to portfolio management ranking algorithms too?
April 29th, 2010 at 3:12 pm
@Krishnan: Yes, I’d agree. Many stock selection models rank the equity universe based on metrics like earnings revisions, price-to-book value, etc., and feed the output into portfolio construction/optimization algorithms.
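A minimal multi-factor version of that idea, assuming just two factors (earnings revision, where higher is better, and price-to-book, where lower is better) combined by averaging per-factor ranks. The composite_rank function, tickers and values are invented for illustration and are not any particular firm’s model.

```python
def composite_rank(stocks, factors):
    """Rank stocks on each factor and average the per-factor ranks.

    stocks:  dict ticker -> dict of factor values
    factors: dict factor name -> True if higher is better, False if lower is better
    Returns tickers ordered best-first (lowest average rank first).
    """
    avg_rank = {t: 0.0 for t in stocks}
    for name, higher_is_better in factors.items():
        ordered = sorted(stocks, key=lambda t: stocks[t][name], reverse=higher_is_better)
        for position, ticker in enumerate(ordered, start=1):  # rank 1 = best on this factor
            avg_rank[ticker] += position / len(factors)
    return sorted(stocks, key=lambda t: avg_rank[t])

stocks = {
    "AAA": {"earnings_revision": 0.08, "price_to_book": 1.2},
    "BBB": {"earnings_revision": 0.02, "price_to_book": 0.9},
    "CCC": {"earnings_revision": -0.03, "price_to_book": 3.5},
    "DDD": {"earnings_revision": 0.05, "price_to_book": 2.0},
}
factors = {"earnings_revision": True, "price_to_book": False}
print(composite_rank(stocks, factors))  # ['AAA', 'BBB', 'DDD', 'CCC']
```

Averaging ranks rather than raw values sidesteps the fact that the two factors live on very different scales; the ranked list would then feed a portfolio construction step.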