Saturday, August 05, 2006

del.icio.us API/Hack

A while after my last post on del.icio.us, TechCrunch wrote about it, raising a couple of the issues that I had talked about. They published some disappointing traffic stats and questioned del.icio.us's ability to achieve mainstream adoption. Later in the day, they published what amounted to a retraction. Oh well…

I am going to talk a bit more in this post about what I meant by referring to a del.icio.us API in my previous post. I wrote this fairly straightforward Python script that takes your del.icio.us username, password and a URL and spits out the tags that people have used to describe it. Definitely install the relevant Python packages (ClientForm and ClientCookie) before expecting this to work :)

import ClientForm
import ClientCookie
import re

# Build an opener that keeps cookies between requests, so the login sticks.
cookieJar = ClientCookie.CookieJar()
opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cookieJar))
opener.addheaders = [("User-agent", "Mozilla/5.0 (compatible)")]
ClientCookie.install_opener(opener)

# Fetch the login page and parse the forms on it.
fp = ClientCookie.urlopen("https://secure.del.icio.us/login")
forms = ClientForm.ParseResponse(fp)
fp.close()

# Fill in the login form. Replace "username", "password" and "URL"
# (including the username inside mainurl) with your own values.
form = forms[0]
form["user_name"] = "username"
form["password"] = "password"
mainurl = "http://del.icio.us/username?url="
url = "URL"

# Submit the login form; the session cookie ends up in cookieJar.
fp = ClientCookie.urlopen(form.click())
fp.close()

# Fetch the page that describes the URL.
fp = ClientCookie.urlopen(mainurl + url)
items = fp.readlines()
fp.close()

# The tags sit in two JavaScript variables, tagRec and tagPop.
tag_patterns = [re.compile(r"var\stagRec\s=\s"), re.compile(r"var\stagPop\s=\s")]

for item in items:
    item_s = item.strip()
    for pattern in tag_patterns:
        # Print whatever follows the assignment on a matching line.
        if len(pattern.findall(item_s)) == 1:
            print pattern.split(item_s)[-1]

The power of Python ensures that this does enough to log in, maintain cookie state, parse out the relevant HTML and spit out the tags. So this script does a little more than its length might imply :-) This is one example of an API call that a user might want: using del.icio.us tags not just to search but to classify as well. Note that traditional classification methods rely almost exclusively on algorithms (as opposed to user-generated content).
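If you want to play with just the parsing step without logging in, here is a rough sketch of it as a standalone function. It is a variation on the loop in the script, not the same code: it uses one combined regular expression, and it is written for Python 3 so that it runs without the old packages. The function name extract_tag_vars and the sample page fragment are made up for illustration; the real del.icio.us markup may well look different.

```python
import re

# Same assignments the script looks for: "var tagRec = ..." and "var tagPop = ...".
TAG_VAR = re.compile(r"var\s(tagRec|tagPop)\s=\s(.*)")


def extract_tag_vars(lines):
    """Return {variable name: raw text after the '= '} for each matching line."""
    found = {}
    for line in lines:
        match = TAG_VAR.search(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


# Hypothetical page fragment (an assumption); the real markup may differ.
sample_page = [
    "<script type=\"text/javascript\">",
    "  var tagRec = ['python', 'api']",
    "  var tagPop = ['programming', 'web']",
    "</script>",
]

if __name__ == "__main__":
    for name, value in sorted(extract_tag_vars(sample_page).items()):
        print(name, "=", value)
```

The function hands back the raw text after each assignment, just like the script prints it. Turning that text into a proper list of tags would depend on how the page actually formats it, so I have left that part out.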

Why would Yahoo! want to go this route in the first place? The answer lies in the fact that while tagged search may be **one** way to improve search results, it most certainly is not the only way. When Yahoo! does end up integrating del.icio.us into its search engine, tags will probably be just one of many factors (each with its own weight) being considered. So by staying the course, YHOO ends up losing del.icio.us in the noise of the search wars. But if it makes the tag data available to its competitors, it actually ends up making money off this thing.

The TechCrunch retraction mentioned that del.icio.us now has over 100 servers in action. The kind of infrastructure that can handle gazillions of queries per day from GOOG, MSFT, etc., perhaps?
