UnShorten WordPress Plugin Uses TheRealURL

I recently found out that Jon Rogers, a developer from the UK, has released the UnShorten WordPress plugin, which uses TheRealURL to unshorten links displayed by the Twitter Tools plugin.

That’s pretty cool: TheRealURL was designed as a web service with exactly this kind of use in mind.

Between this and other users, TheRealURL now serves over 40,000 requests a day. It’s nice to see Google App Engine handle that while barely touching any of its daily quotas (except for incoming bandwidth, which I need to check on):

[Screenshot: TheRealURL GAE quota usage]

TheRealURL Adds Page Titles

I needed this for a project I’m working on, so I added a new feature to TheRealURL: JSON/P requests now return the page title (scraped from the HTML <title> tag) along with the original URL.
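Scraping the title is a small parsing job. Here’s a minimal sketch of how it could be done with Python’s standard-library html.parser. It’s an illustration rather than TheRealURL’s actual code, and the TitleParser and extract_title names are made up for the example:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element only."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)  # data may arrive in several chunks


def extract_title(html):
    """Return the whitespace-normalized page title, or None if there isn't one."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None


print(extract_title("<html><head><title> Apple </title></head><body></body></html>"))  # prints: Apple
```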

For example, http://therealurl.appspot.com/?format=json&url=bit.ly/a returns:

{ "url" : "http://www.apple.com/", "title" : "Apple" }

The plain text format is unchanged (nothing but the unshortened URL), so I don’t expect any issues for existing API users. Response times don’t seem to be affected much either. If you do run into problems, please let me know in the comments or at niryariv@gmail.com.
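The compatibility argument boils down to the plain-text branch simply ignoring the new title field. A minimal sketch of that dispatch, using a hypothetical render_result() helper (not the service’s real code):

```python
import json


def render_result(url, title, fmt="text"):
    """Format an unshortened URL for the requested output format.

    Hypothetical helper: "text" keeps the original URL-only output,
    so existing clients see no change; "json" adds the page title.
    """
    if fmt == "json":
        return json.dumps({"url": url, "title": title})
    # Plain-text (default) format: nothing but the unshortened URL.
    return url


print(render_result("http://www.apple.com/", "Apple"))
print(render_result("http://www.apple.com/", "Apple", fmt="json"))
```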

The Long Poll: AJAX Push(like) Chat with Comet

Recently I’ve been working on an AJAX-based chat application (still in development). The obvious way to do it is to send an XMLHttpRequest every few seconds to check for new messages. Unless it’s a particularly animated conversation, most requests won’t return any new content, so I added a simple Conditional-GET-like system based on the chat’s text length. Here’s the client-side implementation:

function refresh_chat() {
	$.ajax({
		url: "/chat",
		data: "format=xhr&chat_id={{chat_id}}&cur_len=" + chat_content.length,
		complete: function(xhr) {
			// 200 = new content; 304 (Not Modified) = nothing changed
			if (xhr.status == 200) render_chat(xhr.responseText);
			setTimeout(refresh_chat, 5000);  // poll again in 5 seconds
		}
	});
}

And the server code that handles it:

cur_len = int(self.request.get("cur_len", 0))
if len(chat.content) == cur_len:
    self.error(304)  # nothing new: return 304 Not Modified
else:
    self.response.out.write(chat.content)  # return the new content

That’s basically the standard approach. It’s pretty simple and works OK (it could be optimized a bit, for example by returning only the actual new content). It’s not exactly an elegant design, though. Using HTTP, which was designed as a pull protocol, for an application that requires push results in a stream of frequent server requests with empty responses, kind of like the “Are we there yet?” conversations with kids on long road trips.
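The optimization mentioned above, returning only the new content, could look roughly like this. It’s a sketch with a hypothetical chat_delta() function that the handler would call. It assumes the chat log is append-only, and the client would then append the body to what it already has instead of replacing it:

```python
def chat_delta(content, cur_len):
    """Decide the response for a client that already holds cur_len characters.

    Returns (status, body): 304 with an empty body when nothing changed,
    otherwise 200 with only the characters the client hasn't seen yet.
    Assumes the chat log only ever grows by appending.
    """
    cur_len = int(cur_len)  # query parameters arrive as strings
    if len(content) <= cur_len:
        return 304, ""
    return 200, content[cur_len:]


status, body = chat_delta("hi\nhello\n", 3)
print(status, repr(body))
```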

Jack Moffitt’s JSConf talk introduced me to the concept of Long Polling (one flavor of what’s known as Comet and, with a lot added, the basis of BOSH) as a way to simulate HTTP push. Rather than having the client send many short, frequent requests and the server respond to each as fast as possible, long polling turns it around: the server holds each request as long as it can, returning a response only when it has new data or a timeout is hit. So instead of sending a request every 3 seconds, for example, you can send one every 30 seconds.
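To make “hold the request until there’s new data or a timeout” concrete, here’s a small in-process sketch that uses Python’s threading.Condition to wake waiting requests rather than polling in a loop. The ChatRoom class and its methods are hypothetical; on a real server, each held request would also tie up a connection (and often a worker thread) for the whole wait:

```python
import threading


class ChatRoom:
    """Toy chat log whose readers can long-poll for changes."""

    def __init__(self):
        self._content = ""
        self._cond = threading.Condition()

    def post(self, text):
        with self._cond:
            self._content += text
            self._cond.notify_all()  # wake every request currently being held

    def wait_for_update(self, cur_len, timeout):
        """Hold the caller until the log's length differs from cur_len or timeout expires.

        Returns the full content on change, or None on timeout (the 304 case).
        """
        with self._cond:
            changed = self._cond.wait_for(
                lambda: len(self._content) != cur_len, timeout=timeout
            )
            return self._content if changed else None


room = ChatRoom()
threading.Timer(0.2, room.post, args=("hello\n",)).start()
print(room.wait_for_update(0, timeout=5))    # returns as soon as the post lands
print(room.wait_for_update(6, timeout=0.1))  # nothing new: times out with None
```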

The client-side code stays almost the same (only the delay between requests changes):

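function refresh_chat() {
	$.ajax({
		url: "/chat",
		data: "format=xhr&chat_id={{chat_id}}&cur_len=" + chat_content.length,
		complete: function(xhr) {
			// 200 = new content; 304 = the server's hold timed out with nothing new
			if (xhr.status == 200) render_chat(xhr.responseText);
			setTimeout(refresh_chat, 1000);  // re-poll after 1 second; the server does the waiting
		}
	});
}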

But on the server side, there’s a bit of new logic to keep checking for new content while the server holds the response:

import time

cur_len = int(self.request.get("cur_len", 0))
end_by = time.time() + 30  # hold the request for up to 30 seconds

while time.time() < end_by:
    chat = load_chat(chat_id)  # hypothetical helper: re-read the chat on each pass, or new messages are never seen
    if len(chat.content) != cur_len:
        return self.response.out.write(chat.content)  # new content: respond immediately
    time.sleep(1)  # nothing yet; check again in a second

self.error(304)  # timed out with nothing new: 304 Not Modified

If you have any experience building web applications, you’ve probably spent a lot of effort making sure servers respond quickly to requests. Deliberately delaying the response is counterintuitive, which alone makes Comet worth knowing, if only for the new perspective. However, it also makes production use a bit complicated, since most web server stacks are optimized for maximum requests per second rather than many long-lived concurrent requests. Content-rich sites often serve big media content from separate servers for this reason, and Comet has its own server (er, “HTTP-based event routing bus”) in Cometd.
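A rough back-of-the-envelope calculation shows why long requests strain servers tuned for requests/second. By Little’s law, the average number of open connections equals the request rate times how long each request stays open. The numbers below (1,000 clients, 50 ms responses, idle chats where every long poll runs the full 30 seconds) are illustrative assumptions:

```python
def open_connections(clients, poll_interval_s, time_open_s):
    """Average concurrent connections via Little's law: L = arrival rate * time open.

    Each client starts a new request poll_interval_s after the previous one
    finishes, so one request cycle lasts time_open_s + poll_interval_s.
    """
    arrival_rate = clients / (time_open_s + poll_interval_s)  # requests per second
    return arrival_rate * time_open_s


# Short polling: server answers in ~50 ms, client waits 5 s between requests.
short = open_connections(clients=1000, poll_interval_s=5.0, time_open_s=0.05)
# Long polling: server holds each request ~30 s, client re-polls after 1 s.
long_ = open_connections(clients=1000, poll_interval_s=1.0, time_open_s=30.0)
print(round(short, 1), round(long_, 1))
```

With these assumptions that’s roughly 10 versus roughly 970 open connections on average, even though the request rate drops from about 198 to about 32 per second.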

FeedVolley: Messages From Iran

I just put up a quick hack, built with FeedVolley (more about FV here), that aggregates Twitter (and other media) feeds coming from inside Iran: Messages From Iran

I don’t know about its news value, but it’s pretty cool to refresh that page now and then and get a snapshot of the current mood and events there, in what may turn out to be historic times.

It was also cool to find another use for FeedVolley, which I’ve neglected a bit recently ;) I added some page caching on top of the existing feed caching so it can handle some traffic (Slicehost’s 256MB slices seem to start sending swap alerts as soon as traffic rises above negligible). The sources are basically the ones listed here, plus a few more I’m still trying to find. In fact, if you really want to keep a close watch on what’s going on, you may want to follow the FriendFeed stream directly; the FeedVolley page is really just an HTML skin to make the feed look a little nicer (hopefully).
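Page caching on top of feed caching boils down to keeping the rendered HTML around for a short while, so most hits skip both the feed fetch and the templating. Here’s a minimal sketch of the idea as a time-based (TTL) decorator; the names are hypothetical, and FeedVolley’s actual caching may well work differently:

```python
import functools
import time


def cache_page(ttl_seconds, clock=time.monotonic):
    """Cache a page-rendering function's output per argument tuple for ttl_seconds."""

    def decorator(render):
        cache = {}  # args -> (expires_at, html)

        @functools.wraps(render)
        def wrapper(*args):
            now = clock()
            hit = cache.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip rendering entirely
            html = render(*args)
            cache[args] = (now + ttl_seconds, html)
            return html

        return wrapper

    return decorator


calls = []


@cache_page(ttl_seconds=60)
def render_messages_page(page):
    calls.append(page)  # stands in for fetching feeds and rendering HTML
    return "<html>page %d</html>" % page


render_messages_page(1)
render_messages_page(1)
print(len(calls))  # 1: the second call was served from the cache
```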

(Favorite tweet so far: “@jonobacon IRC is blocked. Tell our regards to Ubuntu Global Jam from Iran. I’m twitting the #iranElection story from a Kubuntu machine :)”. Makes me want to start using Twitter again.)

JSONP, Quickly

I discovered JSONP just recently, following Chris’ comment. I hadn’t intended to support JSON at first, but JSONP made enough of a difference that I rewrote most of TheRealURL’s code (all 20 lines of it) to support it. Since it took me some time to figure JSONP out, a quick guide might help those who follow.

JSONP lets you make an HTTP request outside your own domain, which makes it possible to consume Web Services from JavaScript code. It relies on a browser quirk: while the same-origin policy blocks XMLHttpRequest from making cross-domain requests, there’s no such limit on <script> elements. What JSONP does is add a <script src=...> element to the DOM, with the external URL as its src.

To serve JSONP, simply return the JSON data wrapped in a function call. For example, this JSON:

{ "hello" : "Hi, I'm JSON. Who are you?"}

becomes:

some_function({ "hello" : "Hi, I'm JSON. Who are you?"})

(The reason is that the latter is actually code that runs inside the <script> element created by the JSONP client, so it needs to be executable code rather than plain JSON data.)

some_function is provided by the calling client, usually in the ‘callback’ parameter. So, a query that includes callback=getthedata should return:

getthedata({ "hello" : "Hi, I'm JSON. Who are you?"})

On the server side, this means adding some code similar to this:

// assume $json holds the JSON response
if (!empty($_GET['callback'])) $json = $_GET['callback'] . "($json)";
echo $json;   // my PHP is rusty but you know what I mean
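Since TheRealURL itself runs on Python, here’s roughly the same server-side logic in Python. One thing the quick PHP version glosses over: echoing the callback parameter verbatim lets anyone inject arbitrary script into your response, so this sketch only accepts callback names that look like (possibly dotted) JS identifiers. The wrap_jsonp name and the exact whitelist pattern are my own assumptions, not part of any standard:

```python
import json
import re

# Plain or dotted ASCII JS identifiers, e.g. "getthedata" or "jQuery123.cb".
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*", re.ASCII)


def wrap_jsonp(data, callback=None):
    """Serialize data as JSON, wrapped in callback(...) when a callback is given."""
    body = json.dumps(data)
    if not callback:
        return body  # plain JSON request
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name: %r" % callback)
    return "%s(%s)" % (callback, body)


print(wrap_jsonp({"hello": "Hi, I'm JSON. Who are you?"}, "getthedata"))
```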

On the client side, modern JS frameworks include JSONP support (or you can DIY). For example, in jQuery 1.2 and later, adding &callback=? to the query string of the getJSON method’s URL sends a JSONP request (jQuery transparently replaces the ‘?’ with a unique callback name). Here’s how you get the unshortened URL for ‘bit.ly/a’ using therealurl:

$.getJSON("http://therealurl.appspot.com/?format=json&url=bit.ly/a&callback=?",
	function(data){ alert(data.url) });

That’s about it. JSONP probably won’t feature in the next edition of Beautiful Code, and you obviously need to trust the URLs you’re accessing, since whatever JS they return gets executed on your page. But until cross-site XHR is resolved, JSONP can get the job done.

The Real URL

[UPDATED on April 21st, 2009 to reflect the JSON/P additions. Since it’s <24 hours after the initial release, I hope it won’t cause anyone problems.]

The Real URL began as a joke: while working on another project I discovered over 80 URL shortening services, and figured there must be room for a service that un-shortens all these URLs. (The web is overflowing with hype, and blog posts and articles complaining about it just add to the noise, so it’s better to make your point by building something. My favorite example is the Twittering Office Chair.)
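Under the hood, un-shortening comes down to following HTTP redirects until you reach a URL that doesn’t redirect. Here’s a minimal sketch of that loop. To keep it testable offline, the network call is injected as a hypothetical head(url) function returning (status, Location header); in real use it would send a request (e.g. with http.client) that does not auto-follow redirects. The hop limit is an arbitrary guard against redirect loops, and the fake responses are just example data:

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def unshorten(url, head, max_hops=10):
    """Follow redirects from url and return the final destination.

    head(url) must return (status_code, location_header_or_None)
    without following redirects itself.
    """
    for _ in range(max_hops):
        status, location = head(url)
        if status not in REDIRECT_CODES or not location:
            return url  # not a redirect: this is the "real" URL
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError("too many redirects")


# Fake responses standing in for real HTTP requests.
fake = {
    "http://bit.ly/a": (301, "http://www.apple.com/"),
    "http://www.apple.com/": (200, None),
}
print(unshorten("http://bit.ly/a", lambda u: fake[u]))
```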

Turns out there are already several out there (e.g., trueurl), but I built it anyway, since I had a slightly different approach in mind: The Real URL is meant to be used as a web service rather than on its own. It returns the “real” URL as raw text, JSON or JSONP; examples and details are on the homepage. (I added JSON mostly for JSONP, per Chris’ comment; admittedly, I didn’t even know JSONP existed ;) JSONP enables cross-site JS requests, which might actually make The Real URL useful.

While I do want The Real URL to be solid and reliable in the long term, I don’t want to spend much time/money keeping it up. It’s a sustainability issue – building a system that will work reliably over a long time while requiring minimal care and resources. I made a few design decisions to that end:

  • Keep it simple (always a good idea). The Real URL does only one thing and is accessible in only one way (even the homepage demo uses XHR to access the service, to keep it that way). It now supports text/JSON/JSONP, but that’s just the same output formatted differently. Sometimes you give up some elegance to make the product useful, as in the following item:
  • Deploy on Google App Engine. It started out as nice, super-minimal Sinatra code. Unfortunately, App Engine doesn’t support Ruby yet and no other service offers a comparable cost/stability ratio, so I rewrote it in slightly less minimal Python for GAE.
  • Use App Engine’s domain (therealurl.appspot.com). Buying a domain and keeping it renewed isn’t a big deal, but it still requires some attention, especially if you happen to hit on a nice domain name that people try to grab or piggyback on. Sticking with the appspot.com domain minimizes this issue. (If the need arises I might add a “real” domain later on, but therealurl.appspot.com will remain active in any case.)

If you find a use for The Real URL or have an idea for one, please comment here or email me at niryariv@gmail.com. Let the street find its own uses etc ;)

List of URL Shortening Sites

I’ve been compiling this list of URL shortening services for some time now, for use in one of my projects, and thought it might help developers who need such a list for their own work (or VCs looking to put a couple of $mil into one of these sites).

Anyway, here are the 82 sites (up from 74; thanks, commenters!) I’ve got so far. If you use Ruby, just stick %w{ } around the list and you’ve got an array. If you own one of these sites, put “Twitter-compatible” on your homepage, who knows ;):

adjix.com b23.ru bit.ly budurl.com canurl.com cli.gs decenturl.com dolop.com dwarfurl.com easyurl.net elfurl.com ff.im fire.to flq.us freak.to fuseurl.com g02.me go2.me idek.net is.gd ix.lt kissa.be kl.am korta.nu krunchd.com ln-s.net loopt.us memurl.com miklos.dk moourl.com myurl.in nanoref.com notlong.com ow.ly ping.fm piurl.com poprl.com qicute.com qurlyq.com reallytinyurl.com redirx.com rubyurl.com rurl.org shorl.com short.ie shorterlink.com shortlinks.co.uk shorturl.com shout.to shrinkurl.us shurl.net shw.me simurl.com smallr.com snipr.com snipurl.com snurl.com starturl.com surl.co.uk tighturl.com tinylink.com tinypic.com tinyurl.com tinyvh.com tr.im traceurl.com twurl.nl u.mavrev.com ur1.ca url-press.com url.ie url9.com urlcut.com urlhawk.com urli.ca urlpass.com urlx.ie xaddr.com xrl.us yep.it yuarel.com yweb.com zurl.ws
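The Python equivalent of the %w{ } trick is str.split(). Since the point of such a list is spotting shortened links, here’s a small sketch that turns it into a set and checks a link’s host against it. The is_short_url helper is hypothetical, and only a few entries are pasted in to keep it short:

```python
from urllib.parse import urlsplit

# Paste the full whitespace-separated list here; a few entries shown.
SHORTENERS = set("bit.ly is.gd ow.ly tinyurl.com tr.im u.mavrev.com".split())


def is_short_url(url):
    """True if url's host is a known shortener (a leading 'www.' is ignored)."""
    if "://" not in url:
        url = "http://" + url  # accept bare links like "bit.ly/a"
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SHORTENERS


print(is_short_url("bit.ly/a"), is_short_url("http://www.apple.com/"))
```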

UPDATE: Per Karan’s suggestion, I moved the list to listable.org, which makes it easy to export the data to SQL, JSON or text. Future updates will all be there: http://www.listable.org/show/url-shortening-sites

UPDATE #2: As a result of this post I ended up building a URL unshortening service, which I now think might actually have some uses. More here.