Monday, February 25, 2013

Expand short urls with one bash command

Warning: This post is pretty much for techies/programmers only. Sorry, but I just had to share this because it ended up being pretty cool.

The rise of url shorteners, while useful, has made it kinda uncertain where any particular click will take you. Even if you're not as security-conscious as I am, sometimes you might be wondering whether some link will take you to some annoying spam page.

There are wonderful services like LongURL and Long URL Please, which try to make it possible to see where you're going before you click, but sometimes they're tripped up by unknown url shorteners or multiple levels of redirection. Plus, it takes a few clicks to get to those services in the first place.

Thing is, I know that it's possible to make a generalized service that simply looks for any HTTP redirects and follows them until the end of the chain. For the longest time I've meant to make this, probably as a web tool. But then I started messing with curl's -I option (which sends a HEAD request and prints just the HTTP response headers), and realized I could do it much more simply. Eventually I ended up fitting it into 6 lines of bash! So I thought I'd share:
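longurl () {
  local url="$1" line   # local, so the function doesn't clobber variables in your shell
  while [ "$url" ]; do
    echo "$url"
    # HEAD request; keep only the first Location header, if there is one
    line=$(curl -sI "$url" | grep -P '^[Ll]ocation:\s' | head -n 1)
    # strip the header name and surrounding whitespace (including the trailing CR)
    url=$(echo "$line" | sed -r 's/^[Ll]ocation:\s+(\S.*\S)\s*$/\1/g')
  done
}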
Just paste the url after the command "longurl" and it'll follow the redirect chain, printing each url. For example:
$ longurl http://t.co/8VzDpOP0Xz
http://t.co/8VzDpOP0Xz
http://ow.ly/hU93Q
http://www.quora.com/Lincoln-2012-movie/How-historically-accurate-is-Lincoln-the-movie
Note: As an optional feature, you can add the line 'echo -n "$url" | xclip -selection clipboard' at the top of the loop to have xclip copy each url to your clipboard as it's printed, so the final url is the one left there when the function finishes*. But it only works on Linux systems and xclip isn't a default package, so I left that line out. Oh, and a disclaimer while we're at it: I really should be checking the HTTP response code, yadda yadda yadda, didn't read the relevant RFCs, etc. But this is simple, it should work in most cases, and when it doesn't, you'll know.
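For anyone who does want the response code checked, here's a rough sketch of what that could look like. The helper name next_hop is my own invention, not part of the function above, and it assumes it's being fed the output of a single 'curl -sI' call. It only reports a Location header when the status line says 3xx, and prints nothing otherwise:

```shell
# Hypothetical helper (not part of the original function): reads one set of
# HTTP response headers on stdin and prints the Location value only if the
# status code is 3xx. Prints nothing otherwise.
next_hop () {
  local status="" location="" line cr
  cr=$(printf '\r')                          # header lines end in CRLF
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%"$cr"}"                     # drop the trailing CR
    case "$line" in
      HTTP/*)                                # status line, e.g. "HTTP/1.1 301 Moved"
        status="${line#* }"                  # strip the protocol version
        status="${status%% *}"               # keep just the status code
        location="" ;;
      [Ll]ocation:*)
        location="${line#*:}"
        # trim leading and trailing whitespace around the value
        location="${location#"${location%%[![:space:]]*}"}"
        location="${location%"${location##*[![:space:]]}"}" ;;
    esac
  done
  case "$status" in
    3??) [ -z "$location" ] || printf '%s\n' "$location" ;;
  esac
}

# Offline demo with canned headers (made up for illustration):
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: http://ow.ly/hU93Q\r\n\r\n' | next_hop
```

If you wanted to try it, the grep and sed lines in the loop would collapse into something like url=$(curl -sI "$url" | next_hop). I haven't run this against a wide range of servers, so treat it as a starting point.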

Anyway, if you're the kind of person who usually has a terminal sitting open, this might prove pretty convenient. Just paste the function into your .bashrc file to have the command available in every session. Oh, and make sure you have curl installed. But you should already have that, shouldn't you?

If you need any more convincing, here's an example I just ran into of a nice, long redirect chain that did indeed end up at a spammer site. Glad I checked it first:
$ longurl http://t.co/oZ2IWUfW9m
http://t.co/oZ2IWUfW9m
http://is.gd/5TIIkF/ubeldynl
http://steve.omeuemail.com.br/7voxe1rz0m1hwcrsOmngucq/Qznqh4x-Ninlkk0yiq7kdmlyx-Rje1ieyqgkmbtqxhswaxmcl/5rwc6eyhfxqbp/Sw0yazi5lqmew5fxszvte0/Nvefuwsqe9q3zbjvvlsiswyv0Kmbbqpmgawedcrtkhv/Rdwoy5iwkfxigllbuqzvxfyw-D3qvi1z7f
http://gift-card-rewards.com/?r=y


*Now, I actually have a modified version that uses sed to copy just the domain name into my clipboard because my most common use case is to immediately paste the domain into Web of Trust to see if the link actually goes somewhere nasty. So as an FYI, here's my version of the line (tweaked so it also works on urls with nothing after the domain):
echo -n "$url" | sed -r 's/^https?:\/\/([^/]+).*$/\1/' | xclip -selection clipboard

Update: If you're looking for some interesting links to try it on, I suggest using any of the links in the weekly Ars Technica "Dealmaster" posts. These seem to always go through incredible numbers of redirects via various tracking, advertising, and analytics companies. For example, http://bit.ly/1b5KFTr gets you a total of 14 redirects! It actually fails on the last one because the Location header there is a relative URL, which curl can't fetch on its own, but you can just use the one before it. I don't have a problem with these links, since I believe the redirects give credit to Ars and help support them. Still, it shows how this little tool can shed light on a lot of stuff going on behind your back that you wouldn't have ever noticed otherwise.
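
The relative URL failure could probably be patched by resolving each Location value against the url that returned it. Here's a minimal sketch of that idea. The name resolve_url is made up, the example.com urls are placeholders, and it only handles http/https plus the common relative forms. It doesn't collapse "../" segments or handle query-only redirects, so it's nowhere near a full RFC 3986 resolver:

```shell
# Hypothetical helper: resolve a Location value against the url that sent it.
# Usage: resolve_url BASE LOCATION
resolve_url () {
  local base="${1%%[?#]*}" loc="$2" scheme rest origin   # base minus query/fragment
  [ -n "$loc" ] || return 0           # no redirect, so nothing to print
  scheme="${base%%://*}"              # "http" or "https"
  rest="${base#*://}"                 # host plus path
  origin="$scheme://${rest%%/*}"      # scheme plus host (and port)
  case "$loc" in
    http://*|https://*) printf '%s\n' "$loc" ;;          # already absolute
    //*) printf '%s\n' "$scheme:$loc" ;;                 # scheme-relative
    /*)  printf '%s\n' "$origin$loc" ;;                  # relative to the site root
    *)   case "$rest" in                                 # relative to the current "directory"
           */*) printf '%s\n' "${base%/*}/$loc" ;;
           *)   printf '%s\n' "$origin/$loc" ;;
         esac ;;
  esac
}

# Offline demo with placeholder urls:
resolve_url "http://example.com/a/b?x=1" "/landing"
```

To use it in the loop you'd have to hang on to the current url in a second variable before the sed line overwrites it, then pass both to resolve_url. I haven't tried it on the Dealmaster chains, so no promises.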