Monday, February 05, 2007

What search terms did they Google to get to this site?

I've seen a few sites on the web that somehow know what you typed into Google to get to them. If you arrived at a page via a Google search, the site might display some sort of welcoming text. How do the websites know what you searched for? Did they use the Java Mind Reading Library (JMRL)? Are they working closely with the NSA?

It actually turns out to be pretty simple. When you click a link on a Google results page, your browser requests the destination page and sends the address of the results page along in the HTTP Referer header. That address contains the terms you searched for (plus some other parameters), so your website can parse the Referer header and do whatever it wants with that data.
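On the server side the same information arrives as an ordinary request header. Here's a minimal sketch assuming a Node.js server (any server-side language works the same way) that reads the Referer header and pulls out the q parameter with the standard URL class. The function name searchTermsFromReferer is made up for illustration, and for brevity it does not check that the referring site is really Google.

```javascript
const http = require("http");

// Hypothetical helper: return the decoded "q" parameter from a referer URL,
// or null if the header is missing, malformed, or has no "q".
// Note: it does NOT verify that the referer host is actually Google.
function searchTermsFromReferer(referer) {
  if (!referer) return null;
  try {
    // URLSearchParams decodes "+" as a space and percent-escapes as UTF-8.
    return new URL(referer).searchParams.get("q"); // null if there is no q
  } catch (e) {
    return null; // header was not a valid absolute URL
  }
}

// Node lowercases incoming header names, so the header is req.headers.referer.
const server = http.createServer((req, res) => {
  const terms = searchTermsFromReferer(req.headers.referer);
  res.setHeader("Content-Type", "text/plain; charset=utf-8");
  res.end(terms ? "Welcome! You searched for: " + terms + "\n" : "Welcome!\n");
});

// server.listen(8080); // uncomment to run it on (hypothetical) port 8080
```

Doing this on the server means the greeting works even for visitors with JavaScript turned off.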

For example, if I do a Google search on "robert maldon" the referer header will look something like:


Referer: http://www.google.com/search?q=robert+maldon&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official
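To see what the "plus more info" part is, a short sketch (assuming a JavaScript runtime with the standard URL class, such as a modern browser or Node.js) can split that example referer into its query parameters. Only q holds the search terms; the meaning of the other parameters is Google's business and isn't something to rely on.

```javascript
// The example referer from a Google search on "robert maldon".
const referer =
  "http://www.google.com/search?q=robert+maldon&start=0&ie=utf-8&oe=utf-8" +
  "&client=firefox-a&rls=org.mozilla:en-US:official";

const url = new URL(referer);

// Iterate over every name/value pair in the query string.
// URLSearchParams turns "+" back into a space when decoding.
for (const [name, value] of url.searchParams) {
  console.log(name + " = " + value);
}
// prints:
// q = robert maldon
// start = 0
// ie = utf-8
// oe = utf-8
// client = firefox-a
// rls = org.mozilla:en-US:official
```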


A little JavaScript can read the referer (scripts see it as document.referrer) and welcome anybody who came from Google. For example:


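<script type="text/javascript">
// Greet visitors who arrived from a Google results page
// (google.com, google.ca, google.dk, google.co.uk, google.com.au, ...).
var ref = document.referrer;
if (ref && /^https?:\/\/(www\.)?google\.(com|[a-z]{2}|co\.[a-z]{2}|com\.[a-z]{2})\//i.test(ref)) {
  // Match "q=" only at the start of the query string or after "&",
  // so parameters such as "aq=" or "oq=" are not mistaken for it.
  var match = ref.match(/[?&]q=([^&#]*)/);
  if (match && match[1].length != 0) {
    var searchTerms = match[1].replace(/\+/g, " "); // "+" encodes a space
    try {
      searchTerms = decodeURIComponent(searchTerms); // UTF-8 aware, unlike unescape()
    } catch (e) {
      // Malformed percent-escape: keep the undecoded text.
    }
    // Escape HTML so a crafted referrer cannot inject markup into the page.
    searchTerms = searchTerms.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
    document.write("<h3>Welcome Googler looking for search terms [<i>" + searchTerms + "</i>]</h3>");
  }
}
</script>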
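The script decides whether a visitor came from Google by pattern-matching the referrer string. Another option, sketched here under my own assumptions, is to parse the URL and test only its hostname, which covers country domains such as google.ca, google.dk and google.co.uk without matching look-alike hosts. The function name isGoogleReferrer and the set of accepted domain shapes (com, a two-letter country code, co.<cc>, com.<cc>) are assumptions, not an official list of Google domains.

```javascript
// Hypothetical helper: true if the referrer's host looks like a Google host.
// Accepted shapes (an assumption): [sub.]google.com, [sub.]google.<cc>,
// [sub.]google.co.<cc>, [sub.]google.com.<cc>
function isGoogleReferrer(referrer) {
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    return false; // empty or malformed referrer
  }
  // Anchored at both ends so "mygoogle.com" and "google.com.evil.net" fail.
  return /^([a-z0-9-]+\.)?google\.(com|[a-z]{2}|co\.[a-z]{2}|com\.[a-z]{2})$/.test(host);
}

console.log(isGoogleReferrer("http://www.google.dk/search?q=test"));    // true
console.log(isGoogleReferrer("http://www.google.co.uk/search?q=test")); // true
console.log(isGoogleReferrer("http://www.mygoogle.com/"));              // false
console.log(isGoogleReferrer("http://google.com.evil.net/"));           // false
```

Checking the parsed hostname avoids having to anchor a regular expression against the scheme and path of the whole referrer.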

A clever use of this referer information would be to automatically run a search using the same terms in your own site's search engine (if it has one).
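A minimal sketch of that idea: turn the referrer into a URL for your own search page. The "/search" path and its "q" parameter are hypothetical placeholders for whatever your site's search engine actually expects, and the function name siteSearchUrlFromReferrer is made up for illustration. Like the server example, it doesn't check that the referrer is really Google.

```javascript
// Hypothetical helper: build a link to the site's own search page
// ("/search?q=..." is an assumed endpoint) from a referrer URL.
// Returns null if the referrer is empty, malformed, or has no "q".
function siteSearchUrlFromReferrer(referrer) {
  let terms;
  try {
    terms = new URL(referrer).searchParams.get("q");
  } catch (e) {
    return null; // empty or malformed referrer
  }
  if (!terms) return null;
  // Re-encode the terms for our own query string (spaces become "+").
  const params = new URLSearchParams({ q: terms });
  return "/search?" + params.toString();
}

console.log(siteSearchUrlFromReferrer(
  "http://www.google.com/search?q=robert+maldon&start=0&ie=utf-8"));
// prints "/search?q=robert+maldon"
```

In the browser you could pass document.referrer to it and either show the result as a "search this site" link or send the visitor straight there with window.location.assign(url); a link is less surprising for visitors than an automatic redirect.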

2 comments:

Anonymous said...

You may be missing out on a lot of searches. Google comes in flavors other than .com; see google.dk, google.ca, etc. etc.

Robert Maldon said...

Thanks, good point. I'm too used to using google.com. I've replaced the regular expression \.com with .*