

Google Crawl Techniques

To understand why the trick described below works, you first need a brief picture of how Google currently crawls the web.

Googlebot, as it is widely known, has always been a text-based web crawler: it captures the web by recording and organizing sites and pages based on the code that makes them up.

In recent years, the appearance of visual snapshots, a growing understanding of headless browsers, and the theory that Google uses its Chrome browser as part of the crawl have pushed us toward the belief that Google actually “sees” the web page too.

The problem is that the trick being used here suggests those two crawls do not run in parallel, or at least do not talk to each other, so what the text crawler sees is not matched against what the visual crawler sees.

The Trick

I say this because several successful black hat sites appear to be using a clever CSS trick: hiding links in powerful places that pass huge chunks of link equity while partly fooling Googlebot, buying them precious time at the top.

A lot of key links are “placed” so high up on the page that they are invisible to the normal user, often sitting in the header at a pixel position of -9999px or similar. That way neither the user nor the visual crawler sees the link, so it takes Google much longer to work out how that site is actually ranking.

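The offending markup is usually nothing more exotic than an ordinary link whose CSS pushes it far outside the visible viewport, for example with a large negative offset.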
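As a minimal sketch of the pattern, the markup embedded below pushes a header link far off-screen with absolute positioning, and the surrounding Python flags any link styled that way. The URL, anchor text, function names, and the 1000px threshold are all assumptions made up for illustration, not taken from any real site, and the check only reads inline styles; a real audit would also have to resolve external stylesheets.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the URL and anchor text are illustrative only.
SAMPLE_HTML = """
<header>
  <a href="https://example.com/money-page"
     style="position:absolute; top:-9999px; left:-9999px;">cheap widgets</a>
  <a href="/about" style="color: #333;">About us</a>
</header>
"""

# Matches an offset property (top/left/right/bottom) with a negative pixel value.
OFFSET_RE = re.compile(
    r"(?:^|;)\s*(top|left|right|bottom)\s*:\s*(-\d+(?:\.\d+)?)px",
    re.IGNORECASE,
)


class OffscreenLinkFinder(HTMLParser):
    """Collects the href of every <a> whose inline style uses a large negative offset."""

    def __init__(self, threshold_px=1000):
        super().__init__()
        self.threshold_px = threshold_px  # assumed cut-off, not a known Google value
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        style = attr_map.get("style") or ""
        for match in OFFSET_RE.finditer(style):
            if abs(float(match.group(2))) >= self.threshold_px:
                self.flagged.append(attr_map.get("href"))
                break  # one suspicious offset is enough to flag the link


def find_offscreen_links(html, threshold_px=1000):
    """Return the hrefs of links positioned far outside the viewport."""
    finder = OffscreenLinkFinder(threshold_px)
    finder.feed(html)
    return finder.flagged


if __name__ == "__main__":
    print(find_offscreen_links(SAMPLE_HTML))
```

Running the script on the sample reports the one off-screen link and ignores the normally styled one.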

As an added bonus beyond buying time for the site, the link may also be treated by Google as a header link, passing even more link juice across because of it. A 2004 patent application by Google suggested they planned to assign greater relevance to links in such positions. I wrote a little more about personalized PageRank in “Is Google Afraid of the Big Bad Wolfram?”

Those making money from these sites know this. They also know that by the time Google’s crawlers piece together the picture from the main “base” crawl, not just the regular visual and “fresh” crawls, they will already have made a chunk of money.

The time comes, of course, when the site is taken out, either by the sandbox or by a Panda or Penguin update, but by then the money is made and enough time has been bought to line up another site. The process is then repeated.

Thanks to All

by

Zuan Technologies
