Filter out spammers and click bait from Google Analytics

Over the last few months, a wonderful new type of spam has become part of my life: Google Analytics spam.

As this article describes, what happens is that you start seeing some blatantly bogus traffic coming from a bunch of websites like semalt.com, buttons-for-website.com, darodar.com, or ilovevitaly.com.

Google announced automatic Bot and Spider Filtering, but as some users on Hacker News reported, it doesn’t work reliably.

So far, the only solution that has worked for me is setting up a filter and adding spammers to it as they appear. There don’t seem to be that many as of today, so this approach is still usable.

[Update – July 2015]: if you have a public HTTP/PHP server available, and are willing to invest half a day to install it, Piwik is a nice, free, open-source alternative to Google Analytics. Piwik uses a community-maintained list of spammers that can also be used in Google Analytics. They wrote a blog post about it, too.

I’ve been using Piwik for a few weeks now, and I’m happy with it so far. The nice thing is that updates are very easy to apply, and they include the most recent list of spammers available. The installation process could be improved: it’s not as easy as it could be (at least if you’re using Nginx as your web server). They also have a cloud-hosted version, but I guess that if you’re using Google Analytics for free, you’re more interested in free alternatives!

To add a filter in Google Analytics:

  1. Go to your Administration page (last tab on your home page)
  2. Click “All filters” (in the leftmost column)
  3. Click “New filter”
  4. Choose Filter type “Custom” > “Exclude”
  5. Choose “Referral” from the Filter Field menu
  6. Set this as Filter pattern:
    semalt\.com|ilovevitaly\.co|priceg\.com|forum\..*darodar\.com|blackhatworth\.com|hulfingtonpost\.com|buttons-for-website\.com
  7. Select the views that you want to be filtered (I chose “All web site data”)
  8. Save
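
Before saving, it can help to sanity-check the pattern outside Google Analytics. Here is a minimal Python sketch, assuming Google Analytics treats the pattern as a partial (unanchored) match, which `re.search` approximates. The `is_spam` helper and the sample referrer values are made up for illustration.

```python
import re

# The filter pattern from step 6, copied verbatim.
FILTER_PATTERN = (
    r"semalt\.com|ilovevitaly\.co|priceg\.com|forum\..*darodar\.com"
    r"|blackhatworth\.com|hulfingtonpost\.com|buttons-for-website\.com"
)

def is_spam(referrer: str) -> bool:
    """Return True if the referrer would be caught by the filter pattern."""
    # re.search looks for the pattern anywhere in the string (unanchored).
    return re.search(FILTER_PATTERN, referrer) is not None

# Hypothetical referrer values, for illustration only.
samples = [
    "semalt.com",
    "forum.topic12345.darodar.com",
    "ilovevitaly.com",
    "news.ycombinator.com",
]
for referrer in samples:
    print(f"{referrer}: {'excluded' if is_spam(referrer) else 'kept'}")
```

Python's regex dialect is not guaranteed to be identical to the one Google Analytics uses, so treat this as a quick check for typos rather than proof that the filter behaves the same way.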

The filter pattern is a regular expression, so every time you find a new source of spam, simply append another “|spammersite\.com” to it (remember to escape dots with a backslash, because an unescaped dot matches any character).
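
As the list grows, escaping each entry by hand gets error-prone. As a rough sketch, the pattern could be generated from a plain list of domains. The `build_filter_pattern` helper is hypothetical and not part of any Google Analytics tool, and it assumes the entries are plain domain names.

```python
import re

def build_filter_pattern(domains):
    """Join domains into one alternation, escaping each dot with a backslash."""
    # Only dots are escaped here, which assumes plain domain names
    # (letters, digits, hyphens, dots) with no other regex metacharacters.
    return "|".join(domain.replace(".", r"\.") for domain in domains)

# Hypothetical list: two domains from the article plus a made-up newcomer.
spam_domains = ["semalt.com", "buttons-for-website.com", "spammersite.com"]
pattern = build_filter_pattern(spam_domains)
print(pattern)

# An unescaped dot would also match "semaltXcom"; the escaped one does not.
assert re.search(pattern, "semalt.com")
assert not re.search(pattern, "semaltXcom")
```

The printed string can then be pasted into the Filter pattern field in step 6.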

It’s playing catch-up with spammers, but as long as Google doesn’t find a way to reliably detect them, it’s the only way to get rid of them. I collected those 7 websites over a couple of months, and I’ve seen them being reported by other users as well. Since I’m no longer getting any bogus traffic after setting the filter, it looks like the problem is still relatively small and can be patched on a case-by-case basis.