How bots mess up clickthrough stats and can be used for evil

on June 29th, 2010 by Richard Orelup in Web tech

Jump down to the “After the technical stuff” section if you don’t want a greater understanding of the technical side of how URL-lookup Twitter bots and traffic tracking on the internet work.

In the first couple of seconds after putting up a shortened URL on Twitter, I had all this bot activity come to my site.

From access.log of http://tada.me/ae

From access.log of http://ticketstada.com/

That’s 22 bots. More could have come later, but this was just the initial blast. This is from an account with few followers, so it has nothing to do with follower count.

But here is the thing. On every single one of those hits on my server, PHP is run to generate the page. This can be quite an annoyance to people with a page that has a long load time, since that many concurrent hits could really screw you up (it shouldn’t, but that’s another article).

For most of us, though, this means little to nothing. If you use bit.ly, these hits won’t count as clickthroughs, because bit.ly has tracked things for a long time and knows which visitors are bots. This is one of the biggest issues with is.gd: they don’t seem to do any filtering (I just checked, that is still the case, and I got 28 bots), so your numbers all include bot lookups. People like me, who don’t have the vast data bit.ly does, just have to work with the user agents and track other IPs that seem strange. I can also track whether a visitor later loaded some JS on the destination page (that’s a whole other product :) ).
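As a rough illustration of the "work with the user agents" approach, here is a minimal Python sketch. The marker list and the function names are made up for this example, not taken from any real tracker, and a usable list would have to come from your own logs.

```python
# Hypothetical substrings that often show up in bot user agents.
# This list is an assumption for illustration only.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-urllib")


def looks_like_bot(user_agent):
    """Return True if the user agent is empty or contains a known marker."""
    ua = (user_agent or "").lower()
    if not ua:
        # Real browsers practically always send a user agent.
        return True
    return any(marker in ua for marker in BOT_MARKERS)


def count_human_hits(hits):
    """Count hits that do not look like bots.

    `hits` is an iterable of (ip, user_agent) pairs.
    """
    return sum(1 for _ip, ua in hits if not looks_like_bot(ua))
```

This only catches bots that identify themselves. Anything that fakes a browser user agent slips through, which is why the strange-IP tracking and the "did they load the JS" check are still needed on top.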

When it comes to most website tracking, bots don’t run JavaScript (well, most of the ones you will deal with; some do, but not the ones we are dealing with here) or even grab the JS files, as you can see in the logs. So the Google Analytics code is never run, and you never even see that they were there.
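The "never grabbed the JS files" pattern can be pulled out of an access log directly. Here is a small sketch that assumes the common/combined Apache log format; the regular expression and the function name are my own choices for illustration, so adjust them to whatever your server actually writes.

```python
import re
from collections import defaultdict

# Start of a common/combined log line: IP, two fields, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|POST|HEAD) (\S+)')


def ips_without_js(log_lines):
    """Return the set of client IPs that made requests but never fetched a .js file."""
    paths_by_ip = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match:
            ip, path = match.groups()
            # Drop any query string before checking the extension.
            paths_by_ip[ip].append(path.split("?", 1)[0])
    return {
        ip
        for ip, paths in paths_by_ip.items()
        if not any(path.endswith(".js") for path in paths)
    }
```

An IP in that set is not proof of a bot (a real browser may have the script cached or blocked), but a burst of them in the seconds after a tweet is a pretty strong hint.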

Okay, now that we have a bit of the technical stuff behind us, let’s look at the issue that I brought up related to this. First you need to see this neat report from backtweets.com that shows 29 bots that RT’d this link. Only 4 actual people RT’d it. It is not uncommon to get a random RT or two from a bot because you hit some keyword it is just repeating, but 29 is not very common (if it were, tons of VL articles would have 1000 hits). What this points to is that someone fed this address to the bot network to be RT’d.

After the technical stuff

What isn’t going to be obvious to most is what real advantage there is to this. In most cases this is more a search or @mention type of spam, just hoping to pick up a little traffic that way. But this is why I had to put the technical stuff out there first, because that’s where you start to see how this would be even more advantageous to someone like VL who is doing things a little differently.

If you go to any VL article, at the bottom you will see a hit counter. That hit counter is run through the PHP portion of the page. So unlike the other tracking methods, which use JS and are unaffected by bots, this counter is affected. The second an article is tweeted, the hit counter is going to go up. If you would like to see for yourself, grab a random article from a few days ago that you think no one will be reading right now. Note the original hits. Then create an http://is.gd/ link for it. Tweet that link from an account that you expect no one to see, and ask people not to click it. If you add a “-” to the end of the URL from is.gd you will get the stats. Notice that you have 20 or so “accessed directly” hits, and if you go back to the VL article you picked you will see that its counter went up by that number + 1 (because you just went to it again). Here are some numbers for one I just did on a VL article:

Original VL article Hits: 26
is.gd link 21 “clickthroughs”
Post tweet Hits: 49
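To make the arithmetic explicit, here is a quick Python sketch using the three numbers above. The function name is just something I made up for the example.

```python
def expected_counter(original_hits, bot_lookups, own_revisits=1):
    """Counter value you would expect if every is.gd lookup hit the PHP counter."""
    return original_hits + bot_lookups + own_revisits


original_hits = 26   # counter before the tweet
bot_lookups = 21     # "clickthroughs" reported by is.gd
observed_hits = 49   # counter after the tweet

expected = expected_counter(original_hits, bot_lookups)
unexplained = observed_hits - expected
inflation = (observed_hits - original_hits) / original_hits

print(f"expected {expected}, observed {observed_hits}, unexplained {unexplained}")
print(f"counter grew by {inflation:.0%} from one tweet nobody was asked to read")
```

The expected value comes out one short of what the counter showed. That leftover hit is not explained by the is.gd stats. My guess is a bot that reached the article without going through is.gd or an ordinary visitor, but that is only a guess.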

Now okay, who cares? Why would anyone trust these numbers? Well, that’s where the issue comes in: they make their money from advertising. When they are trying to get someone to pay for ads, they will point them at articles and say, see how many hits they get. The problem is that those numbers are all out of whack and completely inaccurate. Does this mean that other tracking methods are perfect? Hell no. It’s easy to trick your own analytics software into believing whatever you want it to. But when it comes to tricking Google Analytics on your own blog, who cares, and it takes more effort than it’s really worth. Here, though, you have something this simple producing numbers that are going to be WAY overblown, especially when you are using your Twitter account as an RSS feed.

As well, it is not only loading their hit counter every time a bot hits the page, it’s also loading the ads, as they are hard-coded into the HTML. I don’t know what they offer their advertisers to see, or if they are using other methods of logging to see if the ads were actually displayed, but I really doubt it, since they don’t seem to be savvy enough to understand why that would be needed. I would bet their advertisers are not aware of these facts. I’ll try to get more information on this whole situation tomorrow at the Social Media Day event.

So this leaves a lot of unanswered questions. In the marketing material, are they talking about page views and uniques from Google Analytics, or from whatever they are using to track the hit counters? Why have the hit counter there when it is obviously flawed, unless you are trying to give off a feeling that more people are in fact coming to the site? Why would you use these numbers as selling points to potential advertisers in meetings? Were these things done intentionally, or was it just a complete lack of understanding of what any of this actually means? Does ignorance remove you from being ethically responsible for your actions towards a client? Can you really plead ignorance when you are the one hyping and selling it as being real and a great accomplishment? If you are going to sell a product based on your word, shouldn’t you understand the product you are selling and not be completely ignorant of its nature? Is ignorance just a go-to response when something is pointed out?

I know, there are lots of ethical things to debate in there, and I have a feeling that tomorrow will be a fun day going over it. I have a lot of things to say on those questions, but that’s enough for now. The Synergizer might show up with my view on ethics in technology and how marketing people abuse it (well, I would argue you can’t abuse ethics when you have none :) ) with no desire to ever really understand it.

I am also not saying that all these things happened here or that there was any directly intentional unethical behavior on VL’s side. That is nearly impossible to know concretely without someone actually coming forward and admitting to it. Probably tomorrow both VL and I will have a better understanding of their tracking and the numbers they are selling.

So hopefully those who read the tech side have a greater understanding of what’s going on behind the scenes and the battles we developers have with tracking. For those who only read the latter part, I hope you have a better understanding of where the issues are with the tracking methods used in this case, and can see where someone would gain financially from doing something like this. I know that was hard for some people to comprehend, since they aren’t really tech people and have not put the research into this that I have on my own projects.


6 Responses to “How bots mess up clickthrough stats and can be used for evil”

  1. Joe Porter says:

    So what would your thoughts be about W3Counter Stats, http://www.w3counter.com/? They use the JavaScript format and trigger. So they would fall into the “more accurate” stats category?

    Nice break down and explanation.

  2. […] This post was mentioned on Twitter by Joe Porter. Joe Porter said: RT @ripsup: Okay all so here is my post on how bots can be used to boost traffic. http://ow.ly/250vL @ValpoLife @wadeshull #nwindiana […]

  3. First off I don’t believe for a second that VL would want to deceive their advertisers, as I work with a few of these advertisers and they are very pleased with actual traffic from the site. Perhaps this kind of tracking is more important when a publisher is selling page view related ad pricing?

    Your mastery of the details behind the scenes far exceeds mine, no argument there.

    But I do wonder what started this research project. I obviously missed a conversation in the last couple of days about traffic counts?

  4. Wasn’t really much in the way of research as it was quite obvious where the numbers were coming from to anyone who has significant experience on the development side of the web. Just did a couple checks to make sure it was right before asking for clarification.

    Someone mentioned to me how many hits the counter had for the NWITweetup article and wondered if that was correct. I said no, it most likely wasn’t, and the person mentioned how it had a TON of RTs. That is why I started to wonder why those ever happened. The numbers point towards someone using one of the many bot networks on Twitter. Does that mean VL did it? No. It very well could have been someone else playing with it to see what could be done. But that doesn’t change that they do gain by having their hits inflated.

    We will have to talk more about the traffic others are seeing cause again, I don’t have those numbers to really comment about them. See you all at the event.

  5. Just went and did a review of what w3counter is all about and not really a fan of it. Did some tests and it’s easy to game in a bad way. Though luckily it showed me something to look out for when working on my own stuff. :)

  6. Joe Porter says:

    Glad I could show you what not to do lol. Like I said before, it has a great interface for quickly getting the stats. Always been a fan though of using multiple stat trackers for comparison. Look forward to what you come up with.
