The SEO industry relies on data and measurement to justify its actions, strategy and very existence. Without public data or competitive intelligence, sites run the risk of existing in black-box silos, with everyone but the Webmaster guessing at possible visitor traffic and pageviews. So we rely on estimation tools such as Compete, Alexa, Quantcast and ComScore, not only to spy on competitors but also to present a public snapshot of a site to potential advertisers without giving away all that valuable data.
Rand Fishkin recently discussed the accuracy of visitor data for SEOmoz site traffic and the sometimes egregiously incorrect figures these tools provide. He issued a call to arms for Webmasters to share the level of deviation on their own sites and form an industry consensus. We took Rand’s spirited dissertation to heart and analyzed a large (anonymous) site, comparing actual analytics data with the public estimates.
Alexa.com – Lots of Data (Most of It Wrong)
Alexa gets credit for offering more data than other free public services, including search keyword traffic, clickstream data and audience demographics. The only problem is that much of this information is extremely inaccurate.
The top queries from search traffic were not indicative of actual organic search activity, and a few of the listed queries did not appear in the real data at all. Perhaps most inexcusable was the “% of traffic from US” metric: Alexa reported 74% against an actual 96%. This metric matters to advertisers, and a discrepancy of this size could very much impact business opportunities.
Our visitor demographic has been shown to be males aged 35-44 (confirmed through a number of methods), but Alexa instead pegged our audience as females aged 45-54, the exact opposite of reality.
Compete.com – Under-reporting Visitors & Proud of It
Compete reported unique visitors 57.92% lower than actual data, though it earns some penance by offering CSV export. A margin of error might be forgiven for a sudden swing month, but this large discrepancy is averaged over 7 months’ time.
Actual Data: 4,596,476 unique visitors
Compete Data: 1,934,332 unique visitors
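The percentage quoted above is simply the gap between actual and estimated traffic, expressed as a share of the actual figure. A minimal sketch of that math (the function name is ours, for illustration):

```python
def discrepancy_pct(actual, estimate):
    """Percent by which an estimate falls short of (or exceeds) actual traffic."""
    return (actual - estimate) / actual * 100

# Compete's unique-visitor figures from above (7-month totals)
actual_uniques = 4_596_476
compete_uniques = 1_934_332

print(f"{discrepancy_pct(actual_uniques, compete_uniques):.2f}% under-reported")
# → 57.92% under-reported
```

The same formula reproduces the other under-reporting figures cited in this post.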
DoubleClick – Google Gets It Wrong
You would expect the master of data to have accurate numbers, especially since the crux of its revenue stream depends on adoption of AdWords and AdSense. Yet the margin of error is not much different from Compete’s or Alexa’s: unique visitors under-reported by 63% and pageviews under-reported by 24%.
Unique Visitors:
- DoubleClick: 68,000
- Actual: 185,733
- Difference: 63.39%

Pageviews:
- DoubleClick: 830,000
- Actual: 1,086,206
- Difference: 23.59%
Quantcast – Great if Quantified (Otherwise Not)
Electing to place a Quantcast tag on your site (hereafter known as being Quantified!) allows you to publicly share real data. The tag does not estimate or guess; it reports accurate, reliable numbers. However, not everyone is comfortable with this level of transparency, and many sites instead rely on Quantcast’s estimates. A quick review of 3 large un-Quantified sites showed unique visitors under-reported by 85%–90%. Not even close to reality. And similar to Alexa’s demographic missteps, Quantcast read our male, 35-44 audience as female, 50+.
ComScore – You Pay for the Best of the Worst
The only paid service mentioned here is noticeably more accurate than the free alternatives, and it is the lone over-reporter: its pageview figure ran roughly 33% above real data. As the most widely respected of these services, it was refreshing to see ComScore actually come close to providing true analytics.
Actual Data: 10,161,527 pageviews
ComScore Data: 13,519,743 pageviews
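Applying the same percentage math to the ComScore figures, but noting that here the estimate sits above the actual count rather than below it (the function name is ours, for illustration):

```python
def discrepancy_pct(actual, estimate):
    """Percent deviation of an estimate from the actual figure (positive = over-reported)."""
    return (estimate - actual) / actual * 100

actual_pageviews = 10_161_527
comscore_pageviews = 13_519_743

print(f"{discrepancy_pct(actual_pageviews, comscore_pageviews):.1f}% over-reported")
# → 33.0% over-reported
```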
The goal of these traffic tools is to offer a public face for your site without giving away all that valuable data. In that context estimates are perfectly useful, but a margin of error of 10%-15% is not an unreasonable ask for general accuracy. Nearly all the tools err on the side of conservative under-reporting, an understandable tendency, though the degree of discrepancy should still fall within moderate bounds. It is interesting, however, that several tools show a similar magnitude of error, around 60% under-reported, perhaps indicating that the same (inaccurate) data source is referenced by multiple tools.