We examine the common technical SEO errors on websites and offer some advice on how to resolve them.
We have been collating data on technical SEO errors we come across in various places for some time. Over the last few months though (since the New Year), we’ve begun to use a more rigorous system to record information on these errors, so that we can look over and learn from what we’ve found in the past. Hopefully, we can use this general data to at least partly predict the future of our clients’ websites. In this article, we’ll look at the most common technical SEO errors we’ve found, and show our data in some (hopefully self-explanatory) charts.
We see a lot of SEO errors here, and a lot of opportunities for website optimisations. It’s easy to blur the line between the two, which would make the results vague and open-ended, so to be clear: this data covers only errors, not optimisation opportunities.
Why We Shared What We Shared
We’re sharing our data because we like to be transparent, because we love to share, and (obviously) because we figure we’ll get a few people landing on this page from organic search.
We’re also hoping that others will stumble across this page and contribute their own ideas – whether that may be in the form of anecdote or hard data.
Here are some caveats before drawing conclusions from our data, which also happen to function as suggestions for anyone who might like to create a more complete resource.
Some sites that look like they have serious structural problems have a disproportionate presence on this list. That probably skews the data a bit.
Also, some of the most common technical SEO errors won’t be recorded here. They’re too small, low-impact and easy to fix in practice to bother planning around or deliberately spending time to seek them out. That includes small numbers of less important 404s, missing elements, attributes or metadata, but we’ve kept data on broken HTML that significantly affects the page or looks as though it is dynamically generated.
Finally, the importance and complexity of errors are bounded by our own expertise. Security and server errors involve backend programming and software architecture issues, and so we may have a tendency to overrate the complexity of simple fixes. On the other hand, since we’re very familiar with soft 404s we may overrate the importance of the errors due to the wealth of horror stories we have to draw on.
The sample size is currently quite small, as we’ve only been keeping sufficiently detailed records of technical errors since the start of this year. However, the data should give a decent idea of the kinds of technical SEO errors that are most common in 2015.
SEO Error Frequency
Each error we record receives a rating for priority/severity (out of ten) and for complexity/time required to fix (also out of ten). These are recorded in the chart as importance and complexity respectively. Both are assigned based on an estimate and then adjusted over time.
SEO Error Charts
What Do The Groups Mean?
These groupings of errors refer to problems as they affect SEO, not in general – we’re not pentesters, so the security errors we find are significantly more limited in type than they might otherwise be. The way we’ve divided up errors is to some extent subjective. If you have comments, questions or criticisms, we’d be glad to hear them.
Duplication refers to duplicate content, whether that’s generated by server-side programming errors and oversights or manufacturer’s descriptions on every page of the site.
Redirects are usually about 302s being used instead of 301s, but may also include meta refreshes and even errors in the implementation of a redirect technique when the website owner appears to be particularly wedded to that technique. It also covers, for instance, redirects being stacked too deeply (e.g. 301->301->301->200).
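The redirect checks described above can be sketched in a few lines of Python. This is only an illustration: the `audit_redirect_chain` function, the `MAX_REDIRECTS` threshold and the example URLs are all assumptions made for the sketch, and it works on a chain you have already recorded rather than fetching anything itself.

```python
from typing import List, Tuple

# Each hop is (status_code, url); the last hop is the response that finally answers.
Hop = Tuple[int, str]

# Assumed threshold: more than one redirect before the final page gets flagged.
MAX_REDIRECTS = 1


def audit_redirect_chain(hops: List[Hop]) -> List[str]:
    """Return human-readable warnings for one recorded redirect chain."""
    warnings = []
    redirects = [hop for hop in hops if 300 <= hop[0] < 400]
    for status, url in redirects:
        # 302, 303 and 307 are temporary redirects; 301 and 308 are permanent.
        if status in (302, 303, 307):
            warnings.append(f"temporary redirect ({status}) at {url}")
    if len(redirects) > MAX_REDIRECTS:
        warnings.append(f"chain of {len(redirects)} redirects")
    if hops and hops[-1][0] != 200:
        warnings.append(f"chain ends in {hops[-1][0]}, not 200")
    return warnings


# Hypothetical chain, similar in shape to the 301->301->301->200 example.
chain = [
    (301, "http://example.com/old"),
    (302, "http://example.com/older"),
    (301, "https://example.com/new"),
    (200, "https://example.com/new/"),
]
for warning in audit_redirect_chain(chain):
    print(warning)
```

The chain itself could come from a crawler export or from following `Location` headers by hand; the sketch deliberately leaves that part out.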
We’ve not included all 404 errors in this group. We think it’s possible to fix small numbers of 404s as and when they appear, and some 404s are more about a failure to redirect than a broken link so they’ll go into that category. This category is more for site-wide errors, relative URL mishaps leading to infinite loops (this one is particularly awful for munching your daily Googlebot credits, but any decent crawler will crash into the problem at some point), and other such significant disasters.
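To show how a relative URL mishap turns into an infinite loop, here is a small Python sketch. The `has_repeated_segment` helper and the `example.com` URLs are hypothetical, and the check is only a heuristic: a legitimate URL can contain the same segment twice in a row.

```python
from urllib.parse import urljoin, urlsplit


def has_repeated_segment(url: str, limit: int = 2) -> bool:
    """True if any path segment occurs `limit` or more times in a row."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    run = 1
    for previous, current in zip(segments, segments[1:]):
        run = run + 1 if current == previous else 1
        if run >= limit:
            return True
    return False


# A link written as "shop/" instead of "/shop/" resolves relative to the
# current page, so every page links one level deeper than itself.
page = "https://example.com/shop/"
for _ in range(3):
    page = urljoin(page, "shop/")
    print(page, has_repeated_segment(page))
```

Each pass through the loop mimics a crawler following the broken link once more, which is why the path never stops growing.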
Again, this group covers only security issues that affect or risk affecting SEO. One example was an incomplete removal of a commenting plugin that allowed spammers to post comments to a page with no comment submission form. These comments were hidden from view – I initially thought there was an oddly-specific page load speed issue until I viewed the source and found thousands of spam comments and links to sites of extremely poor repute.
Sitemaps failing to index what they definitely should, and succeeding in indexing what they definitely should not, is distressingly common. Sitemap errors can result in problems such as sending mixed signals to Google about the version of the site to index (https vs http) or, in the case of very large eCommerce sites, leading to Google remaining ignorant of some sections of the site.
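As a rough illustration of the mixed-signal problem, the following Python sketch flags sitemap entries whose scheme differs from the one you want indexed, plus entries carrying query strings. The `audit_sitemap` function and the sample sitemap are made up for the example, and the sketch assumes a plain `urlset` sitemap rather than a sitemap index.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def audit_sitemap(xml_text: str, expected_scheme: str = "https") -> list:
    """Return (url, problem) pairs for sitemap entries that look wrong."""
    problems = []
    root = ET.fromstring(xml_text)
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        if parts.scheme != expected_scheme:
            problems.append((url, "unexpected scheme"))
        if parts.query:
            problems.append((url, "query string"))
    return problems


# Hypothetical sitemap mixing http and https, with one parameterised URL.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
  <url><loc>https://example.com/shoes?sort=price</loc></url>
</urlset>"""

for url, problem in audit_sitemap(SAMPLE):
    print(problem, url)
```

Whether a query string is really a problem depends on the site, so treat that second check as a prompt to look rather than a verdict.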
This group covers errors in the robots.txt, or in meta robots tags that have the same effect. These have a fairly straightforward and predictable effect on your SEO.
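One way to catch an over-eager robots.txt is to test it against a list of URLs that must stay crawlable. The sketch below uses Python's standard `urllib.robotparser`; the rules, the URL list and the `blocked_urls` helper are invented for illustration. Note that the standard-library parser may not interpret wildcard rules the way Googlebot does, so treat the result as a first pass only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that accidentally blocks the product pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /products/
"""

# Hypothetical list of pages that should always be crawlable.
MUST_BE_CRAWLABLE = [
    "https://example.com/",
    "https://example.com/products/shoes",
    "https://example.com/about",
]


def blocked_urls(robots_txt: str, urls: list, agent: str = "Googlebot") -> list:
    """Return the URLs from `urls` that the given robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


for url in blocked_urls(ROBOTS_TXT, MUST_BE_CRAWLABLE):
    print("blocked:", url)
```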
Soft 404s are predominantly errors made when choosing which HTTP status to return for a 404 page, and occasionally the result of sites mistakenly generating empty pages.
This group covers leaking elements, un-semantic, misleading or archaic markup, bad characters, and broken nesting. Again, we’ve not recorded minor errors; these are sitewide issues.
Site structure can dramatically affect SEO. While it would be a stretch to call a structural decision an “error”, they can sometimes generate potentially infinite duplicate pages, or make crawling the site (and by extension, using the site under peak load) extremely difficult.
CSS errors that make the site unusable severely affect SEO. While display on a mobile device is important for SEO as well, we’ve not always recorded it as an outright error – this may change in the coming months.
Some sites are heavy, and that’s fine. Some sites load a lot of files, and that will be fine in the near future. Others load resources that are essentially duplicates of each other, or resources that will never actually be used on most pages. Unnecessary loading of resources can often cost entire seconds of load time, in an age in which we are theoretically aiming for the whole site to load in under a second.
These are issues on the server side, or with the server itself, that cause 5xx errors of some kind or severe outages. This includes things like Drupal’s performance issues when uncached.
What We (Think We) Found
We think we found some interesting and relevant information about SEO errors by looking over the sites we’ve been coming across, but take our conclusions with a pinch of salt. Please get in touch if you’ve found a mistake in our reasoning!
Our Most Common Errors
Refer to our “Frequency” chart to see the most common errors we found since the start of the year. Generally, we found more errors that are more closely associated with the front-end (URLs, HTML, Unnecessary Loads), and fewer that are more closely associated with the back-end (Server Issues, Security).
These do seem to be less important errors, so it makes sense that we’d find more of them – they’re harder to spot and fix for developers already working on the site. Many were spotted by lucky experimentation, or close familiarity with the site, or while reviewing and analysing the site’s content.
By contrast, back-end errors are generally uncovered during exhaustive crawling or a more methodical manual exploration of website pages, which we obviously can’t do as frequently. This explains the disparity, but it also emphasises how easy it is to find errors in the front-end code. Keep an eye out for opportunities and hit “F12” or “CMD-Alt-I” regularly.
If you refer to the “Average Importance” chart, you quickly see that sitemaps are disproportionately high priority. This might seem odd, because search engines can ignore sitemaps, and frequently do – a high priority would be understandable, but such a very high priority is bizarre.
The prominence of sitemap errors is probably due to small sample size, but there is another possible reason. Sitemap errors are more common on sites that are already very complex – and these sites are exactly the sites that desperately need the sitemap to help search engines out, making sitemap errors a high short-to-mid-term priority. In this sense the high importance is potentially misleading, as part of the problem is site structure.
Security issues, duplication and soft 404s are also of high average importance. Check regularly for compromised site content, whether it’s coming from inside or outside!
We’re defining a ‘quick win’ here using our records of both priority and complexity. A fix that can be solved quickly but isn’t a very high priority isn’t as quick a win (“win a quick”?), in our world, as a fix that can be solved in an average amount of time but is a very high priority. It’s not what you might expect but hopefully it makes sense! Refer to our “Average Importance Vs Complexity” chart for an overview.
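The post does not give a formula for this, so the following Python sketch is purely an assumption about how such a ranking could work: it weights importance more heavily than complexity, so that a very high priority fix of average complexity outranks an easy but minor one. The `ErrorRecord` class, the weighting and the two sample records are all made up for illustration and are not taken from our data.

```python
from dataclasses import dataclass


@dataclass
class ErrorRecord:
    group: str
    importance: int  # priority/severity, out of ten
    complexity: int  # complexity/time to fix, out of ten


def quick_win_score(record: ErrorRecord, importance_weight: float = 2.0) -> float:
    """Assumed scoring: reward importance, penalise complexity."""
    return importance_weight * record.importance - record.complexity


# Invented records: one easy but minor fix, one average-effort critical fix.
records = [
    ErrorRecord("easy but minor", importance=4, complexity=2),
    ErrorRecord("average effort, critical", importance=9, complexity=5),
]

for record in sorted(records, key=quick_win_score, reverse=True):
    print(record.group, quick_win_score(record))
```

Changing `importance_weight` shifts how much a high priority is allowed to excuse a slow fix; a ratio of importance to complexity would rank the two records the other way round, which is why the sketch uses a weighted difference.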
Our quickest wins as of April 2015 have revolved around sitemap errors and soft 404s. For soft 404s, the reason for this should be readily apparent – the most common cause of a soft 404 error is simply that you’re returning an HTTP 200 status for your 404 page. Fix this, and you fix the problem.
Sitemaps for larger, dynamic sites are at least partly automatically generated by third-party plugins, which also explains why the fixes are not that complex. Switch the plugin, update the plugin, tweak some features, maybe exclude query strings or tell it to obey your robots.txt and the problem is often resolved.
For these reasons, we think having an occasional look for soft 404s, and for missing or inappropriate sitemap indexing, is well worth it. Soft 404s may fail to show up in Webmaster Tools on occasion: highly customised and dynamic 404 pages with a lot of content can look like a valid category or search page, and I’m pretty sure I’ve come across other edge cases in the past. So do continue to check for suspicious duplicate pages and for 404’ed pages showing up in Analytics, and occasionally check pages that should be 404’ed with Redirect Checker or cURL.
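The same spot check can be scripted. The Python sketch below requests a URL that should not exist and looks at the status that comes back; the function names, the probe path and the user agent string are all assumptions made for the example, and it uses only the standard library.

```python
import uuid
import urllib.error
import urllib.request
from urllib.parse import urljoin


def probe_url(base_url: str) -> str:
    """Build a URL on the same host that should not exist."""
    return urljoin(base_url, "/soft-404-probe-" + uuid.uuid4().hex)


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status of a GET request, following redirects."""
    request = urllib.request.Request(url, headers={"User-Agent": "soft-404-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises 4xx and 5xx responses as exceptions


def classify(status: int) -> str:
    """Interpret the status returned for a page that should be missing."""
    if status in (404, 410):
        return "ok: missing page reports itself as missing"
    if 200 <= status < 300:
        return "soft 404: missing page returns success"
    return f"check manually: unexpected status {status}"


# Usage against a real site (makes a network request):
# print(classify(fetch_status(probe_url("https://example.com/"))))
```

Because redirects are followed, a site that bounces every missing page to the homepage will also show up as a soft 404 here. It will not catch a 404 page that returns the right status but is served for URLs that should exist.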
This information isn’t in the charts yet, because the variety of the sites we have been looking at is fairly limited. There are three main CMS used across the sites we looked at, and two main backend programming languages used across two servers. Other than that, they’re mostly one-offs and custom static sites. I’ve found it difficult to work out a way to chart or graph this.
We’ve grabbed what information we have from Wappalyzer, a decent app with some privacy issues.
Although I assumed that big, dynamic eCommerce sites would also have a disproportionately high number of errors, I found that the majority of errors come from sites that are effectively static, but unusually large for static sites. This makes sense – dynamic sites have fewer errors, but those errors are bigger and more noticeable when they happen, while static sites tend to struggle under an unmaintainable number of smaller errors that don’t form an easily-identifiable pattern. These sites are also only effectively static. The site structure and file structure are static, and there is no content or product database and no third-party CMS, but there are some dynamic elements in each file that seem to be causing a significant minority of issues.
Another common issue is a partial implementation of incomplete REST-like architectural styles on dynamic eCommerce sites. This is because of the representational aspect of REST architectural styles – it is an issue that can be avoided with canonicalisation or careful site architecture, but it is something to keep in mind. REST has been more-or-less codified relatively recently, so it also makes sense that there would be associated SEO issues when some of the ideas from RESTful best practices are used in combination with other ways of doing things.
The content management system on which we found the most technical SEO issues was Magento 1 (excluding custom CMS), and we also saw significant over-representation from sites using ASP.NET.
Our key takeaways are common knowledge throughout most of the SEO world, but we’re sharing our information in the hopes of starting a conversation, increasing the amount of sharing that goes on within SEO, and increasing the amount of data that can be used to inform SEO decisions.
Checking for soft 404s, duplication, and sitemap errors is very important.
We’d also suggest that you exercise caution with Magento 1 and ASP.NET sites, but more importantly that you avoid attempting to roll your own large website or CMS. Wherever possible, if you expect to eventually generate large amounts of content for your site, use a content management system.
To avoid security issues, at the very least keep your CMS and plugins up to date at all times. A compromised site can lose its SERP positions quickly, and they might not always recover. I suspect this is because the SERP collapse associated with a security compromise allows other sites to have a shot at the big-time. If the social signals are good, and people start to link to the competing resource, you might find that your pages are permanently gone from the first page of results.
Crawl sites regularly, for sure, but remember that there’s a very real case to be made for manual experimentation and poking things until they break in interesting ways, even if the errors aren’t always as important.
Get In Touch!
If you’ve got more data on technical SEO errors, criticisms of this post (constructive or otherwise), or questions about anything I’ve written, get in touch.
Technical SEO is one of our four pillars of SEO: without good health, your website will struggle to rank. Read more about our Technical SEO services.