Crawl Errors

What Are Crawl Errors?

Crawling is the process by which a search engine sends a bot, or “web crawler,” to a website so that it can add the site and its crawlable pages to the search engine’s index. Pages that aren’t in a search engine’s index won’t show up in that search engine’s results pages.

Crawl errors happen when these web crawlers (sometimes called spiders) fail to access pages on your website.
There are different kinds of crawl errors of varying severity, but it’s in your best interest to fix these errors ASAP. If your pages aren’t showing up in the SERPs, that’s a lot of traffic and business opportunities you’re missing out on.

Crawl errors are categorised into two types: site errors and URL errors.

Site Errors

Crawl errors that affect your entire website are considered site errors. A site error is an urgent problem, as it means search engines can’t crawl any part of your site.

Site errors generally have one of the following causes:

1. DNS Errors

Search engine crawlers have to connect to your site’s DNS server before they can access your site. A failure to make that connection results in DNS errors. These errors can be further subcategorised into the following:

  • DNS timeout — Your DNS server didn’t respond to a search engine’s crawl request fast enough.
  • DNS lookup — Your DNS server couldn’t find your domain name after responding to the search engine’s crawl request.

These kinds of errors are usually temporary, but if they don’t go away by themselves, you should contact your DNS provider.
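Before contacting your provider, you can quickly confirm whether your domain resolves at all. Here is a minimal sketch in Python using the standard library; the hostnames are placeholders, so substitute your own domain:

```python
import socket

def resolves(hostname):
    """Return True if DNS can resolve the hostname, False on lookup failure."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        # DNS lookup failed: the resolver responded but couldn't find the
        # name, or no DNS server answered at all.
        return False

# "localhost" resolves on virtually every machine.
print(resolves("localhost"))
```

If this returns False for your domain from several networks, the problem is on the DNS side rather than your web server.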

2. Server Errors

If search engine bots can connect to your website’s DNS server and find its domain name but still can’t load any of its pages, it’s most likely a server error that’s causing the issue.

There are plenty of reasons why a server error happens, but the most common one is that your site’s server takes too long to respond to a search engine crawler’s request. Excessively long response times usually mean your site is getting more traffic than the server can handle, or that there are problems in your site’s code.
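The distinction matters when you triage crawl reports: any 5xx status code points at the server, while 4xx codes point at the request or the page itself. A rough classifier, assuming only the standard HTTP status ranges:

```python
def classify_status(code):
    """Map an HTTP status code to a rough crawl-triage bucket."""
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"   # e.g. 404 Not Found, 403 Forbidden
    if 500 <= code < 600:
        return "server error"   # e.g. 500, 503: the server's problem
    return "unknown"

print(classify_status(503))  # server error: often an overloaded server
```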

3. Robots.txt failure

A robots.txt file tells search engine crawlers which pages on a website to avoid. If crawlers detect a robots.txt file, they will check it before doing anything else. If the robots.txt file can’t be loaded, the entire site may go unindexed.

If you use a robots.txt file, make sure it is properly configured to prevent this problem. In fact, if you’re okay with search engines crawling every page on your site, it’s better to have no robots.txt file at all, which sidesteps the issue entirely.

That said, check your robots.txt for any flaws if you really have to use one.

URL Errors

Crawl errors that are restricted to specific pages on your site are URL errors. There are five types of URL errors you’re likely to run into:

1. Soft 404s

Search engines treat pages as soft 404s when they load properly but are mostly empty. This is because such pages technically work, returning a 200 HTTP status code, yet they ultimately provide no more value to users than pages that actually return a 404 Not Found error code.

You can have a custom 404 page that tells users the page they’re looking for doesn’t exist. But if that custom page doesn’t return an actual 404 status code, it can still be flagged as a soft 404 error that needs fixing.
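You can approximate how a crawler flags soft 404s with a simple heuristic: the response says 200, but the body is near-empty or reads like an error page. A toy version of that check; the threshold and phrases are purely illustrative, not what any search engine actually uses:

```python
ERROR_PHRASES = ("page not found", "404", "does not exist")

def looks_like_soft_404(status, body, min_length=200):
    """Flag a 200 response whose body is empty-ish or reads like an error page."""
    if status != 200:
        return False  # a real 404 is not a *soft* 404
    text = body.strip().lower()
    if len(text) < min_length:
        return True   # technically loads, but there's nothing on it
    return any(phrase in text for phrase in ERROR_PHRASES)

print(looks_like_soft_404(200, ""))                # True: an empty 200 page
print(looks_like_soft_404(404, "Page not found"))  # False: a genuine 404
```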

2. Page Not Found

Pages that search engine bots try to crawl but that don’t exist, or no longer exist, are actual 404s, or “page not found” errors.

This is only a problem if the page returning the 404 error code is an important one: a page that users expect to exist, has a lot of quality links, or gets a ton of traffic.

3. Access Denied

An access denied error occurs when a search engine bot is prevented from crawling a particular page.

Like with 404s, you don’t have to worry about pages that have this issue if you don’t want them crawled and indexed. If it’s a page that you need to have indexed, try out the following fixes:

  • If the page requires log-in details, remove that requirement
  • Remove the page from the site’s robots.txt file
  • Ask your hosting provider if they are blocking search engines from crawling your site
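For the robots.txt route, the fix is usually just deleting the Disallow line that covers the page. A hypothetical before and after, with made-up paths:

```
# Before: /products/ is blocked, so its pages return "access denied" to crawlers
User-agent: *
Disallow: /products/
Disallow: /admin/

# After: /products/ is crawlable again; /admin/ stays blocked
User-agent: *
Disallow: /admin/
```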

4. Not Followed

When a search engine crawler can’t follow a URL all the way through, there’s probably an issue with your site’s use of Flash or JavaScript, or with your redirects.

If you get a not followed error with a page that isn’t a high priority, you should still look into it. It’s indicative of coding or site structure wrinkles that you want to smooth out.
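Redirect problems behind “not followed” errors are often loops or overly long chains. You can catch both by walking the redirects and bounding the hop count. Here is a sketch over a made-up redirect map; a real check would issue HTTP requests and read Location headers instead:

```python
def follow_redirects(start, redirects, max_hops=5):
    """Walk a {url: target} redirect map; report loops and over-long chains."""
    seen = [start]
    url = start
    while url in redirects:
        url = redirects[url]
        if url in seen:
            return "redirect loop"
        seen.append(url)
        if len(seen) - 1 > max_hops:
            return "chain too long"
    return url  # the final destination a crawler would index

chain = {"/old": "/interim", "/interim": "/new"}
print(follow_redirects("/old", chain))  # /new

loop = {"/a": "/b", "/b": "/a"}
print(follow_redirects("/a", loop))     # redirect loop
```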

5. URL-specific DNS and Server Errors

DNS and server errors as mentioned above can be restricted to certain URLs without affecting the entire site.

Crawl Error Considerations

Not all crawl errors are worth stressing over, but there are some that require your immediate attention. Take the time to regularly check for such issues so you can learn to discern which errors to pour effort into fixing.