Google Index Coverage Complete Guide 2021
Here is how to Solve All Google Index Coverage Errors in Google Search Console
Please scroll down for a video tutorial in Hindi/Urdu on how to solve all Google Index Coverage errors in Google Search Console.
How Does Google Search Work?
Before we dive into how to understand and solve Google Index Coverage errors in Google Search Console (GSC), let's take a quick look at how Google Search works, that is, how Google discovers, crawls, indexes, and ranks web pages.
Google gathers information from many different sources, including web pages, user-submitted content (such as Google My Business and Maps user submissions), book scanning, public databases on the Internet, and many other sources.
Google's Three Steps to Generate Results
Google follows three basic steps to generate results from web pages. These include crawling, indexing and ranking.
The first step is finding out what pages exist on the web. Google constantly searches for new pages and adds them to its list of known pages. This process of discovery is called crawling.
Some pages are known because Google has already crawled them before. Other pages are discovered when Google follows a link from a known page to a new page. Still other pages are discovered when a website owner submits a list of pages (a sitemap) for Google to crawl.
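To make link-based discovery concrete, here is a minimal Python sketch (standard library only) of how a crawler might pull the links out of a known page and turn them into absolute URLs to add to its list of known pages. It only illustrates the idea; it is not how Googlebot is actually implemented, and the page HTML and example.com URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URLs of <a href> links found on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop "#fragments",
        # since a fragment does not point to a separate page.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if absolute.startswith(("http://", "https://")):
            self.links.append(absolute)


def discover_links(base_url, html):
    """Return the unique links on a page, in the order they first appear."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return list(dict.fromkeys(parser.links))


if __name__ == "__main__":
    page = (
        '<a href="/about">About</a> '
        '<a href="blog/post-1#comments">Post</a> '
        '<a href="mailto:owner@example.com">Email</a>'
    )
    print(discover_links("https://www.example.com/", page))
    # ['https://www.example.com/about', 'https://www.example.com/blog/post-1']
```

A real crawler would repeat this for every newly discovered URL, which is why pages that nothing links to are hard for Google to find without a sitemap.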
Once a page is discovered, Google tries to understand what the page is about. This process is called indexing.
Google analyzes the content of the page, catalogues images and video files embedded on the page, and otherwise tries to understand the page. This information is stored in the Google index.
When a user types a query, Google tries to find the most relevant answer from its index based on many factors. Google tries to determine the highest-quality answers and factors in other considerations that provide the best user experience and most appropriate answer, such as the user's location, language, and device (desktop or phone). For example, searching for “mobile repair shops” would show different answers to a user in India than to a user in Pakistan.
Remember, Google doesn’t accept payments to Crawl, Index or Rank pages higher. If anyone tells you otherwise, they’re wrong.
Understanding Google Index Coverage
There are many aspects of the Google Coverage status report, and we'll look at them one by one. Referring to the following image will make things much easier to understand.
The Primary crawler value on the summary page (on top-right) shows the default user agent type that Google uses to crawl your site: Smartphone or Desktop, simulating a user on a mobile device or a desktop, respectively.
Google crawls all pages on your site using this primary crawler. Google may also crawl a subset of your pages using a secondary crawler which is the other user agent type.
For example, if the primary crawler for your site is Smartphone, the secondary crawler is Desktop and vice-versa. The purpose of a secondary crawl is to try to get more information about how your site behaves when visited by users on another device type.
URL discovery dropdown filter
Use the dropdown filter (on top-left) to filter index results by how Google discovered the URL. The following values are available:
All known pages
It is the default value. It shows all URLs discovered by Google through any means.
All submitted pages
It shows only pages submitted in a sitemap to this report or by sitemap ping.
Specific sitemap URL
It shows only URLs listed in a specific sitemap that was submitted using this report. This includes any URLs in nested sitemaps.
A URL is considered submitted by a sitemap even if it was also discovered through some other mechanism (for example, by organic crawling from another page).
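Because the "Specific sitemap URL" filter also counts URLs in nested sitemaps, it can help to see what nesting means in practice. The Python sketch below walks a sitemap index and collects the page URLs from each child sitemap. The `fetch` callable and the example.com sitemap names are hypothetical stand-ins; in practice you would download each sitemap's XML over HTTP.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def collect_sitemap_urls(sitemap_url, fetch, _seen=None):
    """Return the page URLs listed in a sitemap, following nested sitemap indexes.

    `fetch` is any callable that takes a URL and returns that sitemap's XML text.
    """
    seen = set() if _seen is None else _seen
    if sitemap_url in seen:  # guard against an index that (indirectly) lists itself
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    if root.tag == "{%s}sitemapindex" % NS["sm"]:
        urls = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            if loc.text:
                urls.extend(collect_sitemap_urls(loc.text.strip(), fetch, seen))
        return urls
    # Otherwise it is a <urlset>: each <url><loc> is one page URL.
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


if __name__ == "__main__":
    # Hypothetical in-memory "site" so the example runs without network access.
    fake_site = {
        "https://www.example.com/sitemap_index.xml": (
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            "<sitemap><loc>https://www.example.com/post-sitemap.xml</loc></sitemap>"
            "</sitemapindex>"
        ),
        "https://www.example.com/post-sitemap.xml": (
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            "<url><loc>https://www.example.com/hello-world/</loc></url>"
            "<url><loc>https://www.example.com/about/</loc></url>"
            "</urlset>"
        ),
    }
    print(collect_sitemap_urls("https://www.example.com/sitemap_index.xml", fake_site.__getitem__))
```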
Google index coverage Status
Each page can have one of the following status values (see the image above):
Error
The page is not indexed. See the specific error type descriptions below to learn more about each error and how to fix it. You should concentrate on these issues first.
Valid with Warning
The page is indexed but has an issue that you should be aware of. For example, the page was indexed even though it is blocked by robots.txt. Google always respects robots.txt, but that doesn't prevent indexing if someone else links to the page.
Why are pages marked with a warning?
This is marked as a warning because Google isn't sure whether you intended to block the page from search results. If you do want to keep this page out of search results, robots.txt is not the correct mechanism to prevent it from being indexed.
How to fix pages with a Warning status?
To keep the page out of the index, either use 'noindex' (and remove the robots.txt block so that Googlebot can actually see the 'noindex') or prohibit anonymous access to the page using authentication. You can use the robots.txt tester to determine which rule is blocking this page. Because of the robots.txt block, any snippet shown for the page will probably be sub-optimal. If you do not want to block this page, update your robots.txt file to unblock it.
Excluded
The page is not indexed, but Google thinks that was your intention. (For example, you might have deliberately excluded it with a noindex directive, or it might be a duplicate of a canonical page that Google has already indexed on your site.)
Why are pages excluded (not indexed)?
These pages are typically not indexed, and Google thinks that is appropriate. They are duplicates of indexed pages, blocked from indexing by some mechanism on your site, or otherwise not indexed for a reason that Google thinks is not an error.
How to solve Excluded (not indexed) pages?
Excluded by ‘noindex’ tag
When Google tried to index the page it encountered a ‘noindex’ directive and therefore did not index it. If you do not want this page indexed, congratulations! If you do want this page to be indexed, you should remove that ‘noindex’ directive.
Blocked by page removal tool
The page is currently blocked by a URL removal request. If you are a verified site owner, you can use the URL removals tool to see who submitted a URL removal request.
Removal requests are only good for about 90 days after the removal date. After that period, Googlebot may go back and index the page even if you do not submit another index request. If you don’t want the page indexed, use ‘noindex’, require authorization for the page, or remove the page.
Blocked by robots.txt
This page was blocked to Googlebot with a robots.txt file. You can verify this using the robots.txt tester. Note that this does not mean that the page won’t be indexed through some other means.
If Google can find other information about this page without loading it, the page could still be indexed (though this is less common).
To ensure that a page is not indexed by Google, remove the robots.txt block and use a ‘noindex’ directive.
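Besides the meta tag, 'noindex' can be sent as an `X-Robots-Tag` HTTP response header, which also works for non-HTML files such as PDFs. Below is a minimal WSGI sketch in Python; the `NOINDEX_PREFIXES` paths are hypothetical examples, and a real site would set this header in its web server or CMS instead.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical path prefixes for pages that may be crawled but should not be indexed.
NOINDEX_PREFIXES = ("/thank-you/", "/internal-search/")


def app(environ, start_response):
    """Minimal WSGI app that sends "X-Robots-Tag: noindex" for selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Googlebot only sees this header if robots.txt lets it fetch the URL.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title><p>Hello</p>"]


def headers_for(path):
    """Call the app the way a WSGI server would and return its response headers."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    app(environ, start_response)
    return captured


if __name__ == "__main__":
    print(headers_for("/thank-you/order-123").get("X-Robots-Tag"))  # noindex
    print(headers_for("/blog/").get("X-Robots-Tag"))  # None
```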
Blocked due to unauthorized request (401)
The page was blocked to Googlebot by a request for authorization (401 response).
If you do want Googlebot to be able to crawl this page, either remove authorization requirements or allow Googlebot to access your page.
Crawl anomaly
An unspecified anomaly occurred when fetching this URL. This could mean a 4xx- or 5xx-level response code. Try fetching the page with the URL Inspection tool (in the left-hand menu of Google Search Console) to see whether it encounters any fetch issues. The page was not indexed.
Crawled – currently not indexed
The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling.
Discovered – currently not indexed
The page was found by Google, but not crawled yet. Typically, Google tried to crawl the URL but the site was overloaded; therefore Google had to reschedule the crawl. This is why the last crawl date is empty on the report.
Alternate page with the proper canonical tag
This page is a duplicate of a page that Google recognizes as canonical. This page correctly points to the canonical page, so there is nothing for you to do.
Duplicate without user-selected canonical
This page has duplicates, none of which is marked canonical, and Google thinks this page is not the canonical one. You should explicitly mark the canonical for this page. Inspecting this URL should show the Google-selected canonical URL.
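To audit whether a page declares a canonical at all, you can look for a `<link rel="canonical">` element in its HTML. The Python sketch below does that for HTML you have already fetched. It does not cover canonicals sent in an HTTP `Link` header, and treating several conflicting canonicals as "no usable canonical" is a simplification made here, not a description of Google's behaviour.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of every <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def declared_canonical(html):
    """Return the single declared canonical URL, or None if missing or conflicting."""
    finder = CanonicalFinder()
    finder.feed(html)
    unique = set(finder.canonicals)
    return unique.pop() if len(unique) == 1 else None


if __name__ == "__main__":
    page = '<head><link rel="canonical" href="https://www.example.com/shoes/" /></head>'
    print(declared_canonical(page))  # https://www.example.com/shoes/
```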
Duplicate, Google chose different canonical than user
This page is marked as canonical for a set of pages, but Google thinks another URL makes a better canonical. Google has indexed the page that it considers canonical rather than this one.
Google recommends that you explicitly mark this page as a duplicate of the canonical URL. This page was discovered without an explicit crawl request. Inspecting this URL should show the Google-selected canonical URL.
Not found (404)
This page returned a 404 error when requested. Google discovered this URL without any explicit request or sitemap. Google might have discovered the URL as a link from another site, or possibly the page existed before and was deleted.
Googlebot will probably continue to try this URL for some period of time; there is no way to tell Googlebot to permanently forget a URL, although it will crawl it less and less often.
404 responses are not a problem, if intentional. If your page has moved, use a 301 redirect to the new location.
Page removed because of a legal complaint
The page was removed from the index because of a legal complaint.
Page with the redirect
The URL is a redirect and therefore was not added to the index.
Soft 404
The page request returns what Google thinks is a soft 404 response. This means that it returns a user-friendly “not found” message without a corresponding 404 response code.
Google recommends returning a 404 response code for truly “not found” pages, or adding more information to the page to make it clear that it is not a soft 404.
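Google does not publish exactly how it detects soft 404s, but you can roughly screen your own pages for the two signals described here: a "not found" message served with a success code, and little or no content. In the Python sketch below, the phrases and the 50-word threshold are hypothetical and should be tuned to your own templates; the tag stripping is deliberately crude.

```python
import re

# Hypothetical phrases; adjust to the wording your own "not found" template uses.
NOT_FOUND_PHRASES = ("page not found", "no longer available", "nothing was found")
MIN_WORDS = 50  # hypothetical threshold for a "thin" page


def looks_like_soft_404(status_code, html):
    """Flag pages that return 200 but read like an error page or are nearly empty."""
    if status_code != 200:
        return False  # a real 404, 410 or 301 response is not a soft 404
    # Crude tag stripping; script and style contents are not removed.
    text = re.sub(r"<[^>]+>", " ", html)
    text = " ".join(text.split()).lower()
    if any(phrase in text for phrase in NOT_FOUND_PHRASES):
        return True
    return len(text.split()) < MIN_WORDS


if __name__ == "__main__":
    print(looks_like_soft_404(200, "<h1>Page not found</h1>"))  # True
    print(looks_like_soft_404(404, "<h1>Page not found</h1>"))  # False
```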
Duplicate, submitted URL not selected as canonical
The URL is one of a set of duplicate URLs without an explicitly marked canonical page. You explicitly asked for this URL to be indexed, but because it is a duplicate and Google thinks that another URL is a better candidate for canonical, Google did not index this URL. Instead, Google indexed the canonical that it selected. (Google only indexes the canonical in a set of duplicates.)
The difference between this status and “Google chose different canonical than user” is that here you have explicitly requested indexing. Inspecting this URL should show the Google-selected canonical URL.
Important Tip – You should not expect all URLs on your site to be indexed. Your goal is to get the canonical version of every page indexed.
Any duplicate or alternate pages will be labelled “Excluded” in this report. Duplicate or alternate pages have substantially the same content as the canonical page. Having a page marked duplicate or alternate is a good thing; it means that Google has found the canonical page and indexed it. You can find the canonical for any URL by running the URL Inspection tool (in the left-hand menu of Google Search Console).
Do you want to learn about duplicate content, including its causes, how to identify it, and how to solve it? Follow this link for a comprehensive guide to duplicate content with practical examples.
Valid
The page is indexed. It is further classified into two categories:
Submitted and indexed
You submitted the URL for indexing, and it was indexed.
Indexed, not submitted in sitemap
The URL was discovered by Google and indexed. Google recommends submitting all important URLs using a sitemap.
Ideally, you should see a gradually increasing count of valid indexed pages as your site grows. When you add new content, it can take a few days for Google to index it. You can reduce the indexing lag by requesting indexing.
Google index coverage Details page
As shown above, you can click a row on the summary page to open a details page for that status + reason combination. You can see details about the chosen issue by clicking Learn more on the details page. The graph on this page shows the count of affected pages over time.
The table shows an example list of pages affected by the issue:
- Open a URL.
- Inspect a URL.
- When you’ve fixed all instances of an error or warning, you can ask Google to Validate your fixes.
Google Index Coverage Errors
As stated earlier, an error means that Google had a problem indexing some pages (this is different from excluded pages).
These are the Index Coverage Errors in Google Search Console:
Server error (5xx)
Your server returned a 500-level error when the page was requested.
How to fix Server errors?
There are a number of ways to fix server errors:
- Reduce excessive page loading for dynamic page requests. A site that delivers the same content for multiple URLs is considered to deliver content dynamically (for example, www.example.com/shoes.php?color=red&size=7 serves the same content as www.example.com/shoes.php?size=7&color=red). Dynamic pages can take too long to respond, resulting in timeout issues, or the server might return an overloaded status to ask Googlebot to crawl the site more slowly. In general, Google recommends keeping parameter lists short and using them sparingly.
- Make sure your site’s hosting server is not down, overloaded, or misconfigured. If connection, timeout or response problems persist, check with your web hoster and consider increasing your site’s ability to handle the traffic.
- Check that you are not inadvertently blocking Google. You might be blocking Google due to a system-level issue, such as a DNS configuration issue, a misconfigured firewall or DoS protection system, or a content management system configuration. Protection systems are an important part of good hosting and are often configured to automatically block unusually high levels of server requests. However, because Googlebot often makes more requests than a human user, it can trigger these protection systems, causing them to block Googlebot and prevent it from crawling your website. To fix such issues, identify which part of your website's infrastructure is blocking Googlebot and remove the block. The firewall may not be under your control, so you may need to discuss this with your hosting provider.
- Control search engine site crawling and indexing wisely. Some webmasters intentionally prevent Googlebot from reaching their websites, perhaps using a firewall as described above. In these cases, usually, the intent is not to entirely block Googlebot, but to control how the site is crawled and indexed. If this applies to you, check the following: i) To control Googlebot’s crawling of your content, use a robots.txt file and configure URL parameters. ii) If you’re worried about rogue bots using the Googlebot user-agent, you can verify whether a crawler is actually Googlebot. iii) If you would like to change how frequently Googlebot crawls your site, you can request a change in Googlebot’s crawl rate. Hosting providers can verify ownership of their IP addresses to enable this.
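For the shoes.php example above, one practical step is to find URLs that differ only in parameter order, so that you can link to one form consistently and mark it as canonical. The Python sketch below sorts query parameters to group such URLs. It assumes parameter order never changes the page content, which holds for the example but may not hold on every site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url):
    """Sort query parameters (and lowercase the host) so equivalent URLs compare equal."""
    parts = urlsplit(url)
    sorted_query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, sorted_query, ""))


def group_duplicates(urls):
    """Map each normalized URL to the original URLs that share it (groups of 2+ only)."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "https://www.example.com/shoes.php?color=red&size=7",
        "https://www.example.com/shoes.php?size=7&color=red",
        "https://www.example.com/shoes.php?size=8",
    ]
    for canonical, duplicates in group_duplicates(crawled).items():
        print(canonical, "<-", duplicates)
```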
Redirect error
Google experienced a redirect error of one of the following types: a redirect chain that was too long; a redirect loop; a redirect URL that eventually exceeded the maximum URL length; or a bad or empty URL in the redirect chain.
Use a web debugging tool, such as Lighthouse, to get more details about the redirect.
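It can also help to trace a redirect chain hop by hop. The Python sketch below checks a redirect map for the four problems listed above. The map is hypothetical (in practice you would build it from each response's `Location` header). The 10-hop limit reflects Google's documentation that Googlebot follows up to 10 redirect hops, and the 2,048-character URL limit is only an illustrative assumption; treat both as assumptions.

```python
MAX_HOPS = 10          # assumption: Googlebot is documented to follow up to 10 hops
MAX_URL_LENGTH = 2048  # assumption: illustrative limit, not an official Google figure


def check_redirects(start_url, redirects):
    """Follow a redirect map and return a description of the first problem, or None.

    `redirects` maps a URL to the Location it redirects to; URLs that are not
    keys in the map are treated as final (non-redirecting) pages.
    """
    chain = [start_url]
    url = start_url
    while url in redirects:
        target = redirects[url]
        if not target or not target.strip():
            return f"bad or empty URL in redirect chain after {url}"
        if target in chain:
            return "redirect loop: " + " -> ".join(chain + [target])
        if len(target) > MAX_URL_LENGTH:
            return f"redirect URL exceeds {MAX_URL_LENGTH} characters"
        chain.append(target)
        hops = len(chain) - 1
        if hops > MAX_HOPS:
            return f"redirect chain too long ({hops} hops)"
        url = target
    return None


if __name__ == "__main__":
    print(check_redirects("/a", {"/a": "/b", "/b": "/c"}))  # None
    print(check_redirects("/a", {"/a": "/b", "/b": "/a"}))  # redirect loop: /a -> /b -> /a
```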
Submitted URL blocked by robots.txt
You submitted this page for indexing, but the page is blocked by your site’s robots.txt file.
- Click the page in the Examples table to expand the tools side panel.
- Click Test robots.txt blocking to run the robots.txt tester for that URL. The tool should highlight the rule that is blocking that URL.
- Update your robots.txt file to remove or alter the rule, as appropriate.
- You can find the location of this file by clicking See live robots.txt on the robots.txt test tool.
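If you want to reason about which rule wins before editing the file, the Python sketch below approximates Google's documented matching for a single user-agent group: the rule with the longest matching path applies, `Allow` wins a tie, `*` matches any run of characters, and `$` anchors the end of the URL. It is a simplification (no user-agent group selection, no percent-encoding normalization, no file parsing), and the example rules are hypothetical, so confirm results with the robots.txt tester.

```python
import re


def _rule_to_regex(path):
    """Translate a robots.txt path pattern (with * and $ wildcards) into a regex."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    pattern = ".*".join(re.escape(piece) for piece in path.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def deciding_rule(rules, url_path):
    """Return the (directive, path) rule that decides `url_path`, or None.

    `rules` holds ("allow" | "disallow", path) pairs from ONE user-agent group.
    The longest matching path wins; on a tie, "allow" wins.
    """
    best = None
    for directive, path in rules:
        if not path:  # an empty "Disallow:" matches nothing
            continue
        if _rule_to_regex(path).match(url_path):  # match() anchors at the start
            key = (len(path), directive == "allow")
            if best is None or key > best[0]:
                best = (key, (directive, path))
    return best[1] if best else None


if __name__ == "__main__":
    rules = [("disallow", "/blog/"), ("allow", "/blog/public-"), ("disallow", "/*.pdf$")]
    print(deciding_rule(rules, "/blog/draft-1"))      # ('disallow', '/blog/')
    print(deciding_rule(rules, "/blog/public-post"))  # ('allow', '/blog/public-')
```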
Submitted URL marked ‘noindex’
You submitted this page for indexing, but the page has a ‘noindex’ directive either in a meta tag or HTTP header.
If you want this page to be indexed, you must remove the tag or HTTP header.
Use the URL Inspection tool in Google Search Console to confirm the error:
- Click the inspection icon next to the URL in the table.
- Under Coverage > Indexing > Indexing allowed? the report should show that noindex is preventing indexing.
- Confirm that the noindex tag still exists in the live version:
- Click Test live URL.
- Under Availability > Indexing > Indexing allowed? see if the noindex directive is still detected. If noindex is no longer present, you can click Request Indexing to ask Google to try again to index the page. If noindex is still present, you must remove it in order for the page to be indexed.
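You can also double-check the two places a 'noindex' can live, the `X-Robots-Tag` response header and a robots `<meta>` tag, yourself. The Python sketch below assumes you have already fetched the page's headers and HTML (for example, from your browser's developer tools). It only reads static HTML, so a tag injected by JavaScript would be missed; the URL Inspection tool's rendered view covers that case.

```python
import re
from html.parser import HTMLParser

NOINDEX_TOKENS = {"noindex", "none"}  # "none" means noindex, nofollow


def _tokens(value):
    """Split "noindex, nofollow" or "googlebot: noindex" into lowercase tokens."""
    return {token.strip().lower() for token in re.split(r"[,:]", value)}


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.tokens |= _tokens(attributes.get("content") or "")


def noindex_sources(headers, html):
    """Return where a noindex directive was found (header, meta tag, both, or neither)."""
    found = []
    header_tokens = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_tokens |= _tokens(value)
    if header_tokens & NOINDEX_TOKENS:
        found.append("X-Robots-Tag header")
    finder = RobotsMetaFinder()
    finder.feed(html)
    if finder.tokens & NOINDEX_TOKENS:
        found.append("robots meta tag")
    return found


if __name__ == "__main__":
    print(noindex_sources({}, '<meta name="robots" content="noindex, follow">'))
    # ['robots meta tag']
```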
Submitted URL seems to be a Soft 404
You submitted this page for indexing, but the server returned what seems to be a soft 404.
What is a Soft 404 URL?
A soft 404 is a URL that returns a page telling the user that the page does not exist and also a 200-level (success) code. In some cases, it might be a page with little or no content–for example, a sparsely populated or empty page.
Returning a success code, rather than 404/410 (not found) or 301 (moved), is a bad practice. A success code tells search engines that there’s a real page at that URL. As a result, the page may be listed in search results, and search engines will continue trying to crawl that non-existent URL instead of spending time crawling your real pages.
How to Solve Soft 404 URL?
- If your page is no longer available and has no clear replacement, it should return a 404 (Not Found) or 410 (Gone) response code. Either code clearly tells both browsers and search engines that the page doesn't exist. You can also display a custom 404 page to the user, if appropriate: for example, a page containing a list of your most popular pages, or a link to your home page.
- If your page has moved or has a clear replacement, return a 301 (permanent redirect) to redirect the user as appropriate.
- If you think that your page is incorrectly flagged as a soft 404, use the URL Inspection tool (URL Inspection tool is located on the left side of the Google Search Console) to examine the rendered content and the returned HTTP code. If the rendered page is blank or nearly blank, it could be that your page references many resources that can’t be loaded (images, scripts, and other non-textual elements), which can be interpreted as a soft 404. Reasons that resources can’t be loaded include blocked resources (blocked by robots.txt), having too many resources on a page, or slow loading/very large resources. The URL Inspection tool should list which resources could not be loaded, and also show you the rendered live page.
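The first two points above map naturally onto a small routing decision. In the Python sketch below, the `LIVE`, `MOVED`, and `DELETED` tables are hypothetical stand-ins for your site's real data; a real site would also render a helpful custom page body for the 404 case while keeping the 404 status code.

```python
# Hypothetical data: live pages, old paths that moved, and paths deleted with no replacement.
LIVE = {"/", "/shoes/"}
MOVED = {"/old-shoes/": "/shoes/"}
DELETED = {"/discontinued-product/"}


def response_for(path):
    """Pick the status code and headers for a request without creating soft 404s."""
    if path in LIVE:
        return 200, {}
    if path in MOVED:
        # Moved permanently: send users and Googlebot to the replacement page.
        return 301, {"Location": MOVED[path]}
    if path in DELETED:
        # Gone for good, with no replacement.
        return 410, {}
    # Unknown URL: a real 404 status, even if the body is a friendly custom page.
    return 404, {}


if __name__ == "__main__":
    for path in ("/shoes/", "/old-shoes/", "/discontinued-product/", "/typo"):
        print(path, response_for(path))
```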
Submitted URL returns unauthorized request (401)
You submitted this page for indexing, but Google got a 401 (not authorized) response. Either remove authorization requirements for this page or else allow Googlebot to access your pages by verifying its identity.
You can verify this error by visiting the page in incognito mode.
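Letting Googlebot past an authorization check is only safe if you confirm the request really comes from Google, because anyone can send a Googlebot user-agent string. Google documents a reverse DNS lookup on the visiting IP, a check that the hostname belongs to Google, and then a forward lookup that must return the same IP. The Python sketch below implements that check; the lookup functions are injectable so it can be tested offline, and the domain list is an assumption to verify against Google's current documentation.

```python
import socket

# Assumption: verify the current list of Google crawler domains in Google's documentation.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def is_real_googlebot(ip,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm the IP.

    The default lookups come from the socket module; gethostbyname_ex is IPv4-only.
    """
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:  # no PTR record or DNS failure
        return False
    # The leading dots reject look-alikes such as "evilgooglebot.com".
    if not hostname.rstrip(".").endswith(GOOGLE_DOMAINS):
        return False
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses
```

Only when this returns True for the requesting IP would you skip the authorization requirement for that request.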
Submitted URL not found (404)
You submitted a non-existent URL for indexing.
What is a 404 Error URL?
404 error URLs are URLs that you explicitly asked Google to index, but were not found. 404 excluded URLs are URLs that Google discovered through some other mechanism.
How to solve a 404 URL Error?
Here’s how you should handle 404 URL errors:
- Decide if it’s worth fixing. Many 404 errors are not worth fixing because 404s don’t harm your site’s indexing or ranking.
- If it is a submitted URL (an error), it is worth fixing.
- If it is a deleted page that has no replacement or equivalent, returning a 404 is the right thing to do. The report should stop showing the 404 after about a month.
- If it is a bad URL generated by a script, or one that never existed on your site, you probably don't need to worry about it. It might bother you to see it in your report, but you don't need to fix it unless the URL is a commonly misspelt link (see below). 404 errors should drop out of the report after about a month.
- If the URL was submitted for indexing (the status is Error):
- Inspect the URL to see where it was submitted from by clicking the inspect icon next to the URL, and look at the Discovery information. Update the sitemap as necessary.
- If the content has moved, add a redirect.
- If you have permanently deleted content without intending to replace it with newer, related content, let the old URL return a 404 or 410. Currently, Google treats 410s (Gone) the same as 404s (Not found). Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic. Such pages are called soft 404s, and can be confusing to both users and search engines.
- If the URL is unknown: You might occasionally see 404 errors for URLs that never existed on your site. These errors can occur when someone browses to a non-existent URL on your site, for example because someone mistyped a URL in the browser or mistyped a link URL. If it's a very common error, you might create a redirect for it.
- If the URL comes from JavaScript or other embedded content: Googlebot sometimes finds URLs inside JavaScript. For example, your site may use the following code to track file downloads in Google Analytics:
<a href="helloworld.pdf" onClick="_gaq.push(['_trackPageview','/download-helloworld']);">Hello World PDF</a>
When Googlebot sees this code, it might try to crawl the URL http://www.example.com/download-helloworld, even though it's not a real page. In this case, the link may appear as a 404 (Not Found) error in the Crawl Errors report. Google is working to prevent this type of crawl error. This error has no effect on the crawling or ranking of your site.
- Don't create fake content, redirect to your homepage, or use robots.txt to block 404s. All of these things make it harder for Google to recognize your site's structure and process it properly. These are called soft 404 errors. (Once Google has successfully crawled a URL, it can try to crawl that URL forever. Issuing a 300-level redirect will delay the recrawl attempt, possibly for a very long time.) Submitting a URL removal request using the URL removal tool will not remove the error from this report.
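To decide whether a 404 is a "very common error" worth redirecting (for example, a commonly misspelt link), you can count how often each missing URL is requested in your server's access log. The Python sketch below assumes the Common Log Format; your host's log format may differ, and the sample log lines are made up.

```python
import re
from collections import Counter

# Common Log Format, e.g.:
# 203.0.113.9 - - [10/Oct/2021:13:55:36 +0000] "GET /servces HTTP/1.1" 404 512
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def top_404s(log_lines, limit=5):
    """Return the most frequently requested missing paths as (path, count) pairs."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        '203.0.113.9 - - [10/Oct/2021:13:55:36 +0000] "GET /servces HTTP/1.1" 404 512',
        '198.51.100.4 - - [10/Oct/2021:13:56:02 +0000] "GET /servces HTTP/1.1" 404 512',
        '198.51.100.4 - - [10/Oct/2021:13:56:10 +0000] "GET /services HTTP/1.1" 200 2048',
    ]
    print(top_404s(sample))  # [('/servces', 2)]
```

A path like /servces that keeps appearing near the top is a good candidate for a 301 redirect to the correct page.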
Submitted URL has crawl issue
You submitted this page for indexing, and Google encountered an unspecified crawling error that doesn’t fall into any of the other reasons. Try debugging your page using the URL Inspection Tool.
[Video Tutorial] Here is how to Solve ALL Google Index Coverage Errors in Google Search Console in 2020 [Urdu/Hindi]
- [02:59] Skip Introduction (Beginners don’t Skip Intro)
- [03:04] How to access Google Search Console (GSC)?
- [03:47] Understanding Index and Coverage features in GSC
- [04:19] Primary and Secondary Crawler (in Google Index Coverage)
- [05:03] URL discovery dropdown filter (All known pages/All submitted pages/sitemap.xml)
- [05:22] How Google Search Works (Crawling/Indexing/Ranking)
- [08:54] Google Index Coverage Status (Error/Valid with Warning/Valid/Excluded)
- [10:20] What is meant by Coverage Error (in Google Index Coverage)?
- [11:03] What is meant by Valid with Warning?
- [11:52] What is meant by Valid in Google Index Coverage status?
- [12:34] What is meant by Excluded in GSC Coverage report?
- [13:44] Importance of understanding Google Index Coverage Errors
- [15:44] Types of Google Index Coverage Errors in GSC (Detailed Analysis)
- [17:48] Server error (5xx). How to fix Server errors?
- [24:31] Redirect error. How to solve Redirect Errors?
- [26:04] Importance of using Lighthouse – Tools for Web Developers
- [26:57] Submitted URL blocked by robots.txt. How to solve this error?
- [27:32] Robots.txt tester – how to use the robots.txt tester?
- [29:56] Submitted URL marked ‘noindex’. How to solve this error?
- [32:30] How to Test Live URL (URL Inspection in Google Search Console)? How to manually request quick indexing of a URL in Google?
- [35:43] Submitted URL seems to be a Soft 404. Difference b/w 404 and soft 404. How to solve soft 404 errors?
- [41:50] Submitted URL returns unauthorized request (401). How to solve 401 errors in GSC coverage?
- [42:38] Submitted URL not found (404). How to solve 404 errors in index coverage?
- [43:22] Submitted URL has a crawl issue. How to solve crawl issues?
5 thoughts on “Google Index Coverage Complete Guide 2021”
I need help. My Google Search Console account is moving all my site's posts and pages from the Valid section to the Excluded section, and the number is decreasing day by day. I had 5K pages in the Valid section, but now there are only 1,900. I am worried about it. winaster.com is my site URL. Can any expert help me here? What should I do to fix this problem?
There can be many reasons for this. Check what type of issue you’re facing?
Can you please help me, sir? When I add my posts in Google Search Console, I get a 5xx error. Please guide me: I'm getting a 5xx error and a page fetch error. Please let me know; it would be very kind of you if you could solve this for me.
Can you please tell me how I can remove that noindex from my site? I checked all the articles and tried another plugin too, but please help me.
Because of this, my post on my WordPress site is not showing up; it is just crawled and discovered, but not showing.
I am using Yoast, and while writing I selected index and discoverable by search engines, but it is still not showing. Please help me solve this issue…
Are you talking about a particular link/post/archive,etc? Please let me know. I’ll guide you accordingly.