Crawl budget is, put simply, the number of pages that Google's robots are able to crawl and index during a single visit to your site. You can think of it as the level of interest Google's robots take in your site, which translates into how much of it gets indexed. This, in turn, affects how regularly your pages appear in search results.
In this article, we will discuss the concepts of crawl rate limit and crawl demand, and we will answer the question of how to optimize a site to make the most of its crawl budget, which, especially in the case of larger sites, can have a real impact on visibility and reach.
Table of Contents:
At the moment, there are more than 1.5 billion websites on the Web. How do robots reach your site specifically? The process takes place in several stages.
Googlebot crawls your site first so that the content can be indexed later; this is what places your published content in Google's index. For the search engine to find all of your content, however, a sitemap is useful: it makes the bots' job much easier and helps you gain the traffic you expect.
Remember, however, that crawling happens very quickly on sites with a relatively small number of subpages. It is a completely different story for extensive sites, where you need to prioritize and decide which content should be crawled first. How to do it? You will find out in a moment!
Google's robots have limited resources to spend when scanning websites day to day. Googlebot tries to crawl the optimal number of subpages, but to avoid a situation in which a website is crawled too intensively, a crawl rate limit was introduced.
Crawl rate limit
Crawl rate limit is the number of simultaneous connections Googlebot can use when crawling a site. Google tries to crawl the optimal number of pages, so it adjusts this limit according to the performance of the site and its server.
The limit is designed to balance the crawling process and keep Google from fetching pages so aggressively that it overloads the server hosting the site. In other words, thanks to the rate limit there will be no situation in which the site slows down for users while the robot is crawling it. The crawling process stays balanced, the risk of overloading the server is reduced, and the site remains fast for visitors.
This parameter depends primarily on the speed of the site. If pages load slowly or the server response time is long, crawling takes significantly longer. If the site loads quickly, has a sitemap and uses internal links leading to further pages, there is a good chance that Googlebot will crawl and index most of the subpages during a single visit.
Where and how to check the speed of your site?
We can check the speed of a website with online tools:
https://developers.google.com/speed/pagespeed/insights/
https://gtmetrix.com
They give a full picture of the HTML structure, page load time and server response time, and the last of these can be an unpleasant surprise more often than you might expect, as the value can exceed 1 second.
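If you would rather get a rough reading of the server response time yourself, a short script can do it as well. Below is a minimal sketch in Python, assuming the requests library is installed; https://www.example.com/ is only a placeholder for your own domain, and the measurement is a rough approximation rather than a replacement for PageSpeed Insights or GTmetrix.

```python
import requests

# Placeholder URL - replace with your own domain.
URL = "https://www.example.com/"

# Fetch the page and measure how long it takes for the response
# headers to arrive (a rough proxy for server response time).
response = requests.get(URL, timeout=10)
ttfb = response.elapsed.total_seconds()

print(f"Status code: {response.status_code}")
print(f"Approximate server response time: {ttfb:.2f} s")
if ttfb > 1.0:
    print("Warning: response time exceeds 1 second - worth investigating.")
```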
The parameter can be set in Google Search Console; however, this gives no guarantee of improved indexation.
Crawl demand determines how often a site is crawled and indexed. It is higher for popular and frequently updated sites. Put simply, sites that users visit eagerly are also attractive to the robot that does the indexing.
What does crawl demand depend on?
Google "sees" how much is going on around your site, so it is worth building its popularity. Another factor is how often you update your content: robots are more eager to reach fresh content so they can show users useful information. Keep in mind, however, that your strategy for adding content to the site must be well thought out.
Remember
Frequently adding low-value content will not increase your crawl budget at all. Quite the opposite.
To find out to what extent your site is indexed in Google, just enter the query site:yourdomain.com into the search box. After submitting it, you will see the results within your domain, i.e. the home page and all the subpages that Google 'sees'.
It is worth reviewing the results manually to check whether any subpages are indexed that should not be, e.g. filter pages, registration and login pages, draft pages, or URLs with appended parameters that duplicate your main target pages. Such subpages are worth listing and submitting for removal.
Removing subpages from the index
Unwanted subpages can be removed from the index by returning the appropriate HTTP responses for them:
301 redirect
A 301 redirect takes the user (and the robot) from the erroneous subpage to the subpage we ultimately want them to reach.
410 status code
A 410 (Gone) response tells Google that the page has been removed permanently and should be dropped from search results.
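A quick way to confirm that the pages you want removed really return the responses described above is to query them with a short script. The snippet below is a minimal sketch in Python using the requests library; the listed URLs are hypothetical examples, and since some servers treat HEAD requests differently from GET, treat the output as a first check rather than a full audit.

```python
import requests

# Hypothetical URLs to verify - replace with your own list.
URLS = [
    "https://www.example.com/old-page/",      # expected: 301 to the new address
    "https://www.example.com/removed-page/",  # expected: 410 Gone
]

for url in URLS:
    # allow_redirects=False so we see the redirect status itself,
    # not the page it eventually resolves to.
    response = requests.head(url, allow_redirects=False, timeout=10)
    location = response.headers.get("Location", "-")
    print(f"{url} -> {response.status_code} (Location: {location})")
```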
A simple URL structure, with subpages nested no deeper than three levels, and the fast indexation that follows will let you enjoy a functional site that can keep expanding for years to come.
However, if Google does not index your site, it will not appear in search results, and the consequences can be downright disastrous. Poor management of a large e-commerce site with a huge number of subpages can mean that Google simply never reaches some of them, which translates directly into lost conversions.
So pay attention to how many subpages you add and how much of the site's crawl budget redirects consume. Remember that even a seemingly small site can contain thousands of links.
When optimizing crawl budget, analysis in Google Search Console (see the Google Search Console tutorial) is essential. Just visit the Status tab to see whether the site has errors. It is also worth checking for issues with the sitemap, duplicate subpages, subpages with redirects, and alternative pages that carry a correct canonical tag.
After opening the report, GSC shows the potential problems on the site and the statuses returned for individual pages.
What to look out for:
It is worth taking care of the speed of the site and the correct structure, i.e. information architecture, as this will positively affect the process of its indexation and subsequent seamless expansion with further sub-pages.
In order to properly optimize a page, it is necessary to pay attention to several technical aspects of the site.
If you want to see data about the robot's activity, Google Search Console will come to your aid. The crawl statistics section includes several useful reports with which you can determine the crawl rate.
Constant observation of the above indicators will allow you to respond to many crawl budget problems.
Duplicate content and thin content on subpages
If the same content is repeated across several pages, or if the amount of repeated content is significant, the quality of our site may suffer. Google favors unique content that covers its topic exhaustively.
Subpages with very little content constitute so-called 'thin content', which signals that the subpage does not cover its topic and therefore should not be rewarded in search results. Subpages with repeated content, on the other hand, are treated as duplicates of one another: they may compete with each other to appear in search results and can lower the quality of the entire site.
Soft 404 errors
They appear when the server returns a 200 code for a non-existent page instead of a 404. This can significantly waste crawl budget, although the problem is easy to monitor: just check the indexing errors report in Google Search Console.
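One simple way to test whether a server produces such soft 404s is to request an address that certainly does not exist and see what status code comes back. The snippet below is a minimal sketch in Python with the requests library; the domain and the deliberately nonsensical path are placeholders.

```python
import requests

# Placeholder domain and a deliberately non-existent path.
url = "https://www.example.com/this-page-should-not-exist-12345/"

response = requests.get(url, allow_redirects=False, timeout=10)

if response.status_code == 404:
    print("OK: the server returns 404 for missing pages.")
elif response.status_code == 200:
    print("Soft 404 suspected: a non-existent page returns 200.")
else:
    print(f"Unexpected status code: {response.status_code}")
```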
Faceted navigation
Subpages generated by selecting parameters, for example from sidebar filters in online stores, can produce a very large number of URLs, creating duplicates within the site and putting a significant load on crawling. If these subpages do not generate traffic and occur in large volumes, it is worth excluding them from indexing by adding a 'noindex' directive to these types of subpages.
Internal linking
A helpful technique for getting a site indexed is to link subpages to one another using links placed in the content. Internal linking also helps subpages rank in the search engine. By placing a key phrase naturally in the body of an article as a link to a subpage, instead of the typical 'see' or 'more', we build and transfer authority from one subpage to another for specific keywords, signalling to Google which keyword should gain value.
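To see what the internal links and their anchor texts on a given page actually look like, you can list them with a short script. The sketch below uses Python with the requests and beautifulsoup4 libraries (an assumed toolset, not something required by the method itself); the page address and the list of 'generic' anchors are placeholders to adapt to your own site.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

# Placeholder article URL - replace with a page from your own site.
PAGE = "https://www.example.com/blog/sample-article/"
GENERIC_ANCHORS = {"see", "more", "click here", "read more"}

html = requests.get(PAGE, timeout=10).text
soup = BeautifulSoup(html, "html.parser")
domain = urlparse(PAGE).netloc

for a in soup.find_all("a", href=True):
    href = urljoin(PAGE, a["href"])
    if urlparse(href).netloc != domain:
        continue  # skip external links
    anchor = a.get_text(strip=True).lower()
    note = "  <- generic anchor, consider a key phrase" if anchor in GENERIC_ANCHORS else ""
    print(f"{anchor or '[no text]'} -> {href}{note}")
```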
Excluding subpages from indexing
For subpages diagnosed as unnecessary for indexation, consider using the 'noindex' meta tag. This is a signal to Google not to index these subpages, and it saves crawl budget for the valuable ones.
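To confirm that the 'noindex' tag is really in place on the subpages you want excluded, such as filter or login pages, you can check their meta robots tags in bulk. The snippet below is a minimal sketch in Python with requests and beautifulsoup4; the URLs are hypothetical examples.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical subpages that should carry the noindex tag.
URLS = [
    "https://www.example.com/login/",
    "https://www.example.com/category/?filter=red&sort=price",
]

for url in URLS:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    robots = soup.find("meta", attrs={"name": "robots"})
    content = robots.get("content", "").lower() if robots else ""
    status = "noindex present" if "noindex" in content else "noindex MISSING"
    print(f"{url}: {status}")
```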
Add a sitemap to Google Search Console
We can help the robot find all the subpages by submitting, in the Google Search Console panel, a sitemap listing all the subpages available for indexation.
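Before submitting the sitemap, it is worth verifying that every address it lists actually returns a 200 code. The snippet below is a minimal sketch in Python using requests and the standard library XML parser; the sitemap address is a placeholder, and a very large sitemap would call for throttling the requests.

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder sitemap address - replace with your own.
SITEMAP_URL = "https://www.example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

for url in urls:
    code = requests.head(url, allow_redirects=False, timeout=10).status_code
    if code != 200:
        print(f"Worth checking: {url} returns {code}")

print(f"Checked {len(urls)} sitemap URLs.")
```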
Beware of content cannibalization
Using the same key phrases and similar content concepts on two or more subpages will certainly not help your SEO; it may even make it difficult for Google to decide which subpage to show for particular key phrases.
Avoid hacking attacks
Hacking attacks also lower the chance of being indexed. So it's worth keeping your site secure.
Server logs
By analyzing the logs you can see how the robot moved around your site. It is best to analyze the last month, although for large sites the ideal range is two weeks.
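As a starting point for log analysis, you can simply count which URLs Googlebot requests most often. The sketch below assumes an access log in the common/combined format under a hypothetical path access.log; a serious audit should also verify the bot via reverse DNS, because anyone can spoof the Googlebot user agent.

```python
from collections import Counter

LOG_FILE = "access.log"  # hypothetical path to your server log
hits = Counter()

with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        try:
            # In the common/combined log format the request line is the
            # first quoted field, e.g. "GET /some/path HTTP/1.1".
            path = line.split('"')[1].split()[1]
        except IndexError:
            continue
        hits[path] += 1

print("URLs most frequently requested by Googlebot:")
for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")
```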
Check robots.txt
It is worth checking which addresses are blocked from crawling and removing unnecessary rules.
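The Python standard library ships a robots.txt parser, which makes it easy to check whether a given list of addresses is blocked for Googlebot. The snippet below is a minimal sketch; the domain and sample URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder domain and sample URLs - replace with your own.
ROBOTS_URL = "https://www.example.com/robots.txt"
URLS = [
    "https://www.example.com/",
    "https://www.example.com/category/?filter=red",
    "https://www.example.com/admin/",
]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()

for url in URLS:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url}: {'allowed' if allowed else 'blocked by robots.txt'}")
```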
Check how many internal addresses are not canonical.
Keep in mind that nowadays the canonical tag is very often treated as a hint and may be ignored by the search engine.
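Checking which internal addresses declare a canonical other than themselves can also be scripted. The snippet below is a minimal sketch in Python with requests and beautifulsoup4; the list of URLs is hypothetical and in practice would come from your sitemap or a crawler export.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical internal URLs to check, e.g. taken from the sitemap.
URLS = [
    "https://www.example.com/product/blue-shoes/",
    "https://www.example.com/product/blue-shoes/?utm_source=newsletter",
]

for url in URLS:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("link", attrs={"rel": "canonical"})
    canonical = link.get("href") if link else None
    if canonical is None:
        print(f"{url}: no canonical tag")
    elif canonical == url:
        print(f"{url}: self-canonical")
    else:
        print(f"{url}: canonical points to {canonical}")
```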
A higher crawl rate is very important; however, it is not a ranking factor, so it does not determine SEO by itself. Remember, though, that crawling has a big impact on the health of the site and on reserving the budget for valuable pages. An SEO audit is the tool that will tell you whether SEO is going well.
A link marked as nofollow is not taken into account for indexation by Google. Thanks to this attribute, we can control the flow of authority within the site and pass SEO value through links to specific subpages. The alternative is a dofollow (i.e. ordinary) link, which does pass crawl and SEO value.
The optimization methods described here will help you significantly increase the site's crawl budget. The factors are quite numerous, and every detail, even the smallest, can matter. The absolute basics are eliminating errors and avoiding duplication. Detailed analysis in Google Search Console, as well as in paid tools, will be extremely useful.
You can check this by typing site:yourdomain.com into the search box. You will then see the approximate number of indexed addresses, which lets you assess whether it corresponds to the actual number of subpages.
First of all, make sure that the addresses in the sitemap return a 200 response code. Also avoid including URLs that carry a 'noindex' meta robots tag, pagination pages, and pages blocked by the robots.txt file. An incorrectly implemented sitemap, or one with incorrect content, can waste crawl budget.
The indexing robot does not take into account links marked as nofollow. This allows you to prioritize and transfer SEO power to other subpages.
For this purpose, just take a look at online tools; you can check it at https://developers.google.com/speed/pagespeed/insights/ or https://gtmetrix.com, for example.
It is hard to overstate how important a website's crawl budget is. Meanwhile, many inexperienced webmasters and online store owners overlook this aspect. With proper data analysis and optimization, you will take care of both the crawl rate limit and crawl demand and stay ahead of the competition. There is no doubt that these elements have a significant impact on traffic and, consequently, on conversions.