Crawl budget is, put simply, the number of pages that Google's robots are able to crawl and index during a single visit to your site. You can think of it as the level of interest Google's robots take in your site, which translates into how much of it gets indexed. This, in turn, affects how regularly your pages appear in search results.
In this article, we will discuss the concepts of crawl rate limit and crawl demand, and we will answer the question of how to optimize a site to make the most of its crawl budget, which, especially in the case of larger sites, can have a real impact on visibility and reach.
Table of Contents:
At the moment, there are more than 1.5 billion websites on the Web. How do robots reach your site specifically? The process takes place in several stages.
Googlebot crawls your site first so that the content can be indexed later; this is what places your published content in Google's index. For the search engine to find all of your content, however, a sitemap is useful: it makes the bots' job much easier and helps you gain the traffic you expect.
Remember, however, that crawling happens very quickly on sites with a relatively small number of subpages. It is a completely different story for extensive sites, where you need to prioritize and decide which content should be crawled first. How to do it? You will find out in a moment!
Google's robots have limited resources to spend when scanning websites day to day. Googlebot tries to crawl the optimal number of subpages, but to avoid a situation in which a website is crawled too intensively, a crawl rate limit was introduced.
Crawl rate limit
Crawl rate limit is the number of simultaneous connections Googlebot can use when crawling a site. Google tries to crawl the optimal number of pages, so it adjusts this limit according to the performance of the site and its server.
The limit is designed to balance the crawling process and keep Google from fetching pages so aggressively that it overloads the server hosting the site. In other words, thanks to the rate limit there will be no situation in which the site slows down for users while the robot is crawling it. The crawling process stays balanced, the risk of overloading the server is reduced, and the site remains fast for visitors.
This parameter depends primarily on the speed of the site. If pages load slowly or the server response time is long, crawling takes significantly longer. If the site loads quickly, has a sitemap and uses internal links leading to further pages, there is a good chance that Googlebot will crawl and index most of the subpages during a single visit.
Where and how to check the speed of your site?
We can check the speed of a website with online tools:
https://developers.google.com/speed/pagespeed/insights/
https://gtmetrix.com
They give a full picture of the HTML structure, page load time and server response time, and the last of these can be an unpleasant surprise more often than you might expect, as the value can exceed 1 second.
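If you would rather get a rough reading of the server response time yourself, a short script can do it as well. Below is a minimal sketch in Python, assuming the requests library is installed; https://www.example.com/ is only a placeholder for your own domain, and the measurement is a rough approximation rather than a replacement for PageSpeed Insights or GTmetrix.

```python
import requests

# Placeholder URL - replace with your own domain.
URL = "https://www.example.com/"

# Fetch the page and measure how long it takes for the response
# headers to arrive (a rough proxy for server response time).
response = requests.get(URL, timeout=10)
ttfb = response.elapsed.total_seconds()

print(f"Status code: {response.status_code}")
print(f"Approximate server response time: {ttfb:.2f} s")
if ttfb > 1.0:
    print("Warning: response time exceeds 1 second - worth investigating.")
```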
The parameter can be set in Google Search Console; however, this gives no guarantee of improved indexation.
Crawl demand determines how often a site is crawled and indexed. It is higher for popular and frequently updated sites. Put simply, sites that users visit eagerly are also attractive to the robot that does the indexing.
What does crawl demand depend on?
Google "sees" how much is going on around your site, so it is worth building its popularity. Another factor is how often you update your content: robots are more eager to reach fresh content so they can show users useful information. Keep in mind, however, that your strategy for adding content to the site must be well thought out.
Remember
Frequently adding low-value content will not increase your crawl budget at all. Quite the opposite.
To find out to what extent your site is indexed in Google, just enter the query site:yourdomain.com into the search box. After submitting it, you will see the results within your domain, i.e. the home page and all the subpages that Google 'sees'.
It is worth reviewing the results manually to check whether any subpages are indexed that should not be, e.g. filter pages, registration and login pages, draft pages, or URLs with appended parameters that duplicate your main target pages. Such subpages are worth listing and submitting for removal.
Removing subpages from the index
Unwanted subpages can be removed from the index by returning the appropriate HTTP responses for them:
301 redirect
A 301 redirect takes the user (and the robot) from the erroneous subpage to the subpage we ultimately want them to reach.
410 status code
A 410 (Gone) response tells Google that the page has been removed permanently and should be dropped from search results.
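A quick way to confirm that the pages you want removed really return the responses described above is to query them with a short script. The snippet below is a minimal sketch in Python using the requests library; the listed URLs are hypothetical examples, and since some servers treat HEAD requests differently from GET, treat the output as a first check rather than a full audit.

```python
import requests

# Hypothetical URLs to verify - replace with your own list.
URLS = [
    "https://www.example.com/old-page/",      # expected: 301 to the new address
    "https://www.example.com/removed-page/",  # expected: 410 Gone
]

for url in URLS:
    # allow_redirects=False so we see the redirect status itself,
    # not the page it eventually resolves to.
    response = requests.head(url, allow_redirects=False, timeout=10)
    location = response.headers.get("Location", "-")
    print(f"{url} -> {response.status_code} (Location: {location})")
```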
A simple URL structure, with subpages nested no deeper than three levels, and the fast indexation that follows will let you enjoy a functional site that can keep expanding for years to come.
However, if Google does not index your site, it will not appear in search results, and the consequences can be downright disastrous. Poor management of a large e-commerce site with a huge number of subpages can mean that Google simply never reaches some of them, which translates directly into lost conversions.
So pay attention to how many subpages you add and how much of the site's crawl budget redirects consume. Remember that even a seemingly small site can contain thousands of links.
When optimizing crawl budget, analysis in Google Search Console (see the Google Search Console tutorial) is essential. Just visit the Status tab to see whether the site has errors. It is also worth checking for issues with the sitemap, duplicate subpages, subpages with redirects, and alternative pages that carry a correct canonical tag.
After opening the report, GSC shows the potential problems on the site and the statuses returned for individual pages.
What to look out for:
It is worth taking care of the speed of the site and the correct structure, i.e. information architecture, as this will positively affect the process of its indexation and subsequent seamless expansion with further sub-pages.
In order to properly optimize a page, it is necessary to pay attention to several technical aspects of the site.
If you want to see data about the robot's activity, Google Search Console will come to your aid. The crawl statistics section includes several useful reports with which you can determine the crawl rate.
Constant observation of the above indicators will allow you to respond to many crawl budget problems.
Duplicate content and thin content on subpages
If the same content is repeated across several pages, or if the amount of repeated content is significant, the quality of our site may suffer. Google favors unique content that covers its topic exhaustively.
Subpages with very little content constitute so-called 'thin content', which signals that the subpage does not cover its topic and therefore should not be rewarded in search results. Subpages with repeated content, on the other hand, are treated as duplicates of one another: they may compete with each other to appear in search results and can lower the quality of the entire site.
Soft 404 errors
They appear when the server returns a 200 code for a non-existent page instead of a 404. This can significantly waste crawl budget, although the problem is easy to monitor: just check the indexing errors report in Google Search Console.
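One simple way to test whether a server produces such soft 404s is to request an address that certainly does not exist and see what status code comes back. The snippet below is a minimal sketch in Python with the requests library; the domain and the deliberately nonsensical path are placeholders.

```python
import requests

# Placeholder domain and a deliberately non-existent path.
url = "https://www.example.com/this-page-should-not-exist-12345/"

response = requests.get(url, allow_redirects=False, timeout=10)

if response.status_code == 404:
    print("OK: the server returns 404 for missing pages.")
elif response.status_code == 200:
    print("Soft 404 suspected: a non-existent page returns 200.")
else:
    print(f"Unexpected status code: {response.status_code}")
```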
Faceted navigation
Subpages generated by selecting parameters, for example from sidebar filters in online stores, can produce a very large number of URLs, creating duplicates within the site and putting a significant load on crawling. If these subpages do not generate traffic and occur in large volumes, it is worth excluding them from indexing by adding a 'noindex' directive to these types of subpages.
Internal linking
A helpful technique for getting a site indexed is to link subpages to one another using links placed in the content. Internal linking also helps subpages rank in the search engine. By placing a key phrase naturally in the body of an article as a link to a subpage, instead of the typical 'see' or 'more', we build and transfer authority from one subpage to another for specific keywords, signalling to Google which keyword should gain value.
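To see what the internal links and their anchor texts on a given page actually look like, you can list them with a short script. The sketch below uses Python with the requests and beautifulsoup4 libraries (an assumed toolset, not something required by the method itself); the page address and the list of 'generic' anchors are placeholders to adapt to your own site.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

# Placeholder article URL - replace with a page from your own site.
PAGE = "https://www.example.com/blog/sample-article/"
GENERIC_ANCHORS = {"see", "more", "click here", "read more"}

html = requests.get(PAGE, timeout=10).text
soup = BeautifulSoup(html, "html.parser")
domain = urlparse(PAGE).netloc

for a in soup.find_all("a", href=True):
    href = urljoin(PAGE, a["href"])
    if urlparse(href).netloc != domain:
        continue  # skip external links
    anchor = a.get_text(strip=True).lower()
    note = "  <- generic anchor, consider a key phrase" if anchor in GENERIC_ANCHORS else ""
    print(f"{anchor or '[no text]'} -> {href}{note}")
```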
Excluding subpages from indexing
For subpages diagnosed as unnecessary for indexation, consider using the 'noindex' meta tag. This is a signal to Google not to index these subpages, and it saves crawl budget for the valuable ones.
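To confirm that the 'noindex' tag is really in place on the subpages you want excluded, such as filter or login pages, you can check their meta robots tags in bulk. The snippet below is a minimal sketch in Python with requests and beautifulsoup4; the URLs are hypothetical examples.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical subpages that should carry the noindex tag.
URLS = [
    "https://www.example.com/login/",
    "https://www.example.com/category/?filter=red&sort=price",
]

for url in URLS:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    robots = soup.find("meta", attrs={"name": "robots"})
    content = robots.get("content", "").lower() if robots else ""
    status = "noindex present" if "noindex" in content else "noindex MISSING"
    print(f"{url}: {status}")
```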
Add a sitemap to Google Search Console
We can help the robot find all the subpages by submitting, in the Google Search Console panel, a sitemap listing all the subpages available for indexation.
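Before submitting the sitemap, it is worth verifying that every address it lists actually returns a 200 code. The snippet below is a minimal sketch in Python using requests and the standard library XML parser; the sitemap address is a placeholder, and a very large sitemap would call for throttling the requests.

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder sitemap address - replace with your own.
SITEMAP_URL = "https://www.example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

for url in urls:
    code = requests.head(url, allow_redirects=False, timeout=10).status_code
    if code != 200:
        print(f"Worth checking: {url} returns {code}")

print(f"Checked {len(urls)} sitemap URLs.")
```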
Beware of content cannibalization
Using the same key phrases and similar content concepts on two or more subpages will certainly not help your SEO; it may even make it difficult for Google to decide which subpage to show for particular key phrases.
Avoid hacking attacks
Hacking attacks also lower the chance of being indexed. So it's worth keeping your site secure.
Server logs
By analyzing the logs you can see how the robot moved around your site. It is best to analyze the last month, although for large sites the ideal range is two weeks.
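As a starting point for log analysis, you can simply count which URLs Googlebot requests most often. The sketch below assumes an access log in the common/combined format under a hypothetical path access.log; a serious audit should also verify the bot via reverse DNS, because anyone can spoof the Googlebot user agent.

```python
from collections import Counter

LOG_FILE = "access.log"  # hypothetical path to your server log
hits = Counter()

with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        try:
            # In the common/combined log format the request line is the
            # first quoted field, e.g. "GET /some/path HTTP/1.1".
            path = line.split('"')[1].split()[1]
        except IndexError:
            continue
        hits[path] += 1

print("URLs most frequently requested by Googlebot:")
for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")
```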
Check robots.txt
It is worth checking which addresses are blocked from crawling and removing unnecessary rules.
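The Python standard library ships a robots.txt parser, which makes it easy to check whether a given list of addresses is blocked for Googlebot. The snippet below is a minimal sketch; the domain and sample URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder domain and sample URLs - replace with your own.
ROBOTS_URL = "https://www.example.com/robots.txt"
URLS = [
    "https://www.example.com/",
    "https://www.example.com/category/?filter=red",
    "https://www.example.com/admin/",
]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()

for url in URLS:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url}: {'allowed' if allowed else 'blocked by robots.txt'}")
```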
Check how many internal addresses are not canonical.
Keep in mind that nowadays the canonical tag is very often treated as a hint and may be ignored by the search engine.
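Checking which internal addresses declare a canonical other than themselves can also be scripted. The snippet below is a minimal sketch in Python with requests and beautifulsoup4; the list of URLs is hypothetical and in practice would come from your sitemap or a crawler export.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical internal URLs to check, e.g. taken from the sitemap.
URLS = [
    "https://www.example.com/product/blue-shoes/",
    "https://www.example.com/product/blue-shoes/?utm_source=newsletter",
]

for url in URLS:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("link", attrs={"rel": "canonical"})
    canonical = link.get("href") if link else None
    if canonical is None:
        print(f"{url}: no canonical tag")
    elif canonical == url:
        print(f"{url}: self-canonical")
    else:
        print(f"{url}: canonical points to {canonical}")
```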
A higher crawl rate is very important; however, it is not a ranking factor, so it does not determine SEO by itself. Remember, though, that crawling has a big impact on the health of the site and on reserving the budget for valuable pages. An SEO audit is the tool that will tell you whether SEO is going well.
A link marked as nofollow is not taken into account for indexation by Google. Thanks to this attribute, we can control the flow of authority within the site and pass SEO value through links to specific subpages. The alternative is a dofollow (i.e. ordinary) link, which does pass crawl and SEO value.
The optimization methods described here will help you significantly increase the site's crawl budget. The factors are quite numerous, and every detail, even the smallest, can matter. The absolute basics are eliminating errors and avoiding duplication. Detailed analysis in Google Search Console, as well as in paid tools, will be extremely useful.
You can check this by typing site:yourdomain.com into the search box. You will then see the approximate number of indexed addresses, which lets you assess whether it corresponds to the actual number of subpages.
First of all, make sure that the addresses in the sitemap return a 200 response code. Also avoid including URLs that carry a 'noindex' meta robots tag, pagination pages, and pages blocked by the robots.txt file. An incorrectly implemented sitemap, or one with incorrect content, can waste crawl budget.
The indexing robot does not take into account links marked as nofollow. This allows you to prioritize and transfer SEO power to other subpages.
For this purpose, just take a look at online tools; you can check it at https://developers.google.com/speed/pagespeed/insights/ or https://gtmetrix.com, for example.
It is hard to overstate how important a website's crawl budget is. Meanwhile, many inexperienced webmasters and online store owners overlook this aspect. With proper data analysis and optimization, you will take care of both the crawl rate limit and crawl demand and stay ahead of the competition. There is no doubt that these elements have a significant impact on traffic and, consequently, on conversions.