May 14, 2026 · 4 min read

Technical SEO: fixing crawling and indexing problems

You can write the best content in your industry, but if Google can't crawl it or decides not to index it, as far as the search engine is concerned it doesn't exist. Crawling and indexing problems are the least visible part of technical SEO and one of the most frequent causes of traffic that never arrives: pages published months ago that don't appear in the SERPs, entire sections of the site ignored, duplicate versions stealing positions from each other. The good news is that they can be diagnosed methodically.

Crawling and indexing are not the same thing

It's worth distinguishing the two steps, because the remedies are different. Crawling is the scan: Googlebot visits your pages by following links and sitemaps. Indexing is the subsequent choice: Google decides whether to include the crawled page in its index. A page can be crawled and not indexed, for example because Google considers it duplicated or of little value; or it may never be crawled because no link reaches it or because robots.txt blocks it. Figuring out at which of these two points your site gets stuck is the first question of every diagnosis.

Diagnosis starts with Search Console

Google Search Console is free and contains almost everything you need. The things to look at, in order:

the page indexing report, which lists how many pages are indexed and, above all, the reasons for exclusion: blocked by robots.txt, with a noindex tag, duplicate without user-selected canonical, crawled but not indexed;
the URL inspection tool, to check a single page: when it was crawled, which canonical version Google chose, whether it's indexable;
the crawl stats, useful on large sites to understand how Googlebot spends its time;
the sitemaps report, to verify that they are being read and are free of errors.

Faced with a drop or with pages that have disappeared, the exclusion reasons report is almost always where the cause becomes visible.
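As a rough illustration of that first question (is the site stuck at crawling or at indexing?), here is a minimal sketch that groups exclusion reasons by stage. The reason strings are the ones used in this article, not an official export format, and the `triage` function and its labels are hypothetical names chosen for the example.

```python
# Hypothetical mapping from exclusion reasons (as worded in this article)
# to the stage where the page is stuck. Adapt the keys to your own export.
CRAWL = "crawling"
INDEX = "indexing"

STAGE_BY_REASON = {
    "blocked by robots.txt": CRAWL,
    "with a noindex tag": INDEX,
    "duplicate without user-selected canonical": INDEX,
    "crawled but not indexed": INDEX,
}


def triage(reasons):
    """Group exclusion reasons by the stage they point to."""
    groups = {CRAWL: [], INDEX: [], "unknown": []}
    for reason in reasons:
        # Normalise case and whitespace before the lookup.
        stage = STAGE_BY_REASON.get(reason.strip().lower(), "unknown")
        groups[stage].append(reason)
    return groups
```

Anything that lands in "unknown" is simply a reason this sketch does not cover and needs a manual look.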

Robots.txt and sitemap: the two files to keep in order

The robots.txt file tells crawlers what not to crawl. There are three classic mistakes: blocking the CSS and JavaScript resources needed to render the page; leaving in place a blanket Disallow carried over from a test environment; and using robots.txt to hide pages from the index. It cannot do that, because a blocked page can stay indexed if it receives links. To exclude a page you need noindex, and the page must remain crawlable so that Google can read the directive. The XML sitemap is the list of pages you want crawled: it should contain only canonical URLs that are reachable and return a valid response, and it should update when you publish. A sitemap full of redirects, errors and excluded pages sends Google contradictory signals.
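One contradiction that is easy to check automatically is a sitemap listing URLs that robots.txt blocks. The sketch below uses only the Python standard library. The function names are hypothetical, and Python's robots.txt parser is a simplified approximation: it does not implement Google's wildcard matching, so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def blocked_for_googlebot(robots_txt, urls):
    """Return the URLs that the given robots.txt text disallows for Googlebot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch("Googlebot", u)]


def sitemap_urls(sitemap_xml):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def sitemap_conflicts(robots_txt, sitemap_xml):
    """URLs the sitemap asks Google to crawl but robots.txt forbids."""
    return blocked_for_googlebot(robots_txt, sitemap_urls(sitemap_xml))
```

Both inputs are plain strings, so you can paste in the contents of the two files or fetch them with whatever HTTP client you already use.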

Canonicals and duplicate content

The canonical tag tells Google which version of a page to consider official when variants exist: with and without parameters, different sort orders of a category, print versions. eCommerce is the typical breeding ground for duplicates: filters and faceted navigation can generate thousands of nearly identical URLs that dilute crawling. The basic rules: every indexable page declares its own canonical; variants point to the main version; the canonical points to an indexable page, not a blocked or redirected one. And remember that for Google the canonical is a hint: if the signals are inconsistent, it chooses on its own, and not always the way you'd want.
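The first two rules can be checked on a page's HTML with a few lines of standard-library Python. This is a minimal sketch, not a full audit: `CanonicalFinder` and `check_canonical` are hypothetical names, the comparison is an exact string match (no URL normalisation), and it does not verify that the canonical target is itself indexable.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"].strip())


def check_canonical(page_url, html):
    """Classify a page's canonical declaration relative to its own URL."""
    finder = CanonicalFinder()
    finder.feed(html)
    found = finder.canonicals
    if not found:
        return "missing"
    if len(set(found)) > 1:
        return "conflicting"  # more than one distinct canonical declared
    return "self" if found[0] == page_url else "points elsewhere"
```

A main page should come back as "self" and a filtered or sorted variant as "points elsewhere"; "missing" and "conflicting" are the cases where Google is most likely to choose on its own.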

The mistakes we find most often in audits

In the technical checks we run on clients' sites, the recurring culprits are few: a noindex forgotten after a redesign or migration, which switches off entire sections; redirect chains accumulated over the years, which waste crawl resources and dilute signals; orphan pages, published but linked from nowhere, which Google struggles to find; pagination and filters without rules, which multiply URLs; and staging environments that end up in the index because they lack protection. Each of these has a standard fix; the difficulty is noticing them, which is why a periodic check of Search Console is worth more than any emergency intervention.
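Two of these culprits, redirect chains and orphan pages, are easy to spot once you have crawl data in hand. The sketch below assumes you already have a redirect map and a list of internal links from a crawler or server logs. The data shapes and function names are hypothetical, chosen only for the example.

```python
def redirect_chain(start, redirects, max_hops=10):
    """Follow a URL through a {source: target} redirect map and return the path."""
    path = [start]
    seen = {start}
    while path[-1] in redirects and len(path) <= max_hops:
        nxt = redirects[path[-1]]
        path.append(nxt)
        if nxt in seen:  # redirect loop: stop instead of cycling forever
            break
        seen.add(nxt)
    return path


def long_chains(redirects, min_hops=2):
    """Return every source URL whose redirect path takes at least min_hops hops."""
    chains = {}
    for src in redirects:
        path = redirect_chain(src, redirects)
        if len(path) - 1 >= min_hops:
            chains[src] = path
    return chains


def orphan_pages(published, internal_links, entry_points=()):
    """Published URLs that no other page links to.

    internal_links maps each page to the URLs it links to; entry_points
    (for example the homepage) are never reported as orphans.
    """
    linked = {t for src, targets in internal_links.items() for t in targets if t != src}
    return sorted(set(published) - linked - set(entry_points))
```

A chain flagged by `long_chains` is usually fixed by pointing the first URL straight at the final destination; an orphan is fixed by linking to it from a relevant page or removing it.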

Get your site's foundations checked

If you have pages that don't appear in the SERPs, a drop you can't explain, or an eCommerce site with thousands of URLs to manage, a technical audit lines up causes and priorities. With our websites and eCommerce service we build technically sound sites and fix the crawling and indexing of existing ones. Book a free call: we'll look at your Search Console together and tell you where to intervene.

