What is an XML sitemap?

An XML sitemap is a structured file, usually at yourdomain.com/sitemap.xml, that lists the URLs on your site you want search engines to discover. Each entry sits inside a urlset and contains a loc (the URL) and optionally a lastmod date. It does not guarantee indexing, but it helps crawlers find your important pages efficiently, which matters most for large sites, new sites, and pages that are not well linked internally.

What should I include in my sitemap?

Include only canonical, indexable URLs that return a 200 OK status and that you genuinely want in search results. Exclude pages marked noindex, pages blocked by robots.txt, redirected or error URLs, duplicate or parameter variants, and non-canonical versions. A sitemap is a curated list of your best, indexable pages, not a dump of every URL the server can return. Keeping it clean helps Google trust and use it.

How many URLs can a sitemap contain?

A single XML sitemap can contain up to 50,000 URLs and must be no larger than 50MB when uncompressed, according to the sitemaps protocol that Google follows. If your site exceeds either limit, split your URLs across multiple sitemap files and list those files in a sitemap index file, which itself can reference up to 50,000 sitemaps. You can also gzip sitemaps to reduce file size.

How do I submit a sitemap to Google?

Open Google Search Console, select your property, go to the Sitemaps report, enter your sitemap URL (such as sitemap.xml) and submit it. Google will fetch it and report how many URLs it discovered and any errors. You should also reference the sitemap in your robots.txt with a Sitemap line, so any crawler can find it automatically without manual submission.

Does a sitemap guarantee my pages get indexed?

No. A sitemap helps search engines discover and understand your URLs, but it does not force indexing. Google decides what to index based on quality, crawl budget, duplication and many other factors. A sitemap improves the odds that your important pages are found and considered, especially on large or poorly linked sites, but the page itself still has to earn its place in the index.

How to Create & Submit an XML Sitemap

If search engines cannot find your pages, they cannot rank them, and an XML sitemap is the simplest way to hand them a clean list of the pages you care about. In short: an XML sitemap is a file (typically at yourdomain.com/sitemap.xml) that lists your important, indexable URLs so crawlers can discover and prioritise them. Include only canonical, indexable pages, keep each file within the protocol's limits of 50,000 URLs and 50MB uncompressed, then submit it in Google Search Console and reference it in robots.txt. This guide covers what a sitemap is, the exact format, what to include and exclude, the size limits, how to generate one, and how to submit it.

It works hand in hand with your robots.txt file and matters for AI crawlers too.

What an XML sitemap is and why it helps

An XML sitemap is a machine-readable file that lists the URLs on your site you want search engines to know about. Think of it as a table of contents you hand to a crawler: rather than relying entirely on following links to discover your pages, the crawler can read the sitemap and learn about every important URL directly.

It matters most in a few specific situations:

Large sites, where deep pages might take a long time to be discovered by crawling alone.
New sites with few external links pointing in to help discovery.
Poorly linked pages that are not reachable through your normal navigation.
Sites with rich media or news, where specialised sitemaps add useful metadata.

It is important to be clear about what a sitemap does not do: it does not guarantee indexing. Google uses it as a discovery and prioritisation aid, but the decision to index any given page still depends on quality, duplication, crawl budget and many other factors. A sitemap improves the odds that your good pages are found and considered; it does not override Google's judgement.

The XML sitemap format

The format is defined by the sitemaps protocol (sitemaps.org), which Google, Bing and others support. A basic sitemap is an XML file whose root is a <urlset> element, containing one <url> entry per page. Each entry has a required <loc> (the full URL) and may include optional fields:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/</loc>
    <lastmod>2026-05-20</lastmod>
  </url>
</urlset>

The key fields are:

Field	Required?	What it means
`<loc>`	Yes	The full, canonical URL of the page
`<lastmod>`	Optional	The date the page last changed, in W3C Datetime format (for example, 2026-05-01)
`<changefreq>`	Optional	A hint at how often the page changes
`<priority>`	Optional	A relative priority hint (0.0–1.0)

A practical note from Google's own guidance: it ignores priority and changefreq, and uses lastmod only when the value is consistently and verifiably accurate. So the field worth getting right is a truthful lastmod. Do not bother fabricating priorities or change frequencies, and never set every page's lastmod to today, which destroys the signal's value.
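The format above is simple enough to generate with a standard library. As a minimal sketch in Python, assuming you already hold a list of canonical URLs and trustworthy modification dates (the build_urlset function and the sample pages are illustrative names, not part of any CMS or plugin), it might look like this:

```python
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages):
    """Build sitemap XML (bytes) from (url, last_modified) pairs.

    last_modified is a datetime.date taken from your own records of when the
    page really changed, or None to omit <lastmod> rather than guess.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialise without an ns0: prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if last_modified is not None:
            lastmod = ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod")
            lastmod.text = last_modified.isoformat()  # YYYY-MM-DD
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample_pages = [
        ("https://example.com/", date(2026, 5, 1)),
        ("https://example.com/blog/", None),  # unknown date: leave lastmod out
    ]
    print(build_urlset(sample_pages).decode("utf-8"))
```

The sketch deliberately leaves out priority and changefreq, and it omits lastmod when no real date is known instead of defaulting to today.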

What to include — and what to leave out

A good sitemap is curated, not exhaustive. The guiding principle is that it should list only the URLs you genuinely want indexed and that are actually indexable. Concretely:

Include:

Canonical URLs only (the version you want to rank).
Pages that return a 200 OK status.
Indexable pages — those not blocked by robots.txt or marked noindex.
Your genuinely important content: key pages, products, articles, categories.

Exclude:

noindex pages, which you are telling Google to keep out of the index anyway.
URLs blocked by robots.txt.
Redirecting (3xx) or error (4xx/5xx) URLs.
Duplicate or parameter variants, and non-canonical versions.
Thin, utility, or low-value pages you would not want as a search entry point.

Mixing in non-indexable or duplicate URLs sends Google mixed signals — you are simultaneously asking it to index a page and telling it elsewhere not to. Keeping the sitemap clean makes it more trustworthy and useful, and it makes the Sitemaps report in Search Console far easier to read when you are diagnosing coverage problems.
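The include and exclude rules above can be expressed as a single filter. The following is a rough sketch, assuming you already have crawl data for each page; the Page record and its field names are hypothetical and would need mapping to whatever your crawler or CMS actually exposes:

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record of what a crawl learned about one URL."""
    url: str
    status: int               # HTTP status code returned for the URL
    canonical: str            # canonical URL the page declares
    noindex: bool = False     # True if the page carries a noindex directive
    blocked_by_robots: bool = False


def belongs_in_sitemap(page):
    """Keep only canonical, indexable pages that return 200 OK."""
    return (
        page.status == 200
        and not page.noindex
        and not page.blocked_by_robots
        and page.canonical == page.url  # drops duplicates and parameter variants
    )


def sitemap_urls(pages):
    return [page.url for page in pages if belongs_in_sitemap(page)]


if __name__ == "__main__":
    crawled = [
        Page("https://example.com/", 200, "https://example.com/"),
        Page("https://example.com/?ref=nav", 200, "https://example.com/"),
        Page("https://example.com/old", 301, "https://example.com/old"),
        Page("https://example.com/thanks", 200, "https://example.com/thanks", noindex=True),
    ]
    print(sitemap_urls(crawled))
```

Whether a page is thin or low-value is an editorial judgement that a filter like this cannot make, so that rule still needs a human decision.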

Size limits and sitemap index files

Sitemaps have hard limits you must respect. Per the sitemaps protocol that Google follows, a single sitemap file can contain at most 50,000 URLs and must be no larger than 50MB uncompressed. If your site exceeds either limit, you split your URLs across multiple sitemap files.

To manage multiple sitemaps, you use a sitemap index file — a special sitemap that lists other sitemaps rather than pages. It uses a <sitemapindex> root with <sitemap> entries:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-05-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-05-18</lastmod>
  </sitemap>
</sitemapindex>

A sitemap index can itself reference up to 50,000 sitemaps, so a single index can cover up to 2.5 billion URLs (50,000 × 50,000), which is effectively unlimited for almost any site. Two practical tips: you can gzip sitemap files to reduce their transfer size (the 50,000-URL and 50MB limits are measured against the uncompressed content, so compression does not raise them), and splitting by content type (posts, products, categories) makes the Search Console reporting much more useful, because you can see indexing health per section.
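Splitting a long URL list and building the index is mostly bookkeeping. Here is a minimal Python sketch of that bookkeeping, assuming the child sitemaps will be published at URLs you choose (the function names and the sitemap-N.xml naming are illustrative assumptions):

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file


def chunk_urls(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(child_sitemap_urls):
    """Build a <sitemapindex> document (bytes) listing the child sitemaps."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in child_sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


def within_limits(xml_bytes, url_count):
    """Check one uncompressed file against both protocol limits."""
    return url_count <= MAX_URLS and len(xml_bytes) <= MAX_BYTES


if __name__ == "__main__":
    urls = [f"https://example.com/p/{i}" for i in range(120_000)]
    chunks = list(chunk_urls(urls))
    children = [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
    index_xml = build_sitemap_index(children)
    compressed = gzip.compress(index_xml)  # optional: serve as .xml.gz
    print(len(chunks), len(index_xml), len(compressed))
```

Chunking by count alone does not guarantee the byte limit, so it is worth running within_limits on each generated file as well.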

How to generate a sitemap

You rarely need to write a sitemap by hand. The common routes are:

CMS or SEO plugin (most sites). Platforms like WordPress with an SEO plugin, or Shopify, Wix, Squarespace and most modern frameworks, generate and update an XML sitemap automatically and keep it current as you publish. This is the best option because the sitemap stays in sync with your content with no effort.
A crawler (tools such as Screaming Frog). Desktop and cloud crawlers can crawl your site and export a sitemap, which is handy for static sites or for auditing what a crawler actually finds.
A build step. Static-site generators and frameworks often produce a sitemap at build time from your routes and content.
Online generators. For small static sites, a web-based generator can produce a one-off file.

Whatever you use, after generation, spot-check the output: confirm it contains canonical, indexable URLs, that it excludes the things it should, and that it validates as well-formed XML.
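A spot-check like this is easy to script. The sketch below, written under the assumption that you have the generated file's bytes in hand, tests well-formedness, the root element, the protocol limits, duplicates and absolute URLs; check_sitemap is an illustrative name, and it cannot tell whether a URL is canonical or indexable, which still needs crawl data:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024


def check_sitemap(xml_bytes):
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50MB uncompressed")
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"unexpected root element: {root.tag}")

    locs = [(el.text or "").strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc")]
    if not locs:
        problems.append("no <loc> entries found")
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds the 50,000 limit")
    if len(set(locs)) != len(locs):
        problems.append("duplicate URLs listed")
    for loc in locs:
        parts = urlsplit(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute http(s) URL: {loc!r}")
    return problems


if __name__ == "__main__":
    with open("sitemap.xml", "rb") as handle:  # path is an assumption
        for problem in check_sitemap(handle.read()):
            print(problem)
```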

How to submit your sitemap to Google

Once your sitemap is live at a public URL, there are two complementary things to do:

Submit it in Google Search Console. Open your property, go to the Sitemaps report, enter the sitemap URL (for example, sitemap.xml or your index file), and submit. Google fetches it and reports how many URLs it discovered, when it last read the file, and any errors it encountered. This report is your primary tool for confirming the sitemap is healthy and being read.
Reference it in robots.txt. Add a line such as Sitemap: https://example.com/sitemap.xml to your robots.txt. This lets any crawler — not just Google — discover your sitemap automatically, including Bing and other engines. It is covered in the robots.txt guide.

Doing both means your sitemap is explicitly submitted to Google and passively discoverable by everyone else. You generally do not need to resubmit after every change — Google re-fetches submitted sitemaps periodically — but you can use the report to prompt a re-read after a major update.
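To confirm that the robots.txt reference is in place, you can read the file the same way a crawler would. This is a small sketch using Python's standard urllib.robotparser module (its site_maps method needs Python 3.8 or later); the sitemaps_in_robots helper and the sample robots.txt text are illustrative:

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the sitemap URLs declared by Sitemap: lines in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when none exist


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow: /admin/\n"
        "\n"
        "Sitemap: https://example.com/sitemap.xml\n"
    )
    print(sitemaps_in_robots(sample))
```

An empty result means crawlers other than Google have no declared location to read and would have to guess.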

Specialised sitemaps: image, video and news

Beyond the standard sitemap, there are extensions for specific content. Image and video sitemap extensions let you provide extra metadata about media on your pages, which can help that media surface in image and video search. Publishers in Google News can use the news sitemap format, which has its own constraints, such as listing only recently published articles. Most sites do not need these, because standard sitemaps cover the common case, but media-heavy sites and news publishers should know they exist and consider them where the extra metadata is worth providing.
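As an illustration of how an extension sits alongside the standard fields, here is a hedged Python sketch that adds image entries using Google's image sitemap namespace. The build_image_sitemap function and its input shape are assumptions made for the example, and only the image location field is shown:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(images_by_page):
    """Build sitemap XML (bytes) from a {page_url: [image_url, ...]} mapping."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in images_by_page.items():
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # Each image is nested inside the <url> entry of its host page.
            image = ET.SubElement(entry, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample = {
        "https://example.com/gallery/": [
            "https://example.com/img/one.jpg",
            "https://example.com/img/two.jpg",
        ],
    }
    print(build_image_sitemap(sample).decode("utf-8"))
```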

What Google actually does with your sitemap

It helps to understand how Google treats a sitemap, because it shapes what you should and should not expect from it. When Google fetches your sitemap, it uses the file for discovery and scheduling: the URLs become candidates for crawling, and an accurate lastmod can influence when Google recrawls a page that has changed. What the sitemap does not do is override Google's own assessment of a page. Google still decides whether each URL is worth crawling and indexing based on quality, duplication, internal linking and crawl budget — the sitemap simply makes sure the URL is on Google's radar in the first place.

This distinction explains a frequent source of confusion. Site owners sometimes assume that "submitted" in the Sitemaps report means "indexed," and panic when the indexed count is lower than the submitted count. In reality, a gap between submitted and indexed URLs is normal and expected; it reflects Google's selectivity, not a sitemap failure. The sitemap has done its job once Google has discovered the URLs. Whether those URLs then earn a place in the index is a separate question answered by the Pages (Index Coverage) report, where Google explains why specific URLs are or are not indexed. Reading the two reports together — Sitemaps for discovery, Pages for indexing outcomes — is how you diagnose coverage properly rather than chasing a number that was never meant to match.

It is also worth knowing that Google caches your sitemap and refetches it on its own schedule rather than every time it crawls. That means changes you make to the sitemap are not picked up instantly, and there is rarely any need to resubmit after every small edit. For an established site with an auto-updating sitemap, the right posture is to let the system run and only intervene when the Sitemaps report shows an actual error, such as a fetch failure or a parsing problem.

Troubleshooting the Sitemaps report in Search Console

The Sitemaps report in Google Search Console is the place to confirm your sitemap is healthy, and a handful of statuses cover most situations:

Couldn't fetch / fetch error. Google could not retrieve the file. Check that the sitemap URL returns a 200 status, is served as XML, and is not blocked by robots.txt or behind authentication.
Has errors. The file was fetched but contains problems — malformed XML, URLs on a different domain than the sitemap, or entries that break the protocol. The report names the issue and the offending lines.
Success, but fewer URLs discovered than expected. Often a sign that the generator is omitting pages, or that many URLs were skipped because they pointed to non-canonical or blocked locations.
Success, with a discovered count. The healthy state: Google read the file and registered the URLs as candidates.

A practical troubleshooting flow is to first open the sitemap URL in a browser to confirm it loads as valid XML, then check it is referenced correctly and not disallowed in robots.txt, then submit or re-submit it in the report and read the result. If Google reports errors, fix the file at the source — usually the CMS, plugin or build step that generates it — rather than hand-editing the output, since a hand-edit will be overwritten on the next regeneration. For large sites split across many sitemaps, submitting the sitemap index lets you monitor each child sitemap's discovery and errors separately, which is far more diagnostic than one monolithic file.
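The first steps of that flow can be approximated in code before you open Search Console. The sketch below assumes you have already fetched the sitemap and robots.txt yourself and pass in the status code, Content-Type header and bodies; diagnose_sitemap is a hypothetical helper, it handles uncompressed sitemaps only, and it does not reproduce Google's own checks:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def diagnose_sitemap(sitemap_url, status, content_type, body, robots_txt):
    """Return likely causes of a fetch failure or 'has errors' status."""
    problems = []
    if status != 200:
        problems.append(f"sitemap returned HTTP {status}, expected 200")
    if "xml" not in content_type.lower():
        problems.append(f"unexpected Content-Type: {content_type}")

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("Googlebot", sitemap_url):
        problems.append("sitemap URL is disallowed in robots.txt")

    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        problems.append(f"malformed XML: {exc}")
        return problems

    sitemap_host = urlsplit(sitemap_url).netloc
    for el in root.iter(f"{{{SITEMAP_NS}}}loc"):
        loc = (el.text or "").strip()
        if urlsplit(loc).netloc != sitemap_host:
            problems.append(f"URL on a different host than the sitemap: {loc}")
    return problems


if __name__ == "__main__":
    body = (
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://example.com/</loc></url></urlset>"
    )
    robots_txt = "User-agent: *\nDisallow: /admin/\n"
    print(diagnose_sitemap("https://example.com/sitemap.xml", 200,
                           "application/xml", body, robots_txt))
```

Taking the fetched data as arguments keeps the checks separate from the HTTP client, so the same function works on a live response or a saved copy.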

Common mistakes to avoid

Including non-indexable URLs — noindexed, blocked, redirected or duplicate pages that send mixed signals.
Letting it go stale — a hand-maintained sitemap that no longer reflects the site; automate it instead.
Fake lastmod dates — setting everything to today, which makes the signal worthless.
Exceeding the limits silently — going past 50,000 URLs or 50MB without splitting into an index.
Forgetting robots.txt — not referencing the sitemap, so non-Google crawlers have to guess its location.
Treating it as an indexing guarantee — a sitemap aids discovery; it does not force pages into the index.

A quick checklist

The sitemap lives at a public URL and is valid XML.
It contains only canonical, indexable, 200-status URLs.
It excludes noindex, blocked, redirected and duplicate URLs.
Each file stays under 50,000 URLs and 50MB uncompressed.
Multiple sitemaps are organised under a sitemap index.
lastmod dates are accurate, not faked.
It is submitted in Google Search Console and referenced in robots.txt.

Go deeper

Control crawling: how to write a robots.txt file.
Make sure AI engines get in too: can AI crawlers access your site?
The bigger strategy: what is GEO?
Audit everything: how to check if your site is ready for AI search.

