What is sitemap.xml?

October 9, 2017 | Posted By Lakshay Anand

What is a sitemap?

An XML Sitemap is a sitemap created for search engines. It lists all the URLs on your site that you want search engines such as Google to crawl and index. The Sitemap also provides information on when pages were last updated and how important they are relative to each other. Search engines do not guarantee that they will fully abide by the sitemap, but they do use XML Sitemaps for assistance in crawling the web.

Sitemaps are XML files that list every URL on your website, along with important metadata for each URL: when it was last updated, how important it is relative to other pages in your site structure, and how often it changes.

The rest of this section describes the XML format defined by the Sitemap protocol.

The Sitemap protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded.

The Sitemap must:

  • Begin with an opening <urlset> tag and end with a closing </urlset> tag.
  • Specify the namespace (protocol standard) within the <urlset> tag.
  • Include a <url> entry for each URL, as a parent XML tag.
  • Include a <loc> child entry for each <url> parent tag.

All other tags are optional. Support for these optional tags may vary among search engines. Refer to each search engine’s documentation for details.

Also, all URLs in a Sitemap must be from a single host, such as www.example.com or store.example.com.
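
To make these rules concrete, here is a minimal sketch in Python that builds a Sitemap with the standard library. The function name build_sitemap and the example URLs are assumptions made for illustration, and the optional <lastmod> tag is the only optional tag it writes. ElementTree takes care of entity-escaping, and the result is encoded as UTF-8.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the Sitemap protocol (sitemaps.org)
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return UTF-8 encoded Sitemap XML for (loc, lastmod) pairs.

    lastmod may be None, because only <loc> is required.
    (build_sitemap is a hypothetical helper, not part of any library.)
    """
    # All URLs in one Sitemap must come from a single host.
    hosts = {urlsplit(loc).netloc for loc, _ in entries}
    if len(hosts) > 1:
        raise ValueError(f"URLs span several hosts: {sorted(hosts)}")

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree entity-escapes &, < and > when it serializes text.
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2017-10-09"),
    ("http://www.example.com/catalog?item=12&desc=vacation", None),
])
print(sitemap_xml.decode("utf-8"))
```

Note how the & in the second URL comes out as &amp; in the generated file, which is what "entity-escaped" means in practice.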

Why Do Sitemaps Matter?

Sitemaps are a core part of a website and critical to search engine optimization. XML sitemaps allow search engines to easily crawl a website and index each page so that it shows up in search engine results. HTML sitemaps are also important and are geared more towards human users: they help your website visitors more easily find the content they’re looking for on your website.

According to Google, there are a few specific reasons a client would benefit from a sitemap:

  • Their website is new with very few backlinks
  • Their website is very large
  • Their website content isn’t well-linked internally, making it difficult to navigate
  • Their website uses a lot of rich-media content

    XML Sitemaps Protocol and Derived Formats

    You can find many derived formats of the standard XML sitemaps protocol, most created by Google.

    If you are interested in creating XML sitemaps or any of their derived formats, check these tutorials:

    • Standard XML sitemap
    • Image sitemap
    • Video sitemap
    • Mobile sitemap
    • News sitemap

    Types of sitemap

    Besides the standard XML Sitemap, there is also the Sitemap index and four more specialized sitemaps. (The code search sitemap is now essentially obsolete, since Google Code Search has been discontinued.) If you want to boost traffic to videos, images, your mobile site, or news articles, use the specialized Sitemaps (Sitemap extensions).

    The 6 Types of Sitemaps:

    • Video
    • Images
    • Mobile
    • News
    • Sitemap Index if you have multiple sitemaps
    • Standard Sitemap
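
When a site has several Sitemaps, a Sitemap index ties them together. The sketch below shows one possible way to generate one in Python, using the <sitemapindex>, <sitemap>, <loc> and <lastmod> tags described later in this article. The helper name build_sitemap_index and the file names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org)
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Return UTF-8 encoded Sitemap index XML for (loc, lastmod) pairs.

    (build_sitemap_index is a hypothetical helper, not a library function.)
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        if lastmod is not None:
            # W3C Datetime: a date such as 2017-10-09, or a full timestamp
            ET.SubElement(entry, "lastmod").text = lastmod
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


index_xml = build_sitemap_index([
    ("http://www.example.com/sitemap-products.xml", "2017-10-09T18:23:17+00:00"),
    ("http://www.example.com/sitemap-news.xml", None),
])
print(index_xml.decode("utf-8"))
```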

     

    Build Sitemap

    You can either let a sitemap generator do its thing, or you can tweak its settings to create a Sitemap that shows search engines exactly how you want your site crawled.

    • Sitemap tags
      • <sitemapindex> (required): Encapsulates information about all of the Sitemaps in the file.
      • <sitemap> (required): Encapsulates information about an individual Sitemap.
      • <loc> (required): Identifies the location of the Sitemap. This location can be a Sitemap, an Atom file, an RSS file, or a simple text file.
      • <lastmod> (optional): Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that any of the pages listed in that Sitemap were changed. The value for the lastmod tag should be in W3C Datetime format.

      By providing the last modification timestamp, you enable search engine crawlers to retrieve only a subset of the Sitemaps in the index i.e. a crawler may only retrieve Sitemaps that were modified since a certain date. This incremental Sitemap fetching mechanism allows for the rapid discovery of new URLs on very large sites.

    • Sitemap segmentation: divide your Sitemaps by content type and by a structure that will best help you diagnose indexation problems. Give them descriptive names as well.
    • Exclude URLs that should NOT be indexed
      • Exclude URLs disallowed in robots.txt (a good time to make sure you’re disallowing the right URLs)
      • Exclude URLs disallowed via meta noindex tags
      • Exclude duplicate URLs
      • Exclude private pages
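
The exclusion rules above can be sketched as a small filter that runs before the Sitemap is written. This is only an illustration: the function filter_sitemap_urls, the sample robots.txt, and the lists of noindex and private URLs are all assumptions, and in a real setup you would collect that information from your own site. The robots.txt check uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser


def filter_sitemap_urls(urls, robots_txt, noindex_urls=(), private_prefixes=()):
    """Drop URLs that should not be listed, keeping the original order.

    (filter_sitemap_urls is a hypothetical helper written for this example.)
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    noindex = set(noindex_urls)

    kept, seen = [], set()
    for url in urls:
        if url in seen:                     # duplicate URL
            continue
        seen.add(url)
        if not robots.can_fetch("*", url):  # disallowed in robots.txt
            continue
        if url in noindex:                  # page carries a meta noindex tag
            continue
        if any(url.startswith(p) for p in private_prefixes):  # private page
            continue
        kept.append(url)
    return kept


# Sample data, assumed for illustration only
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

urls = [
    "http://www.example.com/",
    "http://www.example.com/about",
    "http://www.example.com/about",           # duplicate
    "http://www.example.com/admin/login",     # disallowed in robots.txt
    "http://www.example.com/thank-you",       # assumed to carry meta noindex
    "http://www.example.com/account/orders",  # assumed to be private
]
print(filter_sitemap_urls(
    urls,
    ROBOTS_TXT,
    noindex_urls=["http://www.example.com/thank-you"],
    private_prefixes=["http://www.example.com/account/"],
))
```

Only the home page and the about page survive the filter here; everything else falls under one of the four exclusion rules.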

    Uploading Sitemap:
    Once you have generated the sitemap, upload it to your website, ideally in the root directory, like so: www.example.com/sitemap.xml. Technically, you don’t need to place it in the root, but other locations come with some limitations.
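
The main limitation is about scope: under the Sitemap protocol, a Sitemap can only list URLs that sit at or below its own directory on the same host. Here is a rough sketch of that check in Python. The function name in_sitemap_scope and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url is at or below the Sitemap's directory.

    (in_sitemap_scope is a hypothetical helper written for this example.)
    """
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Protocol and host must match exactly.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # Directory that contains the Sitemap file, always ending in "/"
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return (page.path or "/").startswith(base_dir)


root_sitemap = "http://www.example.com/sitemap.xml"
catalog_sitemap = "http://www.example.com/catalog/sitemap.xml"

print(in_sitemap_scope(root_sitemap, "http://www.example.com/images/logo"))     # True
print(in_sitemap_scope(catalog_sitemap, "http://www.example.com/catalog/show")) # True
print(in_sitemap_scope(catalog_sitemap, "http://www.example.com/images/logo"))  # False
```

This is why the root directory is the safest place: a Sitemap stored there can cover every URL on the host.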

    Limitations of Sitemap

    You can provide a simple text file that contains one URL per line. The text file must follow these guidelines:

    • The text file must have one URL per line. The URLs cannot contain embedded newlines.
    • You must fully specify URLs, including the protocol (for example, http://).
    • Each text file can contain a maximum of 50,000 URLs and must be no larger than 50MB (52,428,800 bytes). If your site includes more than 50,000 URLs, you can separate the list into multiple text files and add each one separately.
    • The text file must use UTF-8 encoding. You can specify this when you save the file (for instance, in Notepad, this is listed in the Encoding menu of the Save As dialog box).
    • The text file should contain no information other than the list of URLs.
    • The text file should contain no header or footer information.
    • If you would like, you may compress your Sitemap text file using gzip to reduce your bandwidth requirement.
    • You can name the text file anything you wish. Please check to make sure that your URLs follow the RFC-3986 standard for URIs and the RFC-3987 standard for IRIs.
    • You should upload the text file to the highest-level directory you want search engines to crawl and make sure that you don’t list URLs in the text file that are located in a higher-level directory.
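
As a rough illustration of these guidelines, the Python sketch below splits a long URL list into text files that each stay within the 50,000 URL and 50MB limits, writes them as UTF-8 with one URL per line, and optionally compresses them with gzip. The function names, the sitemap1.txt naming pattern, and the choice to measure the size before compression are assumptions made for this example.

```python
import gzip
from pathlib import Path

MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50MB


def split_into_text_sitemaps(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Group URLs into chunks that each respect both per-file limits."""
    chunks, current, size = [], [], 0
    for url in urls:
        if "\n" in url or "\r" in url:
            raise ValueError(f"URL contains an embedded newline: {url!r}")
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"URL is not fully specified: {url!r}")
        line_bytes = len(url.encode("utf-8")) + 1  # +1 for the newline
        # Start a new file when either limit would be exceeded.
        if current and (len(current) >= max_urls or size + line_bytes > max_bytes):
            chunks.append(current)
            current, size = [], 0
        current.append(url)
        size += line_bytes
    if current:
        chunks.append(current)
    return chunks


def write_text_sitemaps(urls, directory, compress=False):
    """Write sitemap1.txt, sitemap2.txt, ... and return the paths written.

    The file names are an arbitrary choice; any name is allowed.
    """
    directory = Path(directory)
    paths = []
    for number, chunk in enumerate(split_into_text_sitemaps(urls), start=1):
        # Only the URLs go into the file: no header, no footer.
        data = ("\n".join(chunk) + "\n").encode("utf-8")
        path = directory / f"sitemap{number}.txt{'.gz' if compress else ''}"
        path.write_bytes(gzip.compress(data) if compress else data)
        paths.append(path)
    return paths


urls = [f"http://www.example.com/page/{n}" for n in range(120_000)]
chunks = split_into_text_sitemaps(urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Calling write_text_sitemaps(urls, "some/directory", compress=True) would then produce three gzip-compressed files, each of which you add to the search engine separately.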