What is an XML sitemap?
An XML sitemap (sitemap.xml) is a text file in XML (Extensible Markup Language) format that lists the pages of a website as links. It can be submitted to Google Search Console or Bing Webmaster Tools to tell search engine crawlers about all available and relevant pages, which can speed up and improve indexing. XML sitemaps must meet the requirements of the Sitemap protocol, which Google, Yahoo and Microsoft adopted as a common standard in 2006 with the aim of improving the quality of search results over the long term. Among other things, the protocol requires UTF-8 encoding and valid XML markup, including entity escape codes for certain characters (such as "&gt;" instead of ">").
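The entity escaping mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name escape_sitemap_url and the example.com URL are assumptions made for this example, not part of the protocol.

```python
from xml.sax.saxutils import escape

def escape_sitemap_url(url: str) -> str:
    """Replace the five XML special characters with their entity codes."""
    # escape() handles &, < and > by default; the quote characters are added here.
    return escape(url, {'"': "&quot;", "'": "&apos;"})

# Hypothetical URL containing an ampersand that must be escaped.
raw = "http://www.example.com/catalog?item=12&desc=vacation"
print(escape_sitemap_url(raw))
```

XML libraries usually perform this escaping automatically when they serialize a document, so a helper like this is mainly relevant when the file is assembled from plain strings.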
Note
XML sitemaps are different from the sitemaps that many CMSs automatically display in the front end. Those are tables of contents for the site, intended to make navigation easier for visitors. XML sitemaps, by contrast, are not visible to users by default, although it is technically possible to reach them via a URL.
The advantages of an XML sitemap
There is no guarantee that Google and other search engines will index a site better because it uses an XML sitemap, but a structured link directory increases the chances. This table of contents for web crawlers is especially useful for sites whose dynamic content changes constantly. The same applies to larger websites that have many subpages but not (yet) a strong backlink profile. Sites like these tend to be crawled too irregularly for changes to be noticed promptly, and some pages may not be discovered at all. With sitemap.xml, you can help crawlers notice them more quickly.
An added benefit besides listing subpage URLs: XML sitemaps can also list media files such as videos or images. For these, there are additional tags that tell the crawler what type of content is involved (e.g. <image:image>, <video:video>). Further tags describe the content in more detail, for example its duration or size, so that search engines can identify it as accurately as possible. There is also a special version of the XML sitemap for news portals, which is meant to improve the indexing of articles through specific tags such as genre, date of publication or title.
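As a rough sketch of how a media entry might be produced, the following Python snippet builds a single URL entry with attached image tags. It assumes Google's image sitemap extension namespace; the helper name build_image_entry and the photo URL are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: namespace of Google's image sitemap extension.
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize sitemap tags without a prefix and image tags with the "image:" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)

def build_image_entry(page_url, image_urls):
    """Return a <url> element that lists the images shown on one page."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    for image_url in image_urls:
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return url

entry = build_image_entry(
    "http://one-test.website/page1/",
    ["http://one-test.website/photo.jpg"],  # hypothetical image URL
)
print(ET.tostring(entry, encoding="unicode"))
```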
Tip
The effort of creating an XML sitemap manually, just to give your website a structural directory, can be seen as a drawback. With XML sitemap generators such as the XML-Sitemaps.com online generator, there is no need to write XML sitemaps by hand. There are also plugins for most content management systems that create XML sitemaps automatically.
Structure of an XML sitemap: the most important components
An XML sitemap is structured with XML tags, just like every document in the Extensible Markup Language. According to the current standard, "Sitemaps 0.9", three tags are required for a file to be considered an XML sitemap.
sitemap.xml: mandatory tags | |
<urlset>, </urlset> | Each XML sitemap file must start with an opening <urlset> tag and end with a closing </urlset> tag. The tag encloses the whole file and references the current protocol standard. |
<url>, </url> | The <url> tag is the parent tag of each individual URL entry; its opening and closing tags mark the start and end of a listed subpage. |
<loc>, </loc> | The <loc> tag contains the URL of an individual page. The URL must start with the protocol (for example, "http") and end with a trailing slash if the web server requires it. It must be shorter than 2,048 characters. |
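The rules for <loc> values can be checked before a URL is written into a sitemap. The following is a minimal sketch, assuming that only http and https URLs are wanted and reading the length limit as "fewer than 2,048 characters"; the name is_valid_loc is made up for this example.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # the URL must stay below this number of characters

def is_valid_loc(url: str) -> bool:
    """Check that a URL starts with a protocol, names a host and is short enough."""
    parts = urlsplit(url)
    has_protocol = parts.scheme in ("http", "https")
    return has_protocol and bool(parts.netloc) and len(url) < MAX_LOC_LENGTH

print(is_valid_loc("http://one-test.website/page1/"))  # protocol and host present
print(is_valid_loc("one-test.website/page1/"))         # protocol missing
```

Whether a trailing slash is needed depends on the web server, so the sketch deliberately does not check for it.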
In addition to these required tags, the sitemap protocol defines three optional tags, <lastmod>, <changefreq>, and <priority>, that describe individual URL entries in more detail. However, support for these optional tags depends on the search engine. Googlebot, for example, mainly uses <lastmod>, while it largely ignores the other two tags or treats them only as minor hints when crawling.
sitemap.xml: optional tags | |
<lastmod>, </lastmod> | The <lastmod> tag specifies the date of the last modification of the page in W3C Datetime format (for example, YYYY-MM-DD). The tag is independent of the If-Modified-Since header and the HTTP 304 (Not Modified) response that a web server may return. |
<changefreq>, </changefreq> | The <changefreq> tag gives the web crawler a general indication of how often a page is likely to change (hourly, daily, monthly, etc.). Documents that change on every access are marked with the value "always", and archived URLs are marked with "never". |
<priority>, </priority> | This tag expresses the priority of a URL relative to the other URLs of the same website on a scale from 0.0 to 1.0 (default priority: 0.5). In this way, bots can be told which pages the site owner considers most important. |
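The required and optional tags can be combined programmatically. The snippet below is one possible sketch using Python's standard library; the function name build_sitemap and the dictionary layout of the entries are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")

def build_sitemap(entries):
    """Build a sitemap from dicts with a 'loc' key and optional tag keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # required tag
        for tag in OPTIONAL_TAGS:
            if tag in entry:  # optional tags are written only when given
                ET.SubElement(url, tag).text = str(entry[tag])
    # Returns UTF-8 encoded bytes, including the XML declaration.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

xml_bytes = build_sitemap([
    {"loc": "http://one-test.website/",
     "lastmod": date(2018, 1, 1).isoformat(),  # W3C Datetime, date-only form
     "changefreq": "monthly",
     "priority": 1.0},
    {"loc": "http://one-test.website/page1/"},  # only the required tag
])
print(xml_bytes.decode("utf-8"))
```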
Since an XML sitemap file can contain a maximum of 50,000 URLs and must not exceed 50 MB, the URLs of larger websites can be spread across multiple sitemap files. In this case, each sitemap file should be listed in an additional index file whose structure is similar to that of sitemap files: the <sitemapindex> and <sitemap> tags are used instead of <urlset> and <url>.
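Splitting a large URL list and listing the resulting files in an index could look roughly like this. The helper names and the file names sitemap1.xml, sitemap2.xml and so on are hypothetical; only the tag names and the 50,000 limit come from the text above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]

def build_sitemap_index(sitemap_urls):
    """Build an index document that lists one <sitemap> entry per sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")

# 120,001 hypothetical page URLs need three sitemap files.
pages = [f"http://one-test.website/p{i}/" for i in range(120_001)]
chunks = list(chunk_urls(pages))
files = [f"http://one-test.website/sitemap{n}.xml" for n in range(1, len(chunks) + 1)]
print(len(chunks))
print(build_sitemap_index(files))
```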
Note
It is possible to compress sitemap files (with gzip, for example), but this only reduces bandwidth requirements. The maximum size of an XML sitemap cannot be increased in this way, as the limit always applies to the uncompressed version of the file.
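A small sketch of this rule in Python: the size check runs on the uncompressed bytes, and only then is the file compressed. The function name compress_sitemap is an assumption for this example, and the 50 MB limit is interpreted here as 50 * 1024 * 1024 bytes.

```python
import gzip

MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # limit applies before compression

def compress_sitemap(xml_bytes: bytes) -> bytes:
    """Gzip a sitemap, refusing files that are too large when uncompressed."""
    if len(xml_bytes) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("sitemap exceeds 50 MB uncompressed; split it into several files")
    return gzip.compress(xml_bytes)

sitemap = b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
packed = compress_sitemap(sitemap)
print(gzip.decompress(packed) == sitemap)
```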
XML sitemap example
The easiest way to understand the structure of an XML sitemap is to use a concrete example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://one-test.website/</loc>
    <lastmod>2018-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://one-test.website/page1/</loc>
    <lastmod>2018-03-05</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://one-test.website/page2/</loc>
    <lastmod>2018-03-08</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>
In this case, the sample XML sitemap lists the main one-test.website URL and the URLs of two subpages (page1 and page2). Search engine crawlers can see in the document that the webmaster has given the main page the highest priority and that it changes about once a month. Its last modification was on January 1, 2018. Page1 has the default priority value (0.5) but, unlike the main page, is expected to change weekly; its last modification was on March 5, 2018. If the crawler takes the <priority> tag into account, it knows that it should pay the least attention to page2 when indexing (<priority> value: 0.3). This subpage is also modified every week (last modification on March 8, 2018).
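The reading described above can be reproduced with a short parser. This is only a sketch of how a consumer might interpret such a file, not how any particular search engine does it; the names read_entries and by_priority are made up, and SAMPLE is an abbreviated copy of the example without the <changefreq> tags.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://one-test.website/</loc><lastmod>2018-01-01</lastmod><priority>1.0</priority></url>
  <url><loc>http://one-test.website/page1/</loc><lastmod>2018-03-05</lastmod><priority>0.5</priority></url>
  <url><loc>http://one-test.website/page2/</loc><lastmod>2018-03-08</lastmod><priority>0.3</priority></url>
</urlset>"""

def read_entries(xml_text):
    """Return one dict per <url> entry; a missing <priority> falls back to 0.5."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "priority": float(url.findtext("sm:priority", default="0.5", namespaces=NS)),
        })
    return entries

def by_priority(entries):
    """Sort entries from highest to lowest priority."""
    return sorted(entries, key=lambda e: e["priority"], reverse=True)

for entry in by_priority(read_entries(SAMPLE)):
    print(entry["priority"], entry["loc"], entry["lastmod"])
```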
Creating and submitting an XML sitemap: how it works
Considering the amount of work involved in creating XML sitemaps manually, using plugins or online tools is a good idea, provided you use them correctly. Reasonable XML sitemaps can be generated without any specific configuration, but the directory only takes exactly the shape you want once the relevant parameters are set appropriately. As examples, we present the options offered by the XML-Sitemaps.com online generator and the WordPress plugin Google XML Sitemaps for creating and integrating XML sitemaps.
How to generate XML sitemaps using the XML-Sitemaps.com online generator
The XML-Sitemaps.com online generator offers a convenient way to create your own XML sitemaps. The web service is free for web projects with up to 500 subpages. Sitemaps for larger projects are also possible, but they require a paid Pro subscription. The procedure is very simple: after opening the web application, enter your website URL in the address field provided.
Download the generated XML sitemap file and upload it to your website directory. To notify the Google web crawler about the file, for example, simply submit it in Google Search Console. You can also specify the sitemap path in the robots.txt file:
Sitemap: http://your-site/sitemap.xml
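To check that the Sitemap line in robots.txt can actually be read, Python's standard robots.txt parser can be used (its site_maps() method requires Python 3.8 or later). A minimal sketch, in which the function name sitemaps_from_robots and the sample file contents are assumptions:

```python
from urllib.robotparser import RobotFileParser

def sitemaps_from_robots(robots_txt: str):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []

# Hypothetical robots.txt content using the placeholder URL from above.
robots = """User-agent: *
Disallow:
Sitemap: http://your-site/sitemap.xml
"""
print(sitemaps_from_robots(robots))
```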
Google XML Sitemaps: How to Create XML Sitemaps with the WordPress Plugin
For more than ten years, the WordPress plugin Google XML Sitemaps, developed by Arne Brachhold, has made creating XML sitemaps child's play. To use this popular plugin (over 2 million active installations worldwide) on your WordPress site, first install it through the plugin center of the content management system. Select the menu item "Plug-ins", then "Install", and enter "Google XML Sitemaps" in the search field. The extension should appear at the top of the results; clicking "Install Now" starts the installation.
You can also download Google XML Sitemaps manually and place it in your WordPress plugin directory. Once you have activated the plugin, you can access it directly in WordPress via "XML Sitemap" in the "Settings" menu. Compared to XML-Sitemaps.com, a much larger number of configuration options are available, in the following seven areas:
- Basic options: Here you define the basic settings and determine, for example, whether Google and Bing should be notified automatically of changes or whether the sitemap should be compressed automatically.
- Additional pages: Here you can add files or URLs that do not belong to the WordPress project but run on the same domain.
- Priority posts: The settings in this menu are particularly interesting for blogs and news portals. If you use the <priority> tag in your sitemap, you can define here whether and how the plugin should calculate the priority of an article.
- Sitemap content: Use this menu to select the categories of pages to include in the XML sitemap (e.g. home pages, static pages, archive pages, etc.).
- Excluded items: If you want to exclude categories or individual posts from the sitemap, you can do so here.
- Change frequencies: Google XML Sitemaps lets you predefine the <changefreq> tag, and the update frequency can even be set separately for different page types.
- Priorities: Here you can define the same settings for the <priority> tag.
Once you have configured the XML sitemap to your liking, save the changes using the corresponding button. After saving, clicking the "Your sitemap" link sends your XML sitemap to the selected search bots.