How to scrape all links from an XML Sitemap


Most websites have an XML Sitemap, which usually lists links to all product pages and other pages. With an XML Sitemap, you can scrape all of a website's links quickly and easily.

Note: Not all XML Sitemaps contain up-to-date data. If you find a sitemap with outdated data, try scraping the current links using the “URLs” tab or the Website crawler tool (Ctrl+7)!





First, you need to locate the XML Sitemap URL.

Just open the robots.txt file and look for a line that starts with "Sitemap:". This file is usually located at https://scrapedominion.com/robots.txt (add /robots.txt to any website domain). If a website does not have this file, try typing /sitemap.xml instead of /robots.txt.

How to locate a website sitemap URL
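
The same lookup can be sketched outside the tool. This is a minimal Python illustration, not part of the tool itself: the function name `find_sitemap_urls` is hypothetical, and it assumes you have already downloaded the robots.txt text. It reads the "Sitemap:" lines and falls back to /sitemap.xml when none are declared.

```python
from urllib.parse import urljoin


def find_sitemap_urls(robots_txt: str, site_url: str) -> list[str]:
    """Return sitemap URLs declared in robots.txt, or a /sitemap.xml guess."""
    found = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split "Sitemap: <url>" on the first colon only, so the URL stays whole
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    # Fallback: no sitemap declared, so guess the conventional location
    return found or [urljoin(site_url, "/sitemap.xml")]
```

The fallback is only a guess: a site may keep its sitemap under a different name, or have none at all.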


This XML Sitemap has child sitemaps.

Main XML Sitemap structure

All URLs (links) in this sitemap are located inside <loc>…</loc> elements (the container for each link).
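
To show what reading the <loc> containers looks like in code, here is a small Python sketch using only the standard library. The function name `extract_locs` is an assumption made for illustration, and the sketch expects the sitemap XML to be available as a string already.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_locs(xml_text: str) -> list[str]:
    """Return the text of every <loc> that sits directly inside a sitemap entry."""
    root = ET.fromstring(xml_text)
    locs = []
    for entry in root:  # each <url> (or <sitemap>) entry
        for child in entry:
            if local_name(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return locs
```

Only direct children of each entry are read, so nested extension elements that also happen to be named loc (for example in image sitemaps) are not collected.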


If you want to extract links from a single XML Sitemap, enter that sitemap's URL.

Scraping XML Sitemap links


If you want to extract data from all child XML Sitemaps of a parent sitemap, enter the parent sitemap URL and check “HasChildrenMaps”.

How to extract child XML Sitemaps
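
Following child sitemaps amounts to a small recursion: a parent sitemap (root element <sitemapindex>) lists other sitemaps, while a regular one (root element <urlset>) lists page links. The Python sketch below illustrates that idea and does not describe how the tool implements “HasChildrenMaps”. The names `collect_links` and `fetch_text` are hypothetical, and the downloader assumes plain, uncompressed XML.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_text(url: str) -> str:
    """Download a URL and return its body as text (assumes uncompressed XML)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_links(sitemap_url, fetch=fetch_text, _seen=None):
    """Return page links from a sitemap, descending into child sitemaps."""
    seen = set() if _seen is None else _seen
    if sitemap_url in seen:
        return []  # already visited, avoids loops between sitemaps
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    # Tags look like "{namespace}sitemapindex"; compare only the local name
    is_parent = root.tag.rsplit("}", 1)[-1] == "sitemapindex"

    links = []
    for entry in root:
        for child in entry:
            if child.tag.rsplit("}", 1)[-1] != "loc" or not child.text:
                continue
            loc = child.text.strip()
            if is_parent:
                links.extend(collect_links(loc, fetch, seen))  # child sitemap
            else:
                links.append(loc)  # ordinary page link
    return links
```

Passing the downloader in as `fetch` makes it easy to try the function on saved files or test data before pointing it at a live website.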


You can also apply URL filters to XML Sitemap links.

How to use URLs filters

Just check “perform URLs filters”, set the URL filters (Ctrl+3), clear the main links list (F7), and press the GET URLs button in the XML Sitemap scraping tool.
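
As a rough illustration of what filtering sitemap links means, the Python sketch below keeps or drops links by substring. The tool's actual filter rules are not described here, so the `filter_urls` function and its `include` and `exclude` parameters are assumptions made for this example only.

```python
def filter_urls(urls, include=(), exclude=()):
    """Keep URLs containing any `include` substring and no `exclude` substring."""
    kept = []
    for url in urls:
        if include and not any(part in url for part in include):
            continue  # include filters are set and none of them matched
        if any(part in url for part in exclude):
            continue  # matched an exclude filter
        kept.append(url)
    return kept


links = [
    "https://example.com/product/1",
    "https://example.com/blog/news",
    "https://example.com/product/2?print=1",
]
product_links = filter_urls(links, include=["/product/"], exclude=["print="])
```

With the sample list above, only the first link passes both filters.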

