
wget – Recursive GET – download and crawl websites

Recursively download and crawl web pages using wget.

wget -e robots=off -r http://www.guguncube.com

If a recursive download hangs because robots.txt is missing or blocks the crawl, add "-e robots=off".
This tells wget to ignore the Robot Exclusion Standard, so it neither fetches nor honors robots.txt.
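
In practice it is worth keeping a recursive crawl polite. A fuller invocation might look like the sketch below; the extra flags (-l for depth, -np to stay below the start directory, --wait to pause between requests) are standard wget options added here for illustration, not part of the original tip:

wget -e robots=off -r -l 3 -np --wait=1 http://www.guguncube.com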

Multiple URLs

wget -e robots=off -r -i url-list.txt
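
The -i option reads start URLs from a file, one per line, and -r crawls each of them recursively. A hypothetical url-list.txt (example URLs, not from the original post) might contain:

http://www.example.com/
http://www.example.org/docs/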

References

1. http://skeena.net/kb/wget%20ignore%20robots.txt