recursively download and crawl web pages using wget
wget -e robots=off -r http://www.guguncube.com
If the recursive download stalls or skips pages because of a missing or restrictive robots.txt, add "-e robots=off". This tells wget to ignore the robots exclusion standard, so it neither fetches robots.txt nor honors its rules.
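For a politer recursive crawl, the basic command can be combined with rate and depth limits; the flags below (--level, --no-parent, --wait, --limit-rate, --user-agent) are standard wget options, but the specific depth, delay, rate, and user-agent values are only illustrative assumptions:

wget -e robots=off -r --level=3 --no-parent --wait=1 --limit-rate=200k --user-agent="Mozilla/5.0" http://www.guguncube.com

Here --level=3 caps the recursion depth, --no-parent keeps wget from climbing above the starting directory, and --wait plus --limit-rate throttle the requests so the crawl is less likely to hammer the server.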
Multiple URLs
wget -e robots=off -i url-list.txt
The -i option reads URLs from url-list.txt, one per line; add -r if each listed site should also be crawled recursively.
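As a sketch, assuming the list file simply contains one URL per line, it could be built and used like this (the file name, the second placeholder URL, and the -P output directory are arbitrary choices; -P/--directory-prefix is a standard wget option):

printf '%s\n' http://www.guguncube.com http://example.com > url-list.txt
wget -e robots=off -r -i url-list.txt -P downloads/

The -P flag keeps everything fetched from the listed sites under the downloads/ directory instead of the current working directory.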