I have done this kind of work before: crawling popular e-commerce sites to fetch web pages and parsing their content into Word, Excel, Access, and text files.
I have a set of great tools that can automatically scan pages, parse any content on a web page, and clean up the database.
I have crawled sites with more than 10 million pages.