I need a perl web crawler that is thoroughly commented and easy to understand.
-The program only needs to crawl HTML pages served over HTTP.
-Links that have already been crawled shouldn't be crawled again.
-It must obey the Robots Exclusion Protocol (robots.txt).
Along with the source code, I need a separate text file that explains how it was implemented (e.g. what data structures were used) and what design decisions and assumptions were made (e.g. LIFO or FIFO queue?).
There are a few more specific details that need to be implemented, so I can send my project description if it's needed, but this is the gist.
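For reference, the requirements above can be sketched in a few dozen lines of Perl. This is only a rough outline under some assumed choices, not a finished solution: it uses the CPAN modules LWP::RobotUA (which fetches and honours each site's robots.txt automatically), HTML::LinkExtor, and URI; the agent name `ExampleCrawler/0.1` and contact address are placeholders; and it picks a FIFO queue, giving a breadth-first crawl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::RobotUA;       # user agent that obeys the Robots Exclusion Protocol
use HTML::LinkExtor;    # pulls href links out of fetched HTML
use URI;                # resolves relative links against the page's base URL

my $start = shift @ARGV or die "usage: $0 <http-url>\n";

# LWP::RobotUA fetches and caches each host's /robots.txt itself and
# refuses any request that robots.txt disallows. Name and email are
# placeholders -- use your own.
my $ua = LWP::RobotUA->new('ExampleCrawler/0.1', 'crawler@example.com');
$ua->delay(10 / 60);    # pause ~10 seconds between requests to one host

my @queue = ($start);       # FIFO queue => breadth-first crawl order
my %seen  = ($start => 1);  # every URL ever queued, so none is crawled twice

while (my $url = shift @queue) {
    my $res = $ua->get($url);
    next unless $res->is_success;                   # also skips robots-forbidden
    next unless $res->content_type eq 'text/html';  # crawl HTML pages only

    print "crawled: $url\n";

    # Extract each <a href="..."> and enqueue unseen absolute HTTP links.
    my $extor = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && $attr{href};
        my $abs = URI->new_abs($attr{href}, $res->base);
        $abs->fragment(undef);                      # drop #fragment duplicates
        return unless $abs->scheme && $abs->scheme eq 'http';
        push @queue, "$abs" unless $seen{"$abs"}++;
    });
    $extor->parse($res->decoded_content);
}
```

The FIFO `shift`/`push` pair is the main design decision here: swapping `shift` for `pop` would turn the same code into a depth-first (LIFO) crawler, and the `%seen` hash is the data structure that enforces the "never crawl a link twice" rule.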