We need to automate the scraping of a website; no design work is needed. If this project is completed well, more work will follow immediately. The requirements are as follows.
The website to be scraped publishes legal files, or Cases. The Cases are classified first by instance (1st instance or 2nd instance), then by “matter” or subject (civil law, family law, etc.), and then by court (first court, second court, etc.).
Each file/Case is made up of entries called “Acuerdos”; an Acuerdo records that something has happened in the Case. Every day the site publishes the Acuerdos of each Case that had activity on the previous day. Note that a given Case receives a new Acuerdo only from time to time, not every day.
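The hierarchy above (instance, then matter, then court, then Case, then Acuerdo) maps naturally onto a few small classes. The sketch below is in Python only as a language-neutral illustration (the deliverable itself must be PHP or C#), and every class and field name in it is hypothetical; the real names would have to follow the attached table definitions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class CourtKey:
    """Hypothetical: where a Case sits in the site's hierarchy (instance -> matter -> court)."""
    instance: int  # 1 or 2 (1st or 2nd instance)
    matter: str    # e.g. "civil", "family"
    court: str     # e.g. "first court"


@dataclass
class Acuerdo:
    """Hypothetical: one published entry recording that something happened in a Case."""
    case_id: str
    published: date
    text: str


@dataclass
class Case:
    """Hypothetical: a Case accumulates Acuerdos only on the days it has activity."""
    case_id: str
    location: CourtKey
    acuerdos: list[Acuerdo] = field(default_factory=list)

    def add(self, acuerdo: Acuerdo) -> None:
        # Refuse Acuerdos that belong to another Case, so bad scrapes fail loudly.
        if acuerdo.case_id != self.case_id:
            raise ValueError("Acuerdo belongs to a different Case")
        self.acuerdos.append(acuerdo)
```

Usage would look like `Case("123/2010", CourtKey(1, "civil", "first court"))`, followed by one `add()` call for each Acuerdo found for that Case on a given day.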
This explanation can be understood easily by checking this link: [login to view URL]
We need to scrape all of each day's information and save it in a couple of databases, following the attached instructions and table definitions. This should be straightforward, because the site's URLs follow a simple pattern: you just generate the URLs for each day and scrape them (more details below).
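Generating "the URLs for each day" amounts to iterating over calendar days and, for each day, over the instance/matter/court combinations. The posting does not give the real URL pattern, so `BASE_URL` and the query parameter names below are hypothetical placeholders. The sketch also assumes that every matter has the same list of courts; in practice the valid combinations would probably have to be read from the site.

```python
from datetime import date, timedelta
from itertools import product
from typing import Iterable, Iterator
from urllib.parse import urlencode

# HYPOTHETICAL: the real base URL and parameter names are not given in the posting.
BASE_URL = "https://example.org/acuerdos"


def days(start: date, end: date) -> Iterator[date]:
    """Yield every calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def urls_for_day(day: date, instances: Iterable[int],
                 matters: Iterable[str], courts: Iterable[str]) -> Iterator[str]:
    """Build one URL per (instance, matter, court) combination for the given day."""
    for instance, matter, court in product(instances, matters, courts):
        # urlencode escapes spaces and accents in matter/court names.
        query = urlencode({"instancia": instance, "materia": matter,
                           "juzgado": court, "fecha": day.isoformat()})
        yield f"{BASE_URL}?{query}"
```

Because `days()` is a generator, a multi-year range is produced lazily rather than built in memory all at once.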
Technical/skill requirements
-The script/application must be object-oriented.
-The database must be MySQL.
-The language must be PHP (or C#).
-The application must be able to scan the site several times a day (for instance, once every hour during the afternoon).
-The application must be able to scan any given past date. In particular, it must be able to perform an initial scan from 2004 to today (if language limitations mean this has to be done over several runs, those runs must happen automatically).
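One way to meet the "several runs, automatically" requirement is a resumable backfill. Each run scrapes at most a fixed number of days and records the last completed day in a checkpoint, and the next run resumes from there. The Python sketch below only illustrates the idea. `START_DATE` (the posting says only "from 2004"), `max_days`, the JSON checkpoint file and the `scrape_day` callback are all hypothetical. A real implementation would more likely keep the checkpoint in a MySQL table.

```python
import json
from datetime import date, timedelta
from pathlib import Path
from typing import Callable

# HYPOTHETICAL: the posting says "from 2004"; the exact first day is an assumption.
START_DATE = date(2004, 1, 1)


class Backfill:
    """Resumable backfill: each run scrapes at most `max_days`, then stops."""

    def __init__(self, checkpoint: Path, scrape_day: Callable[[date], None],
                 max_days: int = 30) -> None:
        self.checkpoint = checkpoint   # hypothetical JSON file holding {"last_done": "YYYY-MM-DD"}
        self.scrape_day = scrape_day   # hypothetical callback that scrapes and stores one day
        self.max_days = max_days

    def next_day(self) -> date:
        """Return the first day that has not been scraped yet."""
        if self.checkpoint.exists():
            saved = json.loads(self.checkpoint.read_text())
            return date.fromisoformat(saved["last_done"]) + timedelta(days=1)
        return START_DATE

    def run_once(self, today: date) -> bool:
        """Scrape the next chunk of days; return True once the backfill has reached `today`."""
        day = self.next_day()
        for _ in range(self.max_days):
            if day > today:
                break
            self.scrape_day(day)
            # Save progress after every day, so an interrupted run loses at most one day.
            self.checkpoint.write_text(json.dumps({"last_done": day.isoformat()}))
            day += timedelta(days=1)
        return day > today
```

A scheduler such as cron can call `run_once` repeatedly until it returns True. The same scheduler covers the "several times a day" requirement: for example, the crontab schedule `0 13-18 * * *` fires at the top of every hour from 13:00 to 18:00. Once the backfill has caught up, later runs only process the newest days.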
To complete this project, please have the application up and running on a development web server that we can access to test its functionality as described.
Very Important: To separate you from the spammers, please write I AM REAL as the first line of your bid. We will delete all bids that do not start with this phrase, since most bidders never read the requirements. Thank you for being one who does.