Create a scraper program to download and process files for CFR regulations

$250-750 USD

In progress
Posted more than 10 years ago

Paid on delivery
File scraper, downloader, and file processor.

This project consists of two parts:
1. Spider through a website and download all files that result from the spidering.
2. Convert each downloaded file to a specific format.

Part one:
You will be given a batch of starting URLs that look like this: [login to view URL]
Follow each of these URLs to a page with links that look like this: [login to view URL]
Follow each of those links to another page with links that look like this: [login to view URL]
Follow each of those links to a page that links to specific documents. The links within these pages tend to look like this:

<table width="480"><tr> <td><table width="120"> <tr><td> <a class="tpl" href="/cgi/t/text/text-idx?c=ecfr&SID=f68f503ab8017206c54fb367aaaa7851&amp;rgn=div8&amp;view=text&amp;node=10:1.0.1.1.4.1.56.1&amp;idno=10"> &sect;5.100</a></td></tr> </table></td> <td><table width="354"> <tr><td>Purpose and effective date.</td></tr> </table></td> </tr></table>

Each of these links leads to a page that needs to be saved with a naming structure that looks like this: [login to view URL]
Other examples of naming structures: 6cfrAppendix A to Part [login to view URL]

Part two:
After you have downloaded each file, you will need to put it into a specific HTML page structure.
1. Strip all of the information before <!-- startDynamic --> and after <!-- endDynamic -->.
2. Create a header for each record that looks like the headers in the attached sample files.
3. Replace graphics paths in the text. Replace the string <img src="/graphics/ with the string <img src="[login to view URL] and replace the string <a href="/graphics/pdfs/ with the string <a href="[login to view URL]
4. Create a footer at the bottom of each section, after the <p class="cita"> element, that looks like this example: <p class="cita">[54 FR 53314, Dec. 28, 1989]</p> <br><p><center>Copyright 2013 Compliance Publishing Corporation (877) 500-6737</center> </body> </html>
5. The program must accommodate both regular regulations and the Appendix sections.
6. Some of the titles have one fewer level. The number of levels deep at which the individual text is located must be selectable in the program.
7. All search-and-replace definitions must be kept outside the program, in text files that can be modified as needed.
8. We require the source code as well as the finished program at the end of the project.
9. Attached is a program that completed most of these tasks but no longer works correctly because of a minor change in the text formatting (the programmer is no longer available). You may wish to use this program as a guide.
10. Attached are raw data documents and finished documents to be used as a guide.

Please review the information carefully before you provide a bid, as there will be no changes to the contract price once we accept your bid.
Please view the attached file for a sample of what the file format will be when completed. There are both regular and appendix texts in this sample.

Requirements:
The program must work on Windows Server 2008.
We provide all funds in a Freelancer escrow account.
You must complete this project within 30 days (or less).
You must reply to all communications within 24 hours.
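Steps 1, 3, 4, and 7 of part two can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached program or a required design: the rules-file format (one tab-separated search/replace pair per line), the function names, and the `header` argument are all hypothetical, and the real header must be taken from the sample files, which are not reproduced here.

```python
from pathlib import Path

START_MARKER = "<!-- startDynamic -->"
END_MARKER = "<!-- endDynamic -->"

# Footer text copied from the example in step 4 of the posting.
FOOTER = (
    "<br><p><center>Copyright 2013 Compliance Publishing Corporation "
    "(877) 500-6737</center>\n</body>\n</html>"
)


def load_rules(path):
    """Read search/replace pairs from an external text file (step 7).

    Assumed format (hypothetical): one rule per line, with the search
    string and the replacement separated by a single tab. Blank lines
    and lines starting with '#' are skipped.
    """
    rules = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        search, _, replacement = line.partition("\t")
        rules.append((search, replacement))
    return rules


def extract_dynamic(html):
    """Keep only the text between the two markers (step 1)."""
    start = html.find(START_MARKER)
    end = html.find(END_MARKER)
    if start == -1 or end == -1 or end < start:
        raise ValueError("dynamic markers not found")
    return html[start + len(START_MARKER):end].strip()


def apply_rules(text, rules):
    """Apply each literal search/replace pair in order (step 3)."""
    for search, replacement in rules:
        text = text.replace(search, replacement)
    return text


def process(html, rules, header):
    """Build one finished page: header, cleaned body, footer (step 4).

    `header` is a placeholder argument; the posting defines the real
    header only through its attached samples.
    """
    body = apply_rules(extract_dynamic(html), rules)
    return f"{header}\n{body}\n{FOOTER}\n"
```

Because the rules are plain literal strings read from a file, a later change in the site's markup could be handled by editing the text file rather than the program, which is the situation item 9 describes. The replacement targets themselves are hidden in the posting ("[login to view URL]"), so any URL placed in such a file is the client's to supply.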
Project ID: 5338385

About the project

12 proposals
Remote project
Active 10 years ago

About the client

Edina, United States
4.9
142
Payment method verified
Member since Aug. 13, 2008

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)