Build an Online Store

min $50000 USD

Closed
Posted about 7 years ago

Paid on delivery
Large Scale Crawler

Looking for a developer (or company) to build a robust web crawler system. There are more than 20,000 websites that we want to crawl and extract data from. We want to be able to extract this data within 3-6 months.

1. Design the architecture of the crawler, or use an existing open source crawler as a template. Because we're dealing with a large volume of data, the architecture needs to:
• Be robust and scalable
• Be efficient and fast
• Support proxies (to bypass anti-scraping systems)

2. Create an Admin dashboard where the Admin can:
a. Add, Edit, View, Delete, Stop, and Search crawlers
b. Input the URL to crawl
c. Specify the data that needs to be extracted (e.g. Title, Title URL, etc.)
d. View, Edit, and Delete extracted data
e. Download the data in JSON, XML, or CSV
f. Access the data through an API (authorized via Authorization Tokens or other means) for upload and integration
h. Manage users with an ACL (Access Control List): Create, Edit, View, Delete users

3. Data normalization and clean-up. The incoming data is unformatted and unstructured. Take the location or city as an example: some sites list it as "Houston, TX", while others list it as "Houston, Texas" or "USA-TX-Houston". The location or city data therefore needs to be brought into one format; we use Google Location for this.

4. Because the data on these 20,000+ websites changes daily, notifications need to be put in place to inform the system of changes (i.e. what has been added and what has been removed) and to update the data automatically.

5. Once the data is verified and cleansed, it will be made available for search via Solr, ElasticSearch, or another search engine you recommend.
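To illustrate the location clean-up described in item 3, here is a minimal rule-based sketch in Python. It is only a local stand-in for the Google Location lookup mentioned in the posting, not a replacement for it. The function name `normalize_location`, the target format "City, ST", and the small state table are assumptions made for this example.

```python
import re
from typing import Optional

# Illustrative subset only (assumption); a real table would cover every state.
US_STATES = {"TX": "Texas", "CA": "California", "NY": "New York"}
NAME_TO_CODE = {name.lower(): code for code, name in US_STATES.items()}


def normalize_location(raw: str) -> Optional[str]:
    """Return the location as "City, ST", or None if the format is unknown."""
    text = raw.strip()

    # Form 1: "USA-TX-Houston"
    match = re.fullmatch(r"(?:USA|US)-([A-Za-z]{2})-(.+)", text)
    if match:
        code, city = match.group(1).upper(), match.group(2).strip()
    else:
        # Form 2: "Houston, TX" or "Houston, Texas"
        parts = [part.strip() for part in text.split(",")]
        if len(parts) != 2:
            return None
        city, state = parts
        if state.upper() in US_STATES:
            code = state.upper()
        else:
            code = NAME_TO_CODE.get(state.lower())

    if code not in US_STATES or not city:
        return None
    return f"{city}, {code}"  # city casing is left as found on the site


if __name__ == "__main__":
    for sample in ("Houston, TX", "Houston, Texas", "USA-TX-Houston"):
        print(sample, "->", normalize_location(sample))
```

All three spellings from the posting map to the same string, so records from different sites can be grouped on it. Anything the rules do not recognize comes back as `None` and could then be passed to an external geocoding service.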
Some of the technical challenges that need to be addressed from the beginning:
• Make sure that the crawler compresses the data it fetches; otherwise it will use a huge amount of storage
• There is no need to fully re-crawl a website every 1-2 days, because that would be a waste of resources; however, we do want fresh data every 1-2 days
• Ways to prevent the crawler from causing a DoS (Denial of Service)
• Ways to prevent the system from crashing or overloading because so many crawlers are running
• The system should scale to crawling 100,000 to 200,000 websites
• Queuing: does the crawler start right away, or does it run in batches at a certain time? How does it scale when we start adding more sites to crawl?

Example:
Day 1: Admin adds 100 sites to crawl
Day 2: Admin adds 200 sites to crawl
Day 3: Admin adds 500 sites to crawl
Day 4: etc.
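One possible answer to the queuing question above is a priority queue that hands out due sites in fixed-size batches and reschedules each site for its next visit. This is a rough sketch, not a full design. The class name `CrawlQueue`, the parameters `batch_size` and `recrawl_interval`, and the use of hours as the time unit are all assumptions made for this example.

```python
import heapq
import itertools
from typing import List


class CrawlQueue:
    """Hand out due sites in bounded batches and reschedule them afterwards."""

    def __init__(self, batch_size: int, recrawl_interval: float):
        if batch_size <= 0 or recrawl_interval <= 0:
            raise ValueError("batch_size and recrawl_interval must be positive")
        self.batch_size = batch_size
        self.recrawl_interval = recrawl_interval
        self._heap = []                # entries are (due_time, seq, site)
        self._seq = itertools.count()  # tie-breaker: earlier entries go first

    def add(self, site: str, now: float) -> None:
        """Register a new site; it is due for its first crawl immediately."""
        heapq.heappush(self._heap, (now, next(self._seq), site))

    def next_batch(self, now: float) -> List[str]:
        """Return up to batch_size sites that are due at time `now`."""
        batch = []
        while (self._heap and self._heap[0][0] <= now
               and len(batch) < self.batch_size):
            _, _, site = heapq.heappop(self._heap)
            batch.append(site)
        # Reschedule each handed-out site for its next visit.
        for site in batch:
            due = now + self.recrawl_interval
            heapq.heappush(self._heap, (due, next(self._seq), site))
        return batch

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = CrawlQueue(batch_size=150, recrawl_interval=24.0)  # hours
    for i in range(100):            # Day 1: Admin adds 100 sites
        queue.add(f"site-{i}", now=0.0)
    print(len(queue.next_batch(now=0.0)))
    for i in range(100, 300):       # Day 2: Admin adds 200 sites
        queue.add(f"site-{i}", now=24.0)
    print(len(queue.next_batch(now=24.0)), len(queue.next_batch(now=24.0)))
```

Because each batch is capped, adding more sites lengthens the queue rather than increasing the number of crawlers running at once. Workers can call `next_batch` as often as capacity allows, and more workers can be added as the site count grows.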
Project ID: 13528239

About the project

12 proposals
Remote project
Active 7 years ago

12 freelancers are bidding an average of $53,931 USD for this job
Hello sir, I hope you are doing well. I have read your requirements carefully and I am confident I can execute them successfully. I am an expert in PHP, the Laravel framework, Magento, WordPress and WooCommerce, Drupal, Joomla, and website design. I have 6+ years of experience in website design and development. Please have a look at my profile; I have successfully completed many projects. I work round the clock and am available for discussions anytime. I am available now and ready to start the project immediately. I can provide work samples in private chat. Message me for further discussion. Many thanks for the opportunity to bid on the project. Thanks & Regards, Gamdur Singh
$50,000 USD in 10 days
4.9 (164 reviews)
7.2
Hello, I would like to show you relevant demos and designs from previously completed projects similar to yours. To confirm the requirements and customizations, I would like to discuss this project with you further in a private chat. Let me know the most suitable time to schedule a meeting. Feel free to message me at any time; I am online 24x7 on Freelancer, so you will probably get a quick response from me. My areas of expertise are: 1) PHP with the CodeIgniter and Laravel frameworks 2) Node.js 3) AngularJS 4) Mobile app development. Thanks
$51,546 USD in 40 days
5.0 (20 reviews)
6.7
Hi mate, I'd be glad to assist with web development. I have read the description carefully, understand the requirements, and plan to proceed accordingly. I am excited about this opportunity and have a strong feeling that I could be the best fit for this job. I have 5+ years of experience in web development, with proven experience in MySQL, HTML5, CSS3, JavaScript, Ajax, and strong jQuery skills. Excellent command of MVC frameworks. Good experience working on large projects. We can discuss the work further on chat. Thanks, Vishal
$50,000 USD in 60 days
4.8 (68 reviews)
5.9

About the client

Netherlands
0.0
0
Member since March 26, 2017

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)