Building a scalable web scraper for a large number of different websites

Closed · Posted 3 years ago · Paid on delivery
The goal of the project is to build a scalable web scraper that initially scrapes data from more than a dozen different websites. Later on, it should be possible to scale the scraper up to a few thousand websites.

The websites are known and should be added to the scraper iteratively. Each website has a different structure, which is why the development and maintenance costs per site need to stay as small as possible. The aim is to scrape the websites weekly at first. Later on, the scraping interval should be shortened to daily or even less. The scraped data needs to be stored in a useful and efficient way in a cloud database. Furthermore, the scraper must be tolerant of changes in the designs of the websites, and it must avoid being blocked.
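One way to keep the per-site cost low and tolerate design changes is to describe each site declaratively and give every field an ordered list of fallback extraction rules. The following is only a minimal sketch under stated assumptions: `SiteConfig`, `extract`, the example URL, and the HTML snippets are all hypothetical, and plain regular expressions stand in for the CSS or XPath selectors a real parsing library would provide.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SiteConfig:
    """Hypothetical per-site description; every name here is illustrative."""
    name: str
    start_url: str
    interval_hours: int = 168  # weekly at first, can be lowered later
    # For each field: ordered regex patterns, later entries act as fallbacks.
    patterns: Dict[str, List[str]] = field(default_factory=dict)


def extract(config: SiteConfig, html: str) -> Dict[str, Optional[str]]:
    """Try each pattern in order and keep the first match per field."""
    record: Dict[str, Optional[str]] = {}
    for field_name, candidates in config.patterns.items():
        record[field_name] = None  # stays None if no pattern matches
        for pattern in candidates:
            match = re.search(pattern, html, flags=re.DOTALL)
            if match:
                record[field_name] = match.group(1).strip()
                break
    return record


# Assumed example site; adding a new site means adding one more config.
shop = SiteConfig(
    name="example-shop",
    start_url="https://example.com/products",
    patterns={
        "title": [r'<h1 class="product-title">(.*?)</h1>', r"<h1[^>]*>(.*?)</h1>"],
        "price": [r'<span class="price">(.*?)</span>', r'data-price="([^"]+)"'],
    },
)

old_layout = '<h1 class="product-title">Blue Mug</h1><span class="price">9.99</span>'
new_layout = '<h1 id="t"> Blue Mug </h1><div data-price="9.99"></div>'

print(extract(shop, old_layout))
print(extract(shop, new_layout))
```

With this shape, a redesign that breaks the primary rule still yields data through a fallback, and a field that comes back as `None` is a natural signal for monitoring.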

Currently, a simple Python scraper exists that can scrape a few websites using the Selenium library. However, it does not have to be continued at all costs.

The following tasks are part of your engagement for the project:

o Developing a modular and scalable software architecture for the web scraping project (preferably with Python)

o Containerizing the program in Docker

o Deploying and managing the containers in the cloud, probably with AWS and Kafka

o Implementing different measures to prevent blacklisting and being blocked

o Setting up a SQL database, probably PostgreSQL with AWS
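As an illustration of the measures against being blocked, two common building blocks are a per-domain minimum delay between requests and exponential backoff with jitter after a refused request. The sketch below is an assumption about how this could look, not a prescribed design: `DomainThrottle`, `backoff_delay`, and all default values are hypothetical, and the clock is passed in explicitly so the logic can be checked without any network access.

```python
import random
from urllib.parse import urlparse


class DomainThrottle:
    """Tracks when each domain may next be requested (hypothetical helper)."""

    def __init__(self, min_delay_s: float = 5.0):
        self.min_delay_s = min_delay_s
        self._next_allowed = {}  # domain -> earliest allowed request time

    def wait_time(self, url: str, now: float) -> float:
        """Return the seconds to sleep before requesting `url` and reserve the slot."""
        domain = urlparse(url).netloc
        ready_at = max(now, self._next_allowed.get(domain, now))
        self._next_allowed[domain] = ready_at + self.min_delay_s
        return ready_at - now


def backoff_delay(attempt: int, base_s: float = 2.0, cap_s: float = 300.0) -> float:
    """Exponential backoff with full jitter, e.g. after an HTTP 429 or 503."""
    return random.uniform(0, min(cap_s, base_s * 2 ** attempt))


throttle = DomainThrottle(min_delay_s=5.0)
print(throttle.wait_time("https://a.example/page1", now=100.0))
print(throttle.wait_time("https://a.example/page2", now=101.0))
print(throttle.wait_time("https://b.example/", now=101.0))
```

A worker would sleep for the returned time before each request. Rotating proxies or user agents could be layered on top, but those depend on infrastructure choices the posting leaves open.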

The following tasks might be part of a further engagement:

o Implementing the web scrapers for a large number of different websites

o Maintaining and monitoring the scrapers for the websites

o Adding a web crawler to find additional websites

o Parsing the stored data and processing it into a more useful format
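To make the storage and later parsing steps more concrete, one possible approach is to store each scraped record as raw JSON together with a content hash, so that unchanged pages are not stored again on every run and parsing can happen later as a separate step. This is a sketch under assumptions only: the table name, columns, and the `store` function are hypothetical, and the standard-library `sqlite3` module stands in for PostgreSQL purely so the example runs without a server (in PostgreSQL, the payload column would more likely be `JSONB`).

```python
import hashlib
import json
import sqlite3

# Hypothetical schema; the unique constraint is what prevents duplicates.
SCHEMA = """
CREATE TABLE raw_page (
    site         TEXT NOT NULL,
    url          TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    payload      TEXT NOT NULL,  -- JSON text
    scraped_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (url, content_hash)
);
"""


def store(conn: sqlite3.Connection, site: str, url: str, record: dict) -> bool:
    """Insert the record unless identical content for this URL already exists.

    Returns True if a new row was written, False if it was a duplicate.
    """
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO raw_page (site, url, content_hash, payload) "
        "VALUES (?, ?, ?, ?)",
        (site, url, digest, payload),
    )
    return conn.total_changes > before


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

item = {"title": "Blue Mug", "price": "9.99"}
print(store(conn, "example-shop", "https://example.com/p/1", item))
print(store(conn, "example-shop", "https://example.com/p/1", item))
```

Keeping the raw payload separate from any parsed tables means that a parsing bug can be fixed and the data reprocessed without scraping the site again.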

Your qualifications:

o Web Scraping (Importance: 9/10)

o Python (Importance: 7/10)

o Docker (Importance: 8/10)

o AWS (Importance: 5/10)

o Kafka or other Pipelining/Queuing Tools (Importance: 8/10)

o Cloud Databases (Importance: 6/10)

o PostgreSQL (Importance: 10/10)

You are expected to work closely together with our developer in Germany. The tasks above need to be coordinated and done in cooperation with him. Therefore, a willingness to work between 10 AM and 10 PM Central European Time is required.

We wish to get to know you first by working together in a limited project scope. If you are a fit for our team, we are willing to intensify our cooperation with you and hire you for future projects.

Skills: Web Scraping, Python, Docker, Amazon Web Services, PostgreSQL

Project ID: #28930972

About the project

8 proposals · Remote project · Active 3 years ago

8 freelancers are bidding an average of €10/hour for this job

TheScorpion93

We use Python for scraping. Please contact me and send me the link to the site so I can make a FREE SAMPLE. Hi there, I’ve read…

€8 EUR / hour
(100 reviews)
6.7
yanakhokhlova199

Hello there. I am very interested in your project. *** As a web scraping and Python expert ***, I can handle this and am confident of winning. I have rich experience in scraping app development with Python, seleni…

€10 EUR / hour
(7 reviews)
4.9
amineghennou3

Hello, this is Amine from Malaysia, a full-stack web developer with 5 years of working experience in this field. I am fully comfortable working with Python, web scraping, AWS, and PostgreSQL. I will…

€10 EUR / hour
(4 reviews)
3.8
webxtor

Hello. An experienced web extractor doing projects mainly in PHP, but Python might also be an option. Thanks for considering. Eugene

€15 EUR / hour
(6 reviews)
4.0
stepinnsolution

Hi Sir, nice to meet you. I am an expert in Python with web scraping at a high level. I agree with your time zone and am confident in the level of skills you wrote above. Please come to chat and show me the details.

€12 EUR / hour
(1 review)
4.1
sokolovicstefan3

Hi there. I am an expert web scraping and automation developer who is very familiar with Python/Selenium. After checking your job description and skill set, I found that this job suits me well. I can work in the tim…

€12 EUR / hour
(4 reviews)
3.4
joseji

This project really caught my eye. I have the required qualifications to do this work. I will be working with Python using the Scrapy framework. There are really JavaScript-heavy websites nowadays, which really makes it diff…

€8 EUR / hour
(15 reviews)
3.1
Krishnamuthyam9

I have strong experience in the areas below; please give me a chance to work on this project. Qualifications: o Web Scraping (Importance: 9/10) o Python (Importance: 7/10) o Docker (Importance: 8/10) o AWS (Importance: 5/10) o Kafka…

€6 EUR / hour
(0 reviews)
0.0