Asynchronous Web Scraper

by RaulBlanko

This asynchronous web scraper works with an Elasticsearch cluster and MongoDB. It takes the list of organizations from the MongoDB database and creates a bulk of tasks (coroutines) that run concurrently. For every organization in the database, it crawls all of that organization's pages, cleans the extracted text, and updates the Elasticsearch cluster with the new text content. It also stores all supporting data in a draft Elasticsearch index, including sitemaps, visited pages, crawl reports, and "robots.txt" rules.
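
The project listing contains no source code, so the following is only a minimal sketch of the described flow, assuming motor for async MongoDB access, aiohttp for crawling, BeautifulSoup for text cleaning, and the async Elasticsearch client (8.x `document=` keyword). The connection strings, the `scraper`/`organizations` database and collection names, the `pages` and `name` fields, and the `draft_pages` index name are all hypothetical placeholders; sitemap, robots.txt, and crawl-report handling from the description is omitted for brevity.

```python
import asyncio

import aiohttp
from bs4 import BeautifulSoup
from elasticsearch import AsyncElasticsearch
from motor.motor_asyncio import AsyncIOMotorClient

# Placeholder connection settings; the real deployment values are not given in the listing.
MONGO_URI = "mongodb://localhost:27017"
ES_HOST = "http://localhost:9200"
DRAFT_INDEX = "draft_pages"  # assumed name of the draft ES index


def clean_text(html: str) -> str:
    """Strip markup and collapse whitespace into plain text."""
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)


async def crawl_organization(org: dict, session: aiohttp.ClientSession,
                             es: AsyncElasticsearch) -> None:
    """Fetch each known page of one organization, clean it, and index it."""
    for url in org.get("pages", []):  # assumed document field
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
                html = await resp.text()
        except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
            print(f"skip {url}: {exc}")
            continue
        await es.index(
            index=DRAFT_INDEX,
            document={
                "organization": org.get("name"),  # assumed document field
                "url": url,
                "text": clean_text(html),
            },
        )


async def main() -> None:
    mongo = AsyncIOMotorClient(MONGO_URI)
    es = AsyncElasticsearch(ES_HOST)
    orgs = mongo["scraper"]["organizations"]  # assumed db/collection names

    async with aiohttp.ClientSession() as session:
        # One coroutine per organization, gathered into a bulk of concurrent tasks.
        tasks = [crawl_organization(org, session, es) async for org in orgs.find({})]
        await asyncio.gather(*tasks)

    await es.close()


if __name__ == "__main__":
    asyncio.run(main())
```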

RaulBlanko, Kharkiv, Ukraine

About Me

Stand with Ukraine! I'm a professional data engineer and web scraping expert, and I provide development services to clients of all sizes. My areas of expertise:
∙ Web scraping, parsing, and data engineering using Python
∙ Web and routine automation, search automation using Chrome DevTools or Selenium WebDriver
∙ Data processing, mining, and manipulation scripts
∙ Data exploration, cleaning, and visualization
∙ Data science and statistical analysis using SciPy, NumPy, Pandas, SKLearn
∙ Task and process automation
∙ Data processing and conversion tools from/to XLSX, JSON, XML, PDF, DOCX, CSV, etc.
∙ Providing and consuming APIs (REST, GraphQL)
∙ Desktop and Windows applications using PyQt
∙ SQL and NoSQL database integration: MongoDB, MySQL, MariaDB, etc.
I would be pleased to consider proposals for long-term projects. Feel free to contact me. Thanks!

$30 USD/hour

37 Reviews
5.9
