Multithreaded Python crawler for determining pHash values
$30-250 USD
In progress
Posted about 9 years ago
Paid on delivery
I am looking for a developer who can build a multithreaded Python crawler and its user interface.
User interface:
-------------------
- The user can add several pictures (whose pHash values are then determined)
- The user can add URLs to be crawled
Step 1:
---------
Each day, the entire website behind each entered URL is crawled so that the paths of all its subpages are determined.
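A minimal sketch of how the subpage paths of one fetched page could be collected, using only the Python standard library. The names `LinkCollector` and `same_site_paths` are hypothetical, and downloading the page is assumed to happen elsewhere; a full crawl would repeat this for every newly found path.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_paths(page_url, html):
    """Return the sorted, de-duplicated paths linked from `html` on the same host."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    paths = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == site:
            paths.add(parsed.path or "/")
    return sorted(paths)
```

Links to other hosts and non-HTTP links such as `mailto:` are skipped, so only paths of the crawled website itself are returned.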
The paths shall be distributed across 10 databases.
(E.g.: a website has 100 subpages. 10 of these URLs are put into the first database, 10 into the second, etc.)
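The distribution across the databases could look roughly like the sketch below. The function name `partition_urls` is hypothetical, and the handling of URL counts that are not divisible by 10 (earlier databases get one extra URL) is an assumption, since the posting only gives the 100-subpage example.

```python
def partition_urls(urls, num_databases=10):
    """Split `urls` into `num_databases` contiguous, near-equal chunks."""
    base, extra = divmod(len(urls), num_databases)
    buckets = []
    start = 0
    for index in range(num_databases):
        # The first `extra` buckets take one additional URL each.
        size = base + (1 if index < extra else 0)
        buckets.append(urls[start:start + size])
        start += size
    return buckets
```

With 100 URLs this yields 10 buckets of 10 URLs each, matching the example above; each bucket would then be written to its own database.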
Step 2:
---------
One crawler (one crawler per server) is responsible for one database. It visits all stored URLs and determines the pHash values of the pictures displayed on the website. Each pHash value shall be stored in a central result database together with the server path of the picture.
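For orientation, here is a minimal pure-Python sketch of one common DCT-based perceptual hash variant. It assumes the picture has already been decoded, converted to grayscale, and resized to a 32x32 grid of brightness values by an image library such as Pillow; the function names are hypothetical, and a production crawler would more likely rely on an existing hashing library.

```python
import math
import statistics


def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(matrix):
    """2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to [y][x]


def phash_from_gray(pixels, hash_size=8):
    """Return a (hash_size * hash_size)-bit integer hash of a square grayscale grid."""
    coeffs = dct_2d(pixels)
    # Keep only the low-frequency block in the top-left corner.
    low = [coeffs[y][x] for y in range(hash_size) for x in range(hash_size)]
    median = statistics.median(low)
    bits = 0
    for value in low:
        # One bit per coefficient: 1 if above the median, else 0.
        bits = (bits << 1) | (1 if value > median else 0)
    return bits
```

Because each bit only records whether a coefficient lies above the median, uniform brightness scaling leaves the hash unchanged. The resulting integer could be stored in the result database next to the picture's server path.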
Step 3:
---------
The pHash value of the originally uploaded picture is compared with the pHash value of each picture found on the websites. If the comparison value is above a user-configurable threshold, the found picture is listed in the user interface as a potential match.
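A sketch of the comparison, under the assumption that hashes are 64-bit integers. Perceptual hashes are commonly compared by Hamming distance, where a smaller distance means more similar pictures; to fit the "above a threshold" wording, the sketch converts the distance into a similarity between 0.0 and 1.0. All function names are hypothetical.

```python
def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def similarity(hash_a, hash_b, bits=64):
    """1.0 for identical hashes, 0.0 if every bit differs."""
    return 1.0 - hamming_distance(hash_a, hash_b) / bits


def find_matches(original_hash, found, threshold):
    """Return (path, score) pairs scoring at or above `threshold`.

    `found` is an iterable of (server_path, hash) pairs, e.g. rows
    read from the central result database.
    """
    matches = []
    for path, found_hash in found:
        score = similarity(original_hash, found_hash)
        if score >= threshold:
            matches.append((path, score))
    return matches
```

For example, with a threshold of 0.9 a found picture is listed as a potential match if at most 6 of its 64 hash bits differ from the uploaded picture's hash.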
Other requirements:
--------------------------
* It must be configurable whether [login to view URL] and meta information are respected while crawling.
* I need to be able to set an hourly limit on the number of crawls per website, so that the crawler does not draw too much attention or consume too many resources on the foreign websites.
* German freelancers are preferred.
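The hourly crawl limit per website could be enforced with a small thread-safe sliding-window counter, sketched below. The class name `HourlyLimiter` and its parameters are hypothetical; with one crawler per server, the counts would additionally have to be shared between servers, which this sketch does not cover.

```python
import threading
import time
from collections import defaultdict, deque


class HourlyLimiter:
    """Allows at most `max_per_hour` requests per site within a sliding window."""

    def __init__(self, max_per_hour, window_seconds=3600.0, clock=time.monotonic):
        self.max_per_hour = max_per_hour
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # site -> timestamps of recent requests
        self._lock = threading.Lock()

    def try_acquire(self, site):
        """Return True and record the request if `site` is under its limit."""
        now = self.clock()
        with self._lock:
            hits = self._hits[site]
            # Forget requests that have left the one-hour window.
            while hits and now - hits[0] >= self.window:
                hits.popleft()
            if len(hits) >= self.max_per_hour:
                return False
            hits.append(now)
            return True
```

Each crawler thread would call `try_acquire(site)` before fetching a page and postpone the URL when it returns `False`.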