Aggregate news and detect duplicates to show on a website
$250-750 USD
Closed
Posted almost 13 years ago
Paid on delivery
I want to make a website like [login to view URL] (or [login to view URL]).
It's basically a news aggregator script like Google News, with an algorithm to detect duplicate news items. Currently [login to view URL] covers 22 websites that are checked every 5 minutes for new articles (most of the websites have an RSS feed, but it might be necessary to parse HTML). I will provide you with a list of the websites I want checked. An existing article may also be edited after publication, so the script has to check whether the content of older articles has changed (e.g. for all articles from the last 30 days).
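To make the "new or edited" requirement concrete, here is a rough Python sketch of one possible approach: parse each RSS feed, fingerprint the text of every item, and compare it with the fingerprint seen last time. The function names and the in-memory `known_hashes` dict are assumptions for illustration only; in the real script the fingerprints would live in the database, and sites without an RSS feed would need their own HTML parser.

```python
import hashlib
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return a list of (link, title, description) tuples from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        items.append((link, title, description))
    return items


def content_hash(title, description):
    """Fingerprint of the article text, used to notice later edits."""
    return hashlib.sha256((title + "\n" + description).encode("utf-8")).hexdigest()


def classify_items(items, known_hashes):
    """Split items into new and changed links; update known_hashes in place.

    known_hashes maps link -> last seen content hash. It is a hypothetical
    in-memory stand-in for a database table.
    """
    new, changed = [], []
    for link, title, description in items:
        digest = content_hash(title, description)
        if link not in known_hashes:
            new.append(link)
        elif known_hashes[link] != digest:
            changed.append(link)
        known_hashes[link] = digest
    return new, changed
```

Running `classify_items` once per feed every 5 minutes would cover both cases: links not seen before are new, and links whose fingerprint differs have been edited.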
It should look like [login to view URL] (see attachment).
The user should be able to search for specific terms (search form at the top of the page).
I also need a filter to show only news from selected websites (via HTML checkboxes).
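As a rough illustration of how the search form and the website checkboxes could translate into one database query, here is a small Python sketch. The table name `news` and the columns `id`, `source`, `title`, `content` and `published_at` are made up for the example, and the `%s` placeholders assume a MySQL driver that uses that parameter style (the common Python MySQL drivers do).

```python
def build_news_query(search_term=None, sources=None):
    """Build a parameterised SELECT for the news list.

    Table and column names are hypothetical. Values are passed as
    parameters rather than pasted into the SQL string.
    """
    sql = "SELECT id, source, title, published_at FROM news"
    clauses, params = [], []
    if search_term:
        clauses.append("(title LIKE %s OR content LIKE %s)")
        pattern = "%" + search_term + "%"
        params.extend([pattern, pattern])
    if sources:
        placeholders = ", ".join(["%s"] * len(sources))
        clauses.append("source IN (" + placeholders + ")")
        params.extend(sources)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY published_at DESC"
    return sql, params
```

The returned pair would be handed to the driver's `cursor.execute(sql, params)`; the same logic could equally be written in PHP on the website side.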
In my opinion I need a script (in Python, Java, or similar) that runs continuously and checks for new articles. When it finds one, it should store the content and timestamp in a MySQL database. (One thing to mention: since this is a German project, the three special characters ä, ö and ü need to be encoded correctly.)
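For the umlaut issue, one possible approach (a sketch, not a requirement) is to decode every feed using the encoding it declares, normalise the text to a single Unicode form, and store everything as UTF-8. This assumes the MySQL tables and the connection use a Unicode character set such as `utf8mb4`. The function name `normalize_text` is made up for the example.

```python
import unicodedata


def normalize_text(raw_bytes, declared_encoding="utf-8"):
    """Decode feed bytes and normalise them to NFC.

    NFC turns a decomposed "u" + combining diaeresis into the single
    character "ü", so identical words compare equal in search and in
    duplicate detection. Undecodable bytes are replaced, not dropped.
    """
    text = raw_bytes.decode(declared_encoding, errors="replace")
    return unicodedata.normalize("NFC", text)


# A feed served as ISO-8859-1 and one served as UTF-8 end up identical.
latin1_title = "Müller".encode("iso-8859-1")
utf8_title = "Müller".encode("utf-8")
same = normalize_text(latin1_title, "iso-8859-1") == normalize_text(utf8_title)
```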
A second script with the duplicate-detection algorithm needs to scan the MySQL database for duplicates, so that in the final step the news can be displayed on the website (e.g. via PHP).
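The duplicate-detection algorithm itself is left open here. As one simple candidate, the Python sketch below compares articles by their overlapping three-word sequences ("shingles") and uses Jaccard similarity to score each pair. The shingle size, the 0.5 threshold and all function names are assumptions that would need tuning against real articles, and the pairwise loop is only practical for a limited window such as the last few days.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(articles, threshold=0.5):
    """Return (id1, id2) pairs whose similarity reaches the threshold.

    `articles` maps an article id to its text; the threshold is an
    assumed starting value, not a tested one.
    """
    ids = sorted(articles)
    sets = {article_id: shingles(articles[article_id]) for article_id in ids}
    pairs = []
    for n, first in enumerate(ids):
        for second in ids[n + 1:]:
            if jaccard(sets[first], sets[second]) >= threshold:
                pairs.append((first, second))
    return pairs
```

The resulting pairs could be written back to the database as a "duplicate of" reference, so the website only has to display one entry per group.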
Hello, my name is David Stanek (Google me!) and I'd like the opportunity to work on this project with you. I am a Python expert and can get this done quickly and efficiently.