Suggest the best way to cluster news articles for an aggregator
$10-30 USD
Closed
Posted almost 5 years ago
Paid on delivery
The project is based on Laravel/PHP.
Here is the case: I receive news from several news sources every minute. The items are WordPress posts, because the script we use for the news aggregator is a WordPress plugin.
We then fetch those posts into the Laravel site via a WordPress-to-Laravel package ([login to view URL]).
So far, using TextRank ([login to view URL]), we can do the following for any post:
Find sentences,
Remove stopwords,
Create integer values by finding and counting the matching words,
Adjust the integer values using the related words' integer values,
Normalize values to create scores,
Order by scores
To be more precise, we can get a bag of words from any WordPress post.
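As a rough illustration of what such a bag of words looks like, here is a minimal Python sketch. It is not the TextRank package's output; the stopword list and the `bag_of_words` helper are assumptions made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real list would be much longer.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "for"}

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, drop stopwords, count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

bow = bag_of_words("The central bank raised rates; the bank cited inflation.")
print(bow.most_common(3))
```

Each post then becomes a term-to-count mapping that can be compared with other posts.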
Now I need a complete algorithm and guide, preferably in PHP (if there is a library), that can cluster/group lists of articles into a single Coverage table. Coverage can hold any data you think the algorithm needs. My idea is that it needs a coverage ID field and a field that accepts an array of the IDs of posts that are similar to each other and share the same coverage ID.
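One possible approach, sketched here in Python rather than PHP, is a greedy single-pass clustering: each new post joins the existing coverage whose accumulated term counts are most similar by cosine similarity, or starts a new coverage if none reaches a threshold. The function names, the `threshold` value of 0.5, and the row layout are assumptions for illustration, not a finished design.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def assign_coverages(posts, threshold=0.5):
    """posts maps post ID -> Counter of terms. Returns Coverage-style rows."""
    coverages = []  # each entry keeps summed term counts as a centroid
    for post_id, bow in posts.items():
        best, best_sim = None, threshold
        for cov in coverages:
            sim = cosine(bow, cov["centroid"])
            if sim >= best_sim:
                best, best_sim = cov, sim
        if best is None:
            # No coverage is similar enough: open a new one.
            coverages.append({"coverage_id": len(coverages) + 1,
                              "post_ids": [post_id],
                              "centroid": Counter(bow)})
        else:
            best["post_ids"].append(post_id)
            best["centroid"].update(bow)  # add this post's counts to the centroid
    return [{"coverage_id": c["coverage_id"], "post_ids": c["post_ids"]}
            for c in coverages]

posts = {
    101: Counter({"bank": 2, "rates": 1, "inflation": 1}),
    102: Counter({"bank": 1, "rates": 2, "hike": 1}),
    103: Counter({"football": 2, "final": 1, "goal": 1}),
}
rows = assign_coverages(posts)
print(rows)
```

Because posts arrive every minute, a single-pass scheme like this avoids re-clustering everything on each fetch. The trade-off is that results depend on arrival order and on the chosen threshold, so both would need tuning on real data.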
We also have a table called newsTag with the following fields: postId and the most important topic mentioned. You can ignore the topic mentioned, because it depends only on the category. If we cluster based on the topic from newsTag, we limit the clustering ability, since some posts have no topic mentioned.
Please provide a complete algorithm. Ask me any questions you need to, and send me a PDF file of the algorithm, if possible with examples.
Hi, wintergreen has been developing AI projects for the last 9 years, so we have deep knowledge of research concepts.
As per the requirements:
1. It is an NLP task: remove stop words and apply POS tagging.
2. Extract features such as term count or frequency.
3. Apply a clustering algorithm to group the articles into categories.
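For step 2, one common refinement of raw term frequency is TF-IDF weighting, which down-weights words that appear in most articles. The bid does not name TF-IDF, so treat this Python sketch and its `tf_idf` helper as an assumption about how the feature step could look.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs is a list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        weighted.append({t: (c / total) * math.log(n / df[t])
                         for t, c in counts.items()})
    return weighted

weights = tf_idf([["bank", "rates", "news"], ["football", "final", "news"]])
print(weights[0])
```

A term that occurs in every document, such as "news" here, gets weight zero, so it no longer pulls unrelated articles together in step 3.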
We have deep knowledge of how this will interconnect in real time, what problems we will face, and how to solve them.
wintergreen is a small research firm, and we like to build long-term relationships with customers.
If you need any clarification, kindly contact us through chat. Thank you.
I can help you with this project. Writing the algorithm in PHP can be painful. As a workaround, I can give you a Flask-based REST API (Python) and open endpoints where you POST a coverage_id or an <array_of_coverage_ids> and get the response back as JSON, which you can parse in PHP.
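To give an idea of the JSON exchange, here is a minimal sketch of the request handling logic using only Python's standard `json` module. The Flask routing is left out, and the `coverage_ids` field name, the `COVERAGES` stand-in data, and the `handle_request` function are assumptions for illustration.

```python
import json

# Hypothetical in-memory stand-in for the Coverage table.
COVERAGES = {1: [101, 102], 2: [103]}

def handle_request(body):
    """Take a JSON request body and return a JSON response string."""
    payload = json.loads(body)
    ids = payload.get("coverage_ids", [])
    if isinstance(ids, int):
        ids = [ids]  # accept a single coverage ID as well as an array
    result = [{"coverage_id": i, "post_ids": COVERAGES.get(i, [])} for i in ids]
    return json.dumps({"coverages": result})

response = handle_request('{"coverage_ids": [1, 3]}')
print(response)
```

On the PHP side, the response string could be decoded with `json_decode($response, true)` to get an associative array.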