Collocation Extraction Using Python
$30-250 USD
Paid on delivery
Given a large text corpus of about 1 GB, I want to extract two-, three-, four- and five-word collocations (patterns) using the log-likelihood ratio (LLR).
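A minimal sketch of the LLR scoring step follows. It assumes the corpus has already been split into a list of lowercase word tokens. It uses Dunning's G² statistic on a 2x2 contingency table, G² = 2 Σ O·ln(O/E), where O are the observed counts and E the counts expected if the words were independent. For n > 2 it makes a simplifying assumption: each n-gram is treated as a pair (first n-1 words, last word). Formulations that build larger contingency tables over all n positions will give different scores. The names `ngrams`, `llr`, `score_ngrams` and the `min_count` threshold are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return consecutive n-word tuples from a list of tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: A followed by B        k12: A followed by something else
    k21: something else then B  k22: neither A nor B in those slots
    """
    n = k11 + k12 + k21 + k22
    observed = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # 0 * ln(0) is taken as 0
                e = rows[i] * cols[j] / n  # expected count under independence
                g2 += o * math.log(o / e)
    return 2.0 * g2


def score_ngrams(tokens, n, min_count=2):
    """Score every n-gram seen at least `min_count` times; highest LLR first.

    Simplification: an n-gram is split into (prefix = first n-1 words,
    last word) so the same 2x2 table works for bigrams through 5-grams.
    """
    grams = Counter(ngrams(tokens, n))
    total = sum(grams.values())
    prefix_counts, last_counts = Counter(), Counter()
    for gram, c in grams.items():
        prefix_counts[gram[:-1]] += c
        last_counts[gram[-1]] += c
    scored = []
    for gram, k11 in grams.items():
        if k11 < min_count:
            continue
        k12 = prefix_counts[gram[:-1]] - k11
        k21 = last_counts[gram[-1]] - k11
        k22 = total - k11 - k12 - k21
        scored.append((llr(k11, k12, k21, k22), gram))
    scored.sort(reverse=True)
    return scored
```

For example, `score_ngrams(tokens, 3)` returns a list of `(score, trigram)` pairs, highest score first. Raising `min_count` discards rare n-grams, whose LLR scores tend to be unreliable and which make up most of the distinct n-grams in a large corpus.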
More specifically, the requirements are:
(1) Given the corpus, I'd like to get the bigrams, trigrams, 4-grams and 5-grams, scored by LLR.
(2) I also want to find the collocations of any word that contains three or four specific letters in a given order. For example, the collocations of words that contain the letters "a", "d", "f" in that order, whether the letters are adjacent or separated by other letters.
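Requirement (2) amounts to an ordered-subsequence test on each word. A sketch using Python's standard `re` module: the letters are joined with `.*`, so "adf" becomes the pattern `a.*d.*f`, which matches whether the letters are adjacent or separated by other letters. It assumes the scored collocations are `(score, tuple_of_words)` pairs with lowercase words; the helper names `letter_pattern`, `matching_words` and `collocations_for` are hypothetical.

```python
import re


def letter_pattern(letters):
    """'adf' -> regex a.*d.*f: the letters in order, anything in between."""
    return re.compile(".*".join(re.escape(ch) for ch in letters))


def matching_words(vocabulary, letters):
    """Sorted list of distinct words that contain `letters` in the given order."""
    pattern = letter_pattern(letters)
    return sorted(w for w in set(vocabulary) if pattern.search(w))


def collocations_for(scored, letters):
    """Keep (score, ngram) pairs in which at least one word matches.

    The input order is preserved, so an already-sorted list stays sorted.
    """
    pattern = letter_pattern(letters)
    return [(s, g) for s, g in scored if any(pattern.search(w) for w in g)]
```

With this approach, "handful" and "adrift" match "adf", while "fad" and "daft" do not, because their letters are not in the required order.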
In both cases, I'd like the output sorted. As mentioned above, the corpus is about 1 GB, so the code has to cope with large input.
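For a corpus of about 1 GB, it is best not to read the whole file into one string or list. A sketch that streams the file line by line, keeps a sliding window of the last n tokens (so n-grams can span line breaks), and writes `(score, ngram)` results sorted by score to a tab-separated file. Assumptions: UTF-8 input and a deliberately simple tokenizer that keeps only runs of ASCII letters after lowercasing. The names `stream_tokens`, `count_ngrams_streaming`, `write_sorted` and the output format are hypothetical. On a corpus this size, the `Counter` of distinct 4- and 5-grams may still not fit in memory; pruning rare n-grams or processing the corpus in chunks are possible workarounds.

```python
import re
from collections import Counter, deque

# Assumption: a word is a run of ASCII letters; adjust for other languages.
TOKEN_RE = re.compile(r"[a-z]+")


def stream_tokens(path, encoding="utf-8"):
    """Yield lowercase tokens one at a time without loading the whole file."""
    with open(path, encoding=encoding, errors="replace") as fh:
        for line in fh:
            yield from TOKEN_RE.findall(line.lower())


def count_ngrams_streaming(tokens, n):
    """Count n-grams from any token iterator using a sliding window."""
    counts = Counter()
    window = deque(maxlen=n)  # holds only the last n tokens
    for tok in tokens:
        window.append(tok)
        if len(window) == n:
            counts[tuple(window)] += 1
    return counts


def write_sorted(scored, out_path, top=None):
    """Write (score, ngram) pairs, highest score first, one per line as TSV."""
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    if top is not None:
        ranked = ranked[:top]
    with open(out_path, "w", encoding="utf-8") as out:
        for score, gram in ranked:
            out.write(f"{score:.4f}\t{' '.join(gram)}\n")
```

For example, `count_ngrams_streaming(stream_tokens("corpus.txt"), 3)` counts trigrams without holding the text itself in memory ("corpus.txt" is a placeholder path).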
I prefer working with Python, but I'm a novice, so the code needs to be clear and easy to use and understand.
P.S. Budget limited to $100
Thanks
Project ID: #5323958
6 freelancers are bidding an average of $128 for this job
I have a great deal of Python experience, will complete the project in a timely manner, and will do it correctly. Thank you for considering my bid.