I need someone with good RegEx skills who can review business news articles that we have classified and create regular expressions that will efficiently classify future similar articles into the same category automatically.
The goal of this project is to classify news articles automatically. Our current set of regular expressions classifies about 22% of news articles, and we want to raise that share to at least 50%. At the same time, we want to maintain good accuracy, so that at least 90% of the automated classifications are correct. The challenge is to make the regular expressions broad enough to maximize coverage without sacrificing accuracy.
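To make the two targets concrete, here is a minimal Python sketch of how coverage and accuracy could be measured against a set of already-classified headlines. The category names, patterns, and sample headlines are all made up for illustration and are not our actual rule set; the function names `classify` and `evaluate` are also just assumptions.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules for illustration only: (category, compiled pattern).
RULES = [
    ("Mergers & Acquisitions",
     re.compile(r"\b(?:acquires?|to acquire|merger|buyout)\b", re.IGNORECASE)),
    ("Earnings",
     re.compile(r"\bQ[1-4]\b.*\b(?:earnings|profit|revenue)\b", re.IGNORECASE)),
]


def classify(headline: str) -> Optional[str]:
    """Return the category of the first matching rule, or None if nothing matches."""
    for category, pattern in RULES:
        if pattern.search(headline):
            return category
    return None


def evaluate(labeled: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (coverage, accuracy) as fractions between 0 and 1.

    coverage = classified headlines / all headlines
    accuracy = correct classifications / classified headlines
    """
    predictions = [(classify(headline), label) for headline, label in labeled]
    classified = [(pred, label) for pred, label in predictions if pred is not None]
    coverage = len(classified) / len(labeled) if labeled else 0.0
    accuracy = (
        sum(pred == label for pred, label in classified) / len(classified)
        if classified else 0.0
    )
    return coverage, accuracy


# Made-up sample of manually classified headlines.
SAMPLE = [
    ("Acme Corp acquires Widget Inc for $2 billion", "Mergers & Acquisitions"),
    ("Globex Q3 earnings beat expectations", "Earnings"),
    ("Initech names new chief financial officer", "Executive Changes"),
    ("Hooli shares fall after merger talks collapse", "Stock Movement"),
]

coverage, accuracy = evaluate(SAMPLE)
print(f"coverage: {coverage:.0%}, accuracy: {accuracy:.0%}")
```

In this toy sample the word "merger" pulls a stock-movement headline into the wrong category, which is the kind of overly broad match that lowers accuracy while raising coverage.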
I have tried a few different approaches, and so far the most efficient method is to focus on articles whose headlines suggest a particular category and to have the regex operate on the headline only. The important metric at this stage is the coverage boost (in percent) we get per minute spent analyzing articles and creating regexes. Reading a headline to determine a category and a regex is generally more efficient than reading a whole article. We will also periodically check your regexes to ensure that their accuracy stays around the 90% level.
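As a rough illustration of that efficiency metric, the following Python sketch computes the coverage gained by adding one new headline regex, divided by the minutes spent creating it. It reads "coverage boost" as percentage points of coverage, which is an assumption, and the headlines, patterns, and function names are hypothetical.

```python
import re
from typing import List, Pattern


def coverage_percent(patterns: List[Pattern[str]], headlines: List[str]) -> float:
    """Percent of headlines matched by at least one pattern."""
    if not headlines:
        return 0.0
    hits = sum(1 for h in headlines if any(p.search(h) for p in patterns))
    return 100.0 * hits / len(headlines)


def boost_per_minute(existing: List[Pattern[str]], new_pattern: Pattern[str],
                     headlines: List[str], minutes_spent: float) -> float:
    """Coverage gained (percentage points) per minute spent on the new pattern."""
    before = coverage_percent(existing, headlines)
    after = coverage_percent(existing + [new_pattern], headlines)
    return (after - before) / minutes_spent


# Made-up headlines; the regexes see the headline only, never the article body.
HEADLINES = [
    "Acme Corp acquires Widget Inc",
    "Globex appoints Jane Doe as CEO",
    "Initech names new chief financial officer",
    "Hooli opens new office in Austin",
    "Umbrella Corp to acquire Stark Labs",
]

existing = [re.compile(r"\b(?:acquires?|to acquire)\b", re.IGNORECASE)]
new_pattern = re.compile(
    r"\b(?:appoints|names)\b.*\b(?:CEO|CFO|chief \w+ officer)\b", re.IGNORECASE
)

# Assume 4 minutes were spent reading headlines and writing new_pattern.
print(boost_per_minute(existing, new_pattern, HEADLINES, minutes_spent=4))
```

Headlines already matched by an existing regex add nothing to the boost, so new patterns are most valuable when they target categories the current set misses.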
In your proposal, please attach some examples of regular expressions you've written that demonstrate your ability to write regexes that are both accurate and capture a high proportion of relevant matches.
Also, please confirm when you apply that you are available for at least 3 hours per day to work on this.