Hello,
I am Hermann, a scientist and software developer. I offered to solve your problem within 5 days because I don't know how much text you need formatted, but I expect to finish the task much faster.
To do the task, I need to clarify a couple of things with you:
1) I assume you define a sentence as a string of text that starts either at the beginning of a file or after a full stop and runs to the next full stop, right?
2) I'm not sure I understood this correctly: is a "bad" noun one with a capitalized first letter, or is a "bad" sentence one that contains no capitalized word other than its first word?
3) If I understand correctly, you need two files for each text: one containing only the "good" sentences and one containing only the "bad" sentences, right? What additional information, if any, is needed to link the two files?
4) The best way to parse multiple files would be to convert them to a common format. I can do that easily by writing a script that converts the files and reformats the results locally on my Linux machine, but I can't create anything that will run on a Windows machine, because Windows is poorly suited to such tasks unless you have bought specialized software for them. Would that be OK with you?
Best regards
Hermann