Hi there,
This sounds like an interesting project. It has also been done before, so although I intend to deliver an original solution (i.e. one containing no copyleft code), there are concepts that can be borrowed from existing libraries such as jtrans and sailalign.
I have past experience with audio processing projects, including speech-to-text, and I am familiar with several third-party libraries in this space. Providing a solution without copyleft-licensed code is non-trivial, but I am confident that I can re-implement the required techniques in a self-contained piece of software.
Since you want Windows and Mac compatibility, I intend to use a portable language, either Python or Java. Do you have a preference, given that it needs to hook into your larger program?
What language will the audio and text be in? You mentioned that the text file is UTF-8. Will it contain characters from non-Latin character sets (e.g. Cyrillic, Arabic)?
Also, what is the duration of the audio files, and is there a large number of files to process? Do you need alignment to run in real time or faster?
I have entered a low bid for this project as I am keen to increase my rating on this site, but I believe I will be able to provide as good a solution as the higher bidders!
Thanks for considering my bid. I hope I can work with you!
Rob