
Closed
Published
Paid on delivery
I have a very limited corpus of speech and transcripts—just a few hours—and I need to translate a low-resource language into English in real time. The goal is to adapt a large ASR/TTS Transformer (Whisper-style architecture) to this low-data setting, squeezing the most out of the dataset through smart training techniques. What matters most is the training strategy: curriculum scheduling, learning-rate tricks, mixed-precision, and any other techniques that keep a Transformer stable when data are scarce. Data augmentation, synthetic data creation, few-shot and self-supervised learning are all on the table; noise injection and pitch or speed variation can be explored if they prove helpful. I prefer to work within proven open-source stacks—Coqui TTS, Mozilla TTS, and Tacotron—so please build and document the pipeline in those environments. The final model should achieve intelligible, natural-sounding speech in the target language and be reproducible on a single high-end GPU.

Deliverables
• Cleaned, ready-to-train audio/text dataset (scripts included)
• Training code and configuration files for the chosen framework(s)
• Fine-tuned model checkpoints plus inference script
• Short report summarising hyper-parameters, augmentation methods, and evaluation results (WER/MOS)

Acceptance criteria
• Training scripts run end-to-end from raw data to synthesis without errors
• Objective metrics meet or exceed a baseline Whisper fine-tune on the same data
• Synthesised samples judged 4.0 MOS or higher by at least three native speakers

If you enjoy pushing Transformers into low-resource territory, let’s make this language heard.
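To make the augmentation requirements concrete, here is a minimal, dependency-light sketch of two of the techniques the brief names (noise injection at a target SNR and speed perturbation). It uses plain NumPy resampling rather than any specific TTS toolkit, so the function names, SNR values, and speed factors are illustrative assumptions, not a prescribed implementation:

```python
import numpy as np

def add_noise(wave, snr_db=20.0, rng=None):
    """Inject white noise at a target signal-to-noise ratio (in dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def speed_perturb(wave, factor):
    """Resample the waveform by linear interpolation; factor > 1 speeds up."""
    n_out = int(round(len(wave) / factor))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_idx, old_idx, wave)

# Each clip yields several augmented variants, multiplying the effective data.
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s dummy tone
variants = [add_noise(wave, snr_db=s) for s in (10, 20)]
variants += [speed_perturb(wave, f) for f in (0.9, 1.1)]
```

In a real pipeline the perturbed transcripts stay unchanged, so a few hours of audio can be stretched into several "epochs-worth" of distinct training examples.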
Project ID: 40065263
24 proposals
Remote project
Active 27 days ago
Set your budget and timeframe
Get paid for your work
Outline your proposal
It's free to sign up and bid on jobs
24 freelancers are bidding an average of $6,013 USD for this job

Hello! As the leader of a talented team at Live Experts, we have accumulated extensive experience in Deep Learning and Machine Learning, and within low-resource settings in particular we have developed real expertise. Challenging boundaries and maximizing the potential of limited data is what excites us most about AI. Your project's training-strategy requirements align well with our skill set. With hands-on familiarity with Tacotron, Coqui TTS, and Mozilla TTS, among other frameworks, we can create a tailor-made pipeline for your dataset. We know how to work around low-data limitations with smart training methods such as curriculum scheduling, learning-rate tricks, mixed-precision training, and self-supervised techniques. Our experience with synthetic data creation and few-shot/zero-shot learning will also prove valuable, and our understanding of augmentations such as noise injection and pitch or speed variation will help maximize the use of your limited corpus. Our collaboration will not end at training: our deliverables include a comprehensive report summarizing hyper-parameter choices, augmentation methods, and transparent evaluation results, so you know exactly what went into the model. Thanks!
$5,000 USD in 3 days
6.8

With over 10 years of experience in web and mobile development, specializing in AI/ML, blockchain, and more, I understand the unique challenge your project presents. You have a limited corpus of speech and transcripts and need to train a Transformer in a new language using smart techniques. In previous projects, I have successfully implemented innovative strategies in fintech, healthcare, and blockchain domains, achieving outstanding results. My expertise in training AI models with scarce data aligns perfectly with your project requirements. Let's bring your vision to life by leveraging my experience and skills to deliver a high-quality solution. I am excited to collaborate with you on this project and look forward to discussing the details further. Contact me to discuss how we can achieve exceptional results together.
$4,000 USD in 45 days
5.4

I propose to build a reproducible, low-resource speech-to-English translation system by adapting a Whisper-style Transformer using advanced training strategies designed for scarce data. With only a few hours of labeled speech, the focus will be on how the model is trained rather than model size: curriculum learning (short-to-long utterances), careful learning-rate scheduling with warm-up, mixed-precision training, encoder freezing/unfreezing, and strong regularization to prevent overfitting.

The pipeline will be built on proven open-source stacks (Whisper + Coqui TTS/Tacotron). Data efficiency will be maximized through augmentation (speed perturbation, mild noise injection, SpecAugment), pseudo-labeling on unlabeled audio, and self-training. The first phase targets low-resource ASR → English text, followed by high-quality English TTS to ensure natural, intelligible output while keeping MOS high. The system will be optimized for near-real-time inference on a single high-end GPU.

Deliverables include fully automated data-cleaning scripts, end-to-end training code and configs, fine-tuned checkpoints, and an inference script. A concise report will document hyperparameters, augmentation choices, and evaluation results (WER and MOS). Success is defined by stable end-to-end training, objective metrics exceeding a baseline Whisper fine-tune, and synthesized speech achieving ≥4.0 MOS from native evaluators.
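The "learning-rate scheduling with warm-up" mentioned above is typically the inverse-square-root ("Noam") schedule from the original Transformer paper: linear warm-up followed by 1/√step decay. A minimal sketch, with `d_model` and `warmup_steps` as illustrative values rather than settings from this project:

```python
def lr_at_step(step, d_model=512, warmup_steps=400, peak_scale=1.0):
    """Noam schedule: linear warm-up for `warmup_steps` steps,
    then inverse-square-root decay; the two pieces meet at the peak."""
    step = max(step, 1)  # avoid step**-0.5 blowing up at step 0
    return peak_scale * d_model ** -0.5 * min(
        step ** -0.5,                  # decay branch (after warm-up)
        step * warmup_steps ** -1.5,   # linear warm-up branch
    )
```

In low-data fine-tuning the warm-up phase matters disproportionately: a cold start at full learning rate can destroy the pretrained encoder's representations before the new language has contributed any usable gradient signal.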
$50,000 USD in 55 days
5.3

You’ve already built something powerful — the challenge now is that in a low-data setting, Transformers run out of meaningful signal long before they run out of capacity. That’s why training starts to feel unstable, progress becomes unpredictable, and improvements don’t always translate into better real-world results. I understand that frustration, and I’m here to help you move forward calmly and deliberately. My job is to get you to a model you can trust, retrain, and extend — without wasted runs or evaluation drama.

Given your constraints (a few hours of speech, limited phonetic coverage, and a single high-end GPU), this is a strategy problem more than a compute problem. The work is about keeping training stable past the early overfit cliff, using augmentation conservatively, and knowing early whether a run is worth continuing.

How I’ll do this:
• Validate the dataset, check alignment, and run baselines to set realistic expectations.
• Run disciplined training cycles with controlled augmentation, learning-rate scheduling, and checkpoint evaluation.
• Refine results, align MOS feedback, and deliver clean configs, checkpoints, and an inference script.

I’m comfortable working with Coqui TTS, Mozilla TTS, and Tacotron-based pipelines, and I prioritize reproducibility and documentation so nothing depends on “magic” settings. If this aligns with what you’re aiming for, I’d be happy to walk through the proposed phases and success criteria before we start.
$3,200 USD in 15 days
4.8

Dear Project Lead, What if you could achieve natural-sounding speech synthesis in your low-resource language without massive datasets? I'd like to build a working demo of the optimized Whisper-style training pipeline—complete with curriculum scheduling and data augmentation—before you commit to the full project, so you can see the approach in action. I specialize in squeezing maximum performance from scarce speech data using proven techniques: mixed-precision training, synthetic data generation, self-supervised learning, and intelligent hyperparameter tuning. Using your preferred open-source stacks (Coqui TTS, Mozilla TTS, Tacotron), I'll deliver a reproducible, GPU-efficient pipeline that achieves 4.0+ MOS scores and exceeds baseline Whisper performance on your exact dataset. Your language deserves to be heard—and I'm confident this approach will get you there. Let's discuss your specific corpus characteristics and audio quality goals so I can refine the strategy and show you a working demo. Regards, Smith
$4,000 USD in 7 days
4.4

As a versatile Full Stack Developer with expertise extending into Machine Learning, I’m passionate about pushing the boundaries of AI with innovative, bespoke solutions. Your project caught my eye because it calls for someone who enjoys navigating challenging low-resource language tasks, and that's exactly me! Being adept with Python and AI frameworks such as TensorFlow orients me well toward the open-source Coqui TTS, Mozilla TTS, and Tacotron stacks you prefer. In the context of scarce data like yours, my ML skills cut through: I'm practiced at deploying advanced data augmentation, few-shot, and self-supervised learning techniques in conjunction with curriculum scheduling, learning-rate tricks, and mixed-precision strategies to optimize results even when data is limited. I also leverage my fluency in Python (including Django and FastAPI), Java (especially Spring Boot), and an understanding of languages like C++, Lua, and C# to build models that yield natural-sounding real-time translations.
$3,000 USD in 11 days
3.7

As an experienced full stack Engineer with a strong background in Web and Mobile Application Development, I am excited about the opportunity to work on the Low Resource TTS Transformer Training project. With over 8 years of experience in developing innovative solutions, I am confident in my ability to tackle the challenges posed by adapting a large ASR/TTS Transformer to a low-data setting. My expertise in curriculum scheduling, learning-rate optimization, and other smart training techniques will be instrumental in maximizing the potential of the limited corpus provided. I am well-versed in working with open-source stacks such as Coqui TTS, Mozilla TTS, and Tacotron, and I am committed to delivering a high-quality, reproducible model that meets your requirements. I am passionate about pushing the boundaries of technology and am eager to collaborate with you on this project to make this language heard. Let's work together to create intelligible, natural-sounding speech in the target language.
$3,000 USD in 7 days
0.0

Hi, I’m ready to adapt a large ASR/TTS Transformer to translate your low-data language into English in real time, maximizing performance despite having only a few hours of speech and transcripts. My approach:
--> Clean and preprocess your audio and transcripts, including normalization, noise handling, and alignment, producing a ready-to-train dataset.
--> Implement curriculum scheduling, adaptive learning rates, mixed-precision training, gradient accumulation, and stability-focused tricks tailored for low-data Transformers.
--> Apply noise injection, pitch/speed variation, and potentially synthetic speech generation to expand the effective dataset. Few-shot and self-supervised learning methods will be explored to boost performance.
--> Build and document the pipeline within Coqui TTS, Mozilla TTS, or Tacotron environments, ensuring training runs end-to-end on a single high-end GPU.
--> Provide fine-tuned model checkpoints, inference scripts, and a short report summarizing hyperparameters, augmentation methods, and evaluation results (WER/MOS).
Quick questions:
--> What is the total duration and number of speakers in your dataset?
--> Any specific domain or vocabulary focus for translation that we should prioritize?
--> Preferred target latency or real-time constraints for inference?
I will focus on your requirements. I have worked on similar projects and can share examples upon request. Let's discuss timeline and budget based on these details via chat. Regards, Atta
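Several proposals above mention curriculum scheduling without showing what it looks like in practice. The most common form for speech is simply ordering utterances short-to-long, so early steps see easy examples and batches waste little padding. A minimal sketch, where the dict keys (`path`, `duration_s`) are hypothetical manifest fields, not any particular toolkit's format:

```python
def curriculum_batches(utterances, batch_size):
    """Order utterances short-to-long and group them into batches,
    so training progresses from easy (short) to hard (long) examples
    and same-batch lengths stay similar, minimizing padding."""
    ordered = sorted(utterances, key=lambda u: u["duration_s"])
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]

# Toy manifest: four clips with their durations in seconds.
corpus = [
    {"path": "a.wav", "duration_s": 7.2},
    {"path": "b.wav", "duration_s": 1.4},
    {"path": "c.wav", "duration_s": 3.1},
    {"path": "d.wav", "duration_s": 12.0},
]
batches = curriculum_batches(corpus, batch_size=2)
```

A common refinement is to shuffle *within* duration buckets each epoch so the curriculum ordering holds globally while batch composition still varies between epochs.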
$4,000 USD in 25 days
0.0

Boston, United States
Member since Dec 15, 2025