
Closed
Published
I’m leading a team that already fine-tunes Hugging Face models, but we’re stalled on the last mile: turning those checkpoints into WebLLM artefacts that run smoothly inside the browser through WebGPU/WebAssembly. I need a short-term partner who has actually walked this path before and can sit virtually with us, show exactly how to compile a model into the WebLLM format, debug any hiccups, and prove the result works in-browser with stable latency.

What I expect from you
• A step-by-step script or notebook that converts a standard HF model (think Llama-2, GPT-J, BLOOM or similar) into WebLLM format.
• A clear explanation of the conversion tools, flags, and weight-slicing decisions you use, so my engineers can repeat the process later without you.
• A minimal demo web page (TypeScript or vanilla JS is fine) that loads the converted model, allocates buffers correctly, and serves a prompt via the WebGPU back-end.
• Performance metrics (tokens/s, memory footprint) captured on at least one consumer-grade GPU so we can compare.

Acceptance criteria
1. The model loads in an evergreen Chromium-based browser with no console errors.
2. First-token latency ≤ 3 s and sustained generation comparable to your benchmark notes.
3. Full reproducibility on our hardware following your instructions.

We already have the fine-tuned weights and a dev environment in place; I simply need your expertise to unblock compilation and browser inference. If you have prior commits or public demos with WebLLM, please share a link when you respond so we can hit the ground running.
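For reference, the brief's two numeric criteria (first-token latency and sustained tokens/s) can both be derived from per-token timestamps recorded in the streaming callback of a WebLLM-style engine. A minimal TypeScript sketch of that calculation — the helper name and timestamp convention are illustrative assumptions, not from the posting:

```typescript
// Hypothetical helper (not from the posting): compute first-token latency
// and sustained tokens/s from per-token timestamps, e.g. performance.now()
// values captured as each streamed token arrives.
interface GenMetrics {
  firstTokenLatencyMs: number; // covers prefill + first decode step
  tokensPerSec: number;        // sustained decode rate, excludes prefill
}

function computeMetrics(promptSentMs: number, tokenTimesMs: number[]): GenMetrics {
  if (tokenTimesMs.length === 0) throw new Error("no tokens generated");
  const firstTokenLatencyMs = tokenTimesMs[0] - promptSentMs;
  if (tokenTimesMs.length === 1) return { firstTokenLatencyMs, tokensPerSec: 0 };
  // Rate over the decode phase only: (n - 1) tokens emitted in the span
  // between the first and the last token timestamp.
  const decodeSpanMs = tokenTimesMs[tokenTimesMs.length - 1] - tokenTimesMs[0];
  const tokensPerSec = ((tokenTimesMs.length - 1) / decodeSpanMs) * 1000;
  return { firstTokenLatencyMs, tokensPerSec };
}
```

Acceptance criterion 2 then reduces to a simple check on the result, e.g. `firstTokenLatencyMs <= 3000`.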
Project ID: 40068891
9 proposals
Remote project
Active 19 days ago
9 freelancers are bidding an average of ₹989 INR/hour for this job

Bringing over a decade of experience in web development, I'm equipped with the skills required to assist you in converting and deploying Hugging Face models into WebLLM format. I understand that you require not only efficient deployment but also a detailed explanation of the process for future replication. No need to worry: I have a proven track record of creating meticulous, easily comprehensible documentation that covers all the conversion tools, flags, and weight-slicing decisions used.

Having worked extensively with TypeScript and JavaScript, I am confident I can create the minimal demo web page you're seeking, ensuring correct buffer allocation and prompt serving through the WebGPU back-end. To guarantee consistent performance within your requirements, I will capture performance metrics such as tokens/second and memory footprint on consumer-grade GPUs for comparison.

At Dlite Info Tech, we strongly believe in transparency and real partnerships, so full reproducibility on your hardware following my instructions is a commitment I can assure you of. Let us collaborate to unblock compilation and browser inference for your Llama-2, GPT-J, or BLOOM model in WebLLM format, running smoothly in the browser on WebGPU & WASM, so we can hit the ground running right away!
₹600 INR in 40 days
4.3

I see this project as an ideal match for my expertise, and I'm excited to contribute at a discount to build my reputation. I love your idea of turning Hugging Face checkpoints into efficient WebLLM artefacts for smooth in-browser inference, and I recently completed a similar project for another client. Your focus on a step-by-step conversion script, clear explanations, and a minimal demo aligned with your acceptance criteria shows a well-defined scope I fully understand. While I am new to Freelancer, I have plenty of experience and have done other projects off-site involving WebGPU, model compilation, and performance optimization. When will our journey begin? Luke H.
₹1,000 INR in 40 days
1.9

I bring 13 years of professional experience delivering high-quality results. I have strong expertise in all the required skills listed for this project. My approach ensures accuracy, clear communication, and timely delivery. I am confident I can exceed your expectations with efficient, reliable work. Looking forward to contributing to your project—ready to begin immediately.
₹1,050 INR in 40 days
0.0

When it comes to your specific project needs, my years of experience as an AI and Full-Stack Engineer are a strong match. I have worked across the spectrum of modern development, including AI-powered chatbots, automation workflows, and LLM-based tools, which directly aligns with your requirements. My deep knowledge of relevant technologies such as React, Node.js, Python, Django, and Flask will be instrumental in ensuring a smooth and successful conversion.

I emphasize sustainable solutions to my clients' problems. Not only will I deliver a step-by-step script or notebook for converting an HF model into WebLLM format, but I will also provide a clear explanation of the conversion tools, flags, and weight-slicing decisions for future reference. Moreover, I commit to creating a minimal demo web page showing how the converted model loads in an evergreen Chromium-based browser with no errors, together with a memory-footprint analysis and performance metrics captured on a consumer-grade GPU to help you compare.

My track record proves that together we can build powerful digital products that align with your goals!
₹1,050 INR in 40 days
0.0

Hello,

We propose a focused, hands-on engagement to unblock your WebLLM compilation and browser-inference pipeline. Our team has prior experience converting Hugging Face checkpoints into WebLLM-compatible artefacts and deploying them successfully via WebGPU/WebAssembly in Chromium-based browsers. We will work alongside your engineers in real time to deliver a fully reproducible, last-mile workflow.

This includes a step-by-step script or executable notebook that transforms your existing fine-tuned HF model (Llama-2 / GPT-J / BLOOM class) into WebLLM format, with clear justification for compiler flags, quantization choices, and weight-slicing strategies. Every decision will be documented so your team can independently repeat and adapt the process.

To validate end-to-end success, we will provide a minimal browser demo (TypeScript or vanilla JS) that loads the compiled model, allocates GPU buffers correctly, and runs inference using the WebGPU backend. We will benchmark token throughput, memory usage, and first-token latency on a consumer-grade GPU and share the results transparently.

Let's connect to discuss further.

Regards,
Kuntal
₹850 INR in 40 days
0.0
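Several bids above promise a demo that loads in Chromium "with no console errors"; in practice that usually starts with a WebGPU capability probe before any model weights are downloaded. A minimal sketch — the function takes a navigator-like object as a parameter (an assumption made here so the detection logic can run outside a browser); the `gpu` property mirrors the standard `navigator.gpu` WebGPU entry point:

```typescript
// Minimal WebGPU availability probe. In a real page you would call
// describeWebGPUSupport(navigator) before initialising a WebLLM engine;
// the parameter is injected so the logic is testable outside a browser.
type NavigatorLike = { gpu?: unknown };

function describeWebGPUSupport(nav: NavigatorLike): string {
  if (!nav.gpu) {
    return "WebGPU not available: use an evergreen Chromium-based browser";
  }
  return "WebGPU available: safe to initialise the WebLLM engine";
}
```

Gating engine initialisation on this check keeps the page from throwing deep inside the runtime on unsupported browsers, which is what tends to produce the console errors the acceptance criteria rule out.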

New Delhi, India
Member since Dec 16, 2025