
Need to optimize a Databricks notebook that runs for 70+ minutes
Project ID: 40442168
Open for bidding
Paid on delivery
Remote project
5 proposals
Active 6 days ago
5 freelancers are bidding an average of ₹9,564 INR for this job

Hi there, I understand that a Databricks notebook on Azure is taking 70+ minutes to complete. I will profile the Spark stages, data shuffles, and cluster sizing to cut the runtime. My experience optimizing PySpark jobs, partitioning, caching, and Delta Lake on Azure Databricks makes me a good fit for this task.
- Run Spark UI / Ganglia profiling, identify slow stages, fix expensive shuffles, and optimize joins (broadcast, repartition) in the notebook
- Implement code changes: vectorized PySpark/SQL, caching, partition pruning, Delta optimization, and tuned cluster configuration (autoscaling, executor size)
- Optional: add CI checks (Git + pipeline) that run lightweight performance tests before merge
- Risk and quality control: backup checkpoint, staged deployment, and post-fix validation
Skills: ✅ Databricks ✅ Azure Databricks / Azure ✅ PySpark / Spark optimization ✅ Git / CI pipeline integration ✅ Performance tuning / monitoring ✅ Delta Lake / partitioning
Certificates: ✅ Microsoft® Certified: MCSA | MCSE | MCT ✅ cPanel® & WHM Certified CWSA-2
Is this running on a live production Azure Databricks cluster?
Best regards,
₹2,872 INR in 1 day
5.2

Welcome to professional Python development services! Hi there, I'm Alema, an expert Python programmer who strives for clear code in atmospheric science, numerical weather prediction, physics, and other related fields. I'm ready to provide you with high-quality services. I have completed 350+ projects with a 100% positive rating. If you are looking for quality work, look no further. We are also a team of professionals, available 24/7 to help employers, and on-time delivery is guaranteed.
Yours faithfully,
Eng. Alema Akter
₹3,500 INR in 1 day
3.4

Hi! I'm a DevSecOps engineer with over 4 years of experience working with clients such as EY, Globant, and MercadoLibre. Before proposing a solution, could you share more details about the notebook? Specifically: what data volume does it process? Is it mostly SQL queries, Python, or a mix? Are you using a shared or a dedicated cluster? This will help me identify whether the bottleneck is in the code, the cluster configuration, or the data pipeline itself.
Best regards,
Ing. Mauricio Batista
₹35,000 INR in 7 days
2.1

Hello, Resonite Technologies has a proven data engineering team experienced in Databricks, PySpark, Spark SQL, Delta Lake, and performance tuning. We can optimize your notebook, which currently runs for 70+ minutes, by identifying bottlenecks and improving execution time without affecting output accuracy.
We will review:
• Spark execution plan and job stages
• Joins, shuffles, partitions, and cache usage
• Delta table reads/writes and file sizes
• SQL/PySpark logic optimization
• Cluster configuration recommendations
We have optimized slow ETL/data processing pipelines and significantly reduced runtime through query tuning, partitioning, broadcast joins, caching strategy, and Delta best practices. Please share the notebook, a sample data structure, current runtime logs, and the expected output. We can start with a quick assessment and provide exact optimization steps.
Warm regards,
Karthik B
Resonite Technologies
₹3,450 INR in 7 days
0.0

Hyderabad, India
Payment method verified
Member since Sep 10, 2022