Director of ML Engineering & Infrastructure

🌍 Remote, USA 🎯 Full-time 🕐 Posted Recently

Job Description

10+ years of industry experience spanning machine learning engineering and distributed systems
3+ years of leadership and management experience, with a proven ability to build and lead strong technical teams
MSc or Ph.D. in Computer Science, Machine Learning, or a related field, or equivalent practical experience
Proven expertise in building and deploying end-to-end ML systems at scale, including recommendation and personalization systems
Strong background in distributed systems architecture, including low-latency services, streaming platforms, and large-scale serving
Hands-on experience with deep learning frameworks (e.g., TensorFlow, PyTorch) and ML infrastructure technologies
Track record of delivering high-quality, scalable, and fault-tolerant systems
Excellent communication skills and ability to influence product and technical strategy
Proven experience deploying large-scale serving systems on AWS and demonstrated expertise in leveraging Databricks for large-scale data processing and ML workflows

We are seeking a Director of Machine Learning Engineering and Infrastructure to lead a hybrid team bridging advanced ML engineering with world-class infrastructure design. In this role, you will own the strategic direction and execution for scaling our machine learning capabilities while ensuring our distributed systems and infrastructure can support innovation at massive scale. You will combine technical depth with leadership excellence to guide teams that deliver both foundational ML systems and high-performance distributed services.

Lead and manage high-performing teams across ML engineering and ML infrastructure, fostering a culture of innovation, collaboration, and growth
Define and execute the strategic roadmap for ML systems, including recommendation, personalization, and ads optimization
Oversee the design, development, and deployment of scalable ML pipelines: data ingestion, feature engineering, model training, evaluation, and serving
Architect distributed systems to support ML workloads at scale, ensuring reliability, observability, and operational excellence
Partner closely with Product, Engineering, and Content teams to align on business goals and deliver impactful ML-driven experiences
Support best practices in experimentation, evaluation, and ML system monitoring
Ensure cost efficiency, scalability, and performance in ML infrastructure investments
