Job Description
- Lead and grow a team of ML engineers focused on production ML systems
- Lead model improvements in response to production issues, product feedback, and new research or platform advancements
- Lead production release processes for ML services, including release planning, CI/CD, staged rollouts, and rollback procedures
- Build and operate observability and on-call practices for ML features, including monitoring, alerting, dashboards, incident response, and post-incident reviews
- Develop and maintain scalable evaluation frameworks, datasets, and automated regression tests to prevent quality regressions
- Lead reliability, performance, and cost improvements for inference and serving, including capacity planning and meeting SLAs (latency, throughput, availability)
- Partner with researchers, product, and platform teams to define quality bars and production readiness, including Trusted AI requirements
- Establish and evolve production standards and governance across ML features (testing, evaluation methodology, release gates, model versioning and lineage)
- Partner with platform and product teams to integrate ML capabilities into products
- Requirements:
- BS/MS in CS/Engineering or equivalent experience
- Experience building and operating software systems, including production ML systems
- People leadership experience, or strong technical leadership experience (mentoring, setting direction, driving delivery)
- Experience with cloud infrastructure (AWS, Azure, or GCP) and production observability
- Experience with CI/CD, reproducible deployments, and operating services in production
- Strong written communication and documentation skills
- Benefits:
- Health and financial benefits
- Time away and everyday wellness