Job Description
Note: This is a remote position open to candidates in the USA. Scale AI is building the infrastructure that makes enterprise AI seamless. They are seeking a Senior or Staff Infrastructure Engineer to act as a primary technical lead, engineering the 'paved road' for knowledge retrieval and inference engines while ensuring the platform remains reliable for enterprise agents.
- Responsibilities
- Architect multi-cloud systems and abstractions that allow the SGP platform to run on top of existing cloud providers
- Use our own data and AI platform to analyze build and test logs and metrics to identify areas for improvement
- Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers
- Enhance engineering and infrastructure efficiency, reliability, accuracy, and response times, including CI/CD processes, test frameworks, data quality assurance, end-to-end reconciliation, and anomaly detection
- Collaborate with platform and product teams to develop and implement innovative infrastructure that scales to meet evolving needs
- Design and champion highly scalable, reliable, and low-latency infrastructure and frameworks for building, orchestrating, and evaluating multi-agent systems at enterprise scale
- Lead the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies
- Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response
- Drive developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization to improve workflows and operational efficiency
- Skills
- Proven experience in a senior role, with 5+ years of full-time software engineering experience
- Deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm Charts), container orchestration (e.g., Kubernetes), and observability platforms (e.g., Datadog, Prometheus, Grafana)
- Extensive experience with at least one major cloud provider (AWS, Azure, or GCP)
- Strong knowledge of security and compliance in enterprise environments, with a focus on access management, data isolation, and customer-specific VPC setups
- Proficiency in Python or JavaScript/TypeScript, and SQL
- Hands-on experience and a passion for working with Agents, LLMs, vector databases, and other emerging AI technologies
- Benefits
- Comprehensive health, dental and vision coverage
- Retirement benefits
- A learning and development stipend
- Generous PTO
- A commuter stipend
- Company Overview
- Scale's mission is to develop reliable AI systems for the world's most important decisions. It was founded in 2016 and is headquartered in San Francisco, California, USA, with a workforce of 501-1000 employees. Its website is https://scale.com.
- Company H1B Sponsorship
- Scale AI has a track record of offering H1B sponsorships, with 82 in 2025, 54 in 2024, 29 in 2023, 17 in 2022, 10 in 2021, and 10 in 2020. Please note that this does not guarantee sponsorship for this specific role.