Job Description
About the position
We’re looking for a hands-on technical leader to architect, fine-tune, and deploy on-device small language models (SLMs) for consumer security at scale. You’ll lead a focused team of 3–5 senior engineers while remaining deeply involved in the code and technical architecture.
Your core responsibility is building high-performance, privacy-preserving AI models that run directly on user devices (Mac, iOS, Android, Linux). You’ll own model optimization, fine-tuning for tool-use accuracy, evaluation frameworks, and cost-aware deployment strategies. While you won’t own the agent orchestration platform itself, you’ll work closely with it to ensure models behave correctly in multi-turn conversations and make reliable tool-calling decisions.
This role sits at the intersection of edge ML, applied LLMs, and production engineering. Success requires navigating real-world tradeoffs: latency vs. capability, privacy vs. accuracy, on-device vs. cloud execution, and cost vs. performance.
This is not a traditional director role. You’ll spend 60%+ of your time on technical architecture and implementation, with the remainder focused on mentoring senior engineers and setting technical direction.
This is a hybrid remote position based in one of our hub locations: Frisco, TX or San Jose, CA. You will be required to be onsite as needed, typically 1–4 days per month. We are only considering candidates within commuting distance of one of these locations and are not offering relocation assistance at this time.
Responsibilities
- Design and deploy small language models optimized for on-device inference (Mac, iOS, Android, Linux)
- Lead model optimization efforts including quantization, pruning, distillation, and efficient inference pipelines
- Fine-tune models to improve tool selection accuracy and conversational behavior in security-focused workflows
- Build evaluation frameworks to measure model efficacy, tool-calling accuracy, conversation quality, and safety in production
- Create synthetic data and workflow simulations to train and validate security-relevant conversations
- Work closely with the agent orchestration platform to optimize multi-turn dialogue behavior and state handling
- Implement cost-optimization strategies such as intelligent on-device vs. cloud routing, prompt caching, batching, and token efficiency
- Integrate cloud-based LLMs when deeper reasoning or broader context is required
- Build production ML systems that detect threats and protect users directly on-device
- Set technical standards and architectural direction for AI/ML across the security platform
- Mentor principal engineers and architects while remaining hands-on
Requirements
- 10+ years of software engineering experience, with 5+ years focused on ML/AI
- Proven experience shipping ML models to production, with skills that transfer to deploying them on edge or mobile platforms
- Experience with conversational AI systems and tool/function-calling architectures
- Strong Python and systems programming skills (C++ or Rust) for performance-critical code
- Deep expertise in model optimization (INT4/INT8 quantization, pruning, distillation)
- Hands-on experience with PyTorch and at least one edge deployment framework (TensorFlow Lite, Core ML, ONNX Runtime, or llama.cpp)
- Experience building evaluation and benchmarking frameworks for ML systems
Nice-to-haves
- Experience applying ML systems in security, safety, or other adversarial domains
- Master’s degree in CS, ML, or a related field (or equivalent practical experience)
Benefits
- Bonus Program
- Pension and Retirement Plans
- Medical, Dental and Vision Coverage
- Paid Time Off
- Paid Parental Leave
- Support for Community Involvement