Job Description:
- Red-team AI models and agents by testing jailbreak attempts, prompt injections, misuse scenarios, and exploit strategies
- Generate high-quality human evaluation data by annotating model failures, classifying vulnerabilities, and identifying systemic risks
- Apply structured testing methodologies using taxonomies, benchmarks, and playbooks to ensure consistent evaluation
- Document findings clearly and reproducibly, producing reports, datasets, and adversarial test cases that teams can act upon
- Work across multiple projects, supporting different AI systems and evaluation objectives
- Requirements:
- You have **prior red-teaming experience**, such as adversarial AI testing, cybersecurity, or socio-technical risk analysis
- You naturally think **adversarially**, exploring ways to push systems to their limits and uncover weaknesses
- You prefer **structured methodologies**, using frameworks and benchmarks rather than ad-hoc testing
- You communicate risks and vulnerabilities **clearly to both technical and non-technical audiences**
- You are comfortable **working across multiple projects and adapting to new evaluation challenges**

Nice-to-Have Specialties:
- **Adversarial Machine Learning:** jailbreak datasets, prompt injection attacks, RLHF/DPO vulnerabilities, or model extraction techniques
- **Cybersecurity:** penetration testing, exploit development, reverse engineering
- **Socio-technical risk analysis:** harassment or misinformation testing, abuse pattern analysis
- **Creative adversarial thinking:** backgrounds in psychology, acting, writing, or other disciplines that support unconventional attack strategies
Apply to this job