Job Description
AWS HPC Architect / SME (GovCloud, Secret/SCI)
100% REMOTE
Contract position
Client note:
- Looking for someone with an active Secret/SCI clearance
- Skill set: DevOps/HPC tooling experience, including infrastructure and landing zones, for a Secret-level deployment on AWS GovCloud
Overview:
The AWS HPC Lead & SME is responsible for designing, implementing, and optimizing high-performance computing solutions on the AWS Cloud platform. This role combines deep technical expertise in distributed computing, data-intensive workflows, and AWS HPC services with the ability to lead architecture design sessions, define best practices, and ensure scalability, performance, and cost efficiency across enterprise or research workloads.
Key Responsibilities:
- Lead the Design & Build: Develop scalable, high-performance architectures leveraging AWS HPC services such as AWS ParallelCluster, FSx for Lustre, EFA (Elastic Fabric Adapter), AWS Batch, and EC2 HPC instances.
- Solution Implementation: Deploy, automate, and optimize HPC clusters and data pipelines for compute- and memory-intensive workloads, including modeling, simulation, genomics, CFD, AI/ML training, and financial risk analysis.
- Performance Optimization: Benchmark, tune, and monitor system performance for compute, storage, and networking components to achieve optimal throughput and cost efficiency.
- Infrastructure as Code (IaC): Implement reproducible environments using Terraform, AWS CDK, or CloudFormation to streamline provisioning, CI/CD, and configuration management.
- Data and Storage Management: Design high-throughput parallel storage solutions using S3, FSx for Lustre, EBS, and EFS; integrate with hybrid and on-prem HPC environments.
- Security and Compliance: Apply AWS Well-Architected Framework and HPC security best practices to ensure compliance with enterprise, academic, or government standards.
- Collaboration and Leadership: Partner with application scientists, DevOps teams, and business stakeholders to translate workload requirements into optimized HPC architectures. Provide mentoring and technical leadership across multidisciplinary teams.
- Documentation and Knowledge Sharing: Develop architecture diagrams, reference implementations, and technical playbooks to support ongoing HPC adoption and operations.
Required Skills & Experience:
- 8-10+ years of experience in high-performance computing, distributed systems, or cloud architecture.
- Proven expertise in AWS HPC services (EC2 HPC, ParallelCluster, Batch, FSx for Lustre, EFA).
- Strong knowledge of Linux systems administration, networking (InfiniBand, EFA, MPI), and job schedulers (Slurm, Torque, PBS Pro).
- Hands-on experience with automation and IaC (Terraform, Ansible, CloudFormation).
- Scripting and development proficiency (Python, Bash, or similar).
- Experience with monitoring tools (CloudWatch, Grafana, Prometheus) and cost-optimization strategies.
- AWS Certified Solutions Architect - Professional or AWS Certified Advanced Networking - Specialty preferred.
- Bachelor's or Master's degree in Computer Science, Engineering, or related technical field.
Preferred Attributes:
- Experience with GPU workloads, containerized HPC (ECS/EKS with ParallelCluster), or hybrid/on-prem to cloud HPC migrations.
- Strong communication and presentation skills for executive and technical audiences.
- Demonstrated thought leadership in HPC strategy, performance benchmarking, and AWS innovation.