Job Description
Note: This is a remote position open to candidates in the USA. DigitalOcean is a cutting-edge technology company focused on simplifying cloud and AI for builders. They are seeking an entry-level Systems Engineer to optimize and troubleshoot data center hardware and to deploy firmware across their products and components.
Responsibilities
- Work with vendors and internal peer teams on qualifying, onboarding, and delivering new firmware to the DigitalOcean ecosystem
- Act as Tier 3 escalation on-call for triage, investigation, and resolution of system firmware issues in the DigitalOcean fleet (both customer-facing and internal)
- Participate in 24/7 on-call rotation with other members of the team
- Improve existing firmware and hardware configuration automation/validation, for both hardware platforms and components (such as NICs, storage, and BMCs)
- Engage with hardware vendors about new automation features and existing bugs
- Help with development of tooling and associated runbooks to address gaps in operational capabilities around hardware and firmware operations
- Coordinate with Ops teams on monitoring thresholds, failure modes, and alerting
- Assist in troubleshooting causes of failures and work to prevent them in the future
- Raise the quality bar in the delivery of our cloud infrastructure by identifying industry best practices and working to adopt them
Skills
- Technical degree (BS in Computer Science/Engineering) or equivalent practical experience
- Strong understanding of x86 server hardware architecture and subsystems
- Ideally, you've worked with non-x86 hardware too!
- Demonstrated professional proficiency in configuration management best practices (we use Ansible and Chef)
- Experience automating server firmware components at large scale using industry-standard tooling (Redfish, IPMI, etc.), including a deep understanding of benchmarking, automated test frameworks, and process automation in general
- Practical knowledge of PXE boot, UEFI, Linux/OS boot, AMI/OEM BIOS distributions, OpenBMC/AMI/OEM BMC implementations, RAID and other storage resiliency technologies, and the full network stack, from NIC firmware to TCP/IP
- Adept at Linux (or Unix) operating systems. You'll be spending a lot of time working in one!
- Comfortable with version control systems (we use Git) and proficient in at least one programming language (such as Python or Go)
- Ability to participate in 24/7 on-call rotation with other members of the team
- Excellent communication skills, both within the team and with the broader company
- An insatiable passion for hardware, both new and old
Benefits
- Reimbursement for relevant conferences, training, and education
- Access to LinkedIn Learning's 10,000+ courses
- Employee Assistance Program
- Local Employee Meetups
- Flexible time off policy
- Bonus in addition to base salary
- Equity compensation to eligible employees
- Equity grants upon hire
- Option to participate in our Employee Stock Purchase Program