Site Reliability Engineer - CTJ - Poly

🌍 Remote, USA 🎯 Full-time 🕐 Posted Recently

Job Description

Microsoft is a leading technology company dedicated to empowering every person and organization on the planet. The Site Reliability Engineer will leverage technical expertise to improve the reliability and performance of large-scale distributed systems, collaborating with product teams and automating operational tasks to enhance service efficiency.


Responsibilities

  • Independently creates, tests, and deploys changes through a safe deployment process (SDP) to enhance code quality and improve the observability, security, reliability and operability of one or more platforms, systems, or products operating at scale
  • Leverages technical expertise in cloud technologies and specific products, as well as objective insights drawn from analyses of production telemetry data, to suggest changes or add-ons to product features or automation that improve the product components or features supported by their team
  • Engages with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations and incident responses throughout product development and operations cycles
  • Utilizes technical knowledge of systems/platforms and insights drawn from product engineering teams, security best practices, artificial intelligence (AI)/machine learning (ML), and telemetry analyses to suggest potential improvements in code base and designs across components and features of one or more products
  • Independently writes code or scripts that automate the performance of scalable operations processes (e.g., monitoring, alerting, deploying products and updates) across components and features of products operating at scale
  • Develops alerts and instrumentation across components and features to monitor product capacity, related security risk, and resource demands, and analyzes telemetry data using existing capacity planning models
  • Draws insights from analyses of capacity and resource data to optimize component and feature code to manage resources and capacity across a limited range of use conditions and system parameters
  • Independently uses existing tools and/or models to troubleshoot problems or flaws affecting the availability, security, reliability, performance, and/or efficiency of components and features, leveraging artificial intelligence (AI) and machine learning (ML) capabilities
  • Proposes solutions that will resolve and prevent recurring issues and brings them to the attention of their Site Reliability Engineering (SRE) and/or product engineering teams
  • Utilizes insights from performance and resource monitoring tools to identify whether there is a need to optimize the efficiency of component and feature code, or if changes to compute resources are required
  • Models the predicted effect of changes to code and/or compute resources across components or features to document the efficacy of proposed solutions
  • Proposes changes and drives implementation of solutions to identified performance and resource challenges
  • Embodies our culture and values

Skills

  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role
  • The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph
  • Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination
  • This position requires successful verification of the stated security clearance to meet federal government customer requirements
  • You will be asked to provide clearance verification information prior to an offer of employment
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • This position requires verification of U.S. citizenship due to citizenship-based legal restrictions
  • Citizenship will be verified via a valid passport, other approved documents, or a verified U.S. government clearance
  • Experience working on large-scale distributed services with on-call responsibilities
  • Ability to build and influence broadly towards common goals and priorities
  • Experience with distributed database systems such as SQL and PostgreSQL

Company Overview

  • Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services. It was founded in 1975 and is headquartered in Redmond, Washington, USA, with a workforce of 10,001+ employees. Its website is https://www.microsoft.com.
