Site Reliability Engineer - CTJ - Poly Job at Microsoft Corporation, Redmond, WA

  • Microsoft Corporation
  • Redmond, WA

Job Description

Overview

Leverages end-to-end technical expertise in large scale distributed systems' infrastructure, code, inter- and intra-service dependencies, and operations to proactively and continuously improve the reliability, performance, efficiency, latency, and scalability of services and/or products operating at scale. Partners with software engineering product teams by suggesting scalable ways to optimize code, sharing expertise and insights drawn from working across related services or products, and participating in incident response throughout development and operations lifecycles. Develops code, scripts, systems, and/or tools that reduce operational burden by automating complex and repetitive tasks, enable product engineering teams to increase the velocity at which they can safely deploy changes to production, and monitor the effects of changes across systems, services, and/or products. Analyzes telemetry data to develop capacity planning models, identify patterns and trends that drive continuous improvement, and highlight opportunities to deploy automation to monitor and manage services and/or products. Participates in on-call rotations to resolve live site incidents, minimize customer impact, and document solutions and insights that inform ongoing improvements to infrastructure, code, tools, and/or processes that prevent the recurrence of similar issues.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications

Required / Minimum Qualifications:

  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.

Other Requirements:

Security Clearance Requirements: Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings:

  • The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph. Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination.
  • Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
  • Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports U.S. federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport, other approved documents, or a verified U.S. government clearance.

Preferred Qualifications:

  • Experience working on large-scale distributed services with on-call responsibilities.  
  • Ability to build and influence broadly towards common goals and priorities.  
  • Experience with distributed database systems such as SQL and PostgreSQL.

Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $100,600 - $199,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $131,400 - $215,400 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications for the role until October 31, 2025.

Responsibilities

  • Independently creates, tests, and deploys changes through a safe deployment process (SDP) to enhance code quality and improve the observability, security, reliability and operability of one or more platforms, systems, or products operating at scale.
  • Leverages technical expertise in cloud technologies and specific products, as well as objective insights drawn from analyses of production telemetry data, to suggest changes or add-ons to product features, or automation to improve product components or features supported by their team.
  • Engages with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations, and incident responses throughout product development and operations cycles. Utilizes technical knowledge of systems/platforms and insights drawn from product engineering teams, security best practices, artificial intelligence (AI)/machine learning (ML), and telemetry analyses to suggest potential improvements in code base and designs across components and features of one or more products.
  • Independently writes code or scripts that automate the performance of scalable operations processes (e.g., monitoring, alerting, deploying products and updates) across components and features of products operating at scale.
  • Develops alerts and instrumentation across components and features to monitor product capacity, related security risk, and resource demands and analyze telemetry data using existing capacity planning models. Draws insights from analyses of capacity and resource data to optimize component and feature code to manage resources and capacity across a limited range of use conditions and system parameters.
  • Independently uses existing tools and/or models to troubleshoot problems or flaws affecting the availability, security, reliability, performance, and/or efficiency of components and features, leveraging artificial intelligence (AI) and machine learning (ML) capabilities. Proposes solutions that will resolve and prevent recurring issues and brings them to the attention of their Site Reliability Engineering (SRE) and/or product engineering teams.
  • Utilizes insights from performance and resource monitoring tools to identify whether there is a need to optimize the efficiency of component and feature code, or if changes to compute resources are required. Models the predicted effect of changes to code and/or compute resources across components or features to document the efficacy of proposed solutions. Proposes changes and drives implementation of solutions to identified performance and resource challenges.
  • Embody our culture and values.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Industry leading healthcare

Educational resources

Discounts on products and services

Savings and investments

Maternity and paternity leave

Generous time away

Giving programs

Opportunities to network and connect

Job Tags

Full time, Local area
