Sr. Hardware Reliability Engineer, Infrastructure Reliability & Quality
Job Description
Job ID: | Amazon Data Services, Inc.
As an Infrastructure Reliability Engineer, you will proactively drive reliability risk identification, assessment, and mitigation for datacenter infrastructure equipment (e.g., Air Handling Units, LV Generators, MV Transformers, LV SWGR, Breakers, UPS, and Chillers). You will also be responsible for root cause analysis of critical equipment failures and for driving continuous improvements to enhance datacenter availability for AWS customers. You will work closely with internal and external partners, including suppliers, to drive product specifications, risk identification plans, and their execution. To succeed in an open, collaborative environment, you must be ownership‑minded, independent, action‑oriented, and results‑focused.
The candidate should have experience using a Physics‑of‑Failure based approach to develop and implement both analytical and empirical methods for product quality/reliability risk identification and assessment during the product design, manufacture, and deployment stages. The candidate should be able to carry out lifecycle environmental and operational stress‑driven risk analysis, covering thermal, electrical, chemical, and mechanical stresses, to identify overstress and fatigue‑related product weaknesses. The candidate should also be able to evaluate quality and reliability issues in electronics manufacturing processes. Knowledge of statistical techniques and models for analyzing test and field data is required.
At the component level, you will drive critical component identification and the associated vendor selection and qualification requirements. At the system level, you will develop datacenter system‑level reliability models and perform the related reliability quantification and risk analysis for datacenter configuration optimization. During the sustaining stage, you will monitor product performance in the field, conduct root cause analysis of critical failures, and drive corrective and preventive actions. You will also drive effective vendor auditing and quarterly review processes to continuously improve datacenter availability.
Key Job Responsibilities
- Drive DFR (Design for Reliability) methodology to proactively design‑in reliability in new product designs.
- Drive reliability/quality qualification of third‑party critical infrastructure equipment for use in AWS data centers.
- Oversee factory and site testing of third‑party equipment in all LLE categories (liquid cooling, generator, chiller, air handler, etc.).
- Guide and support root cause analysis of field failures performed by internal teams, OEMs, and external laboratories, validating conclusions and ensuring the highest testing and remediation standards.
- Make recommendations about AWS infrastructure maintenance and equipment replacement based on reliability data.
- Provide feedback to sourcing/procurement teams for evaluation of vendor performance.
- Analyze internal reliability data and create metrics to drive the highest reliability at the lowest cost.
- Support DFMEAs on an as‑needed basis.
- Develop end‑of‑life strategy for critical infrastructure equipment.
Basic Qualifications
- Experience in industrial or commercial engineering in mission‑critical facilities, including but not limited to data centers, power generation, or oil and gas facilities.
- Bachelor's or Master's degree in Reliability Engineering, Physics, Electrical, Mechanical, or Materials Engineering, or a related field.
- 8+ years of Reliability Engineering work experience in a high‑reliability industry.
- 3+ years of experience with accelerated life testing, stress analysis, and finite element analysis.
Preferred Qualifications
- 10+ years of work experience in reliability risk identification and assessment, from component to system level, applying analytical, experimental, and statistical approaches to evaluate product design and manufacture quality/reliability levels.
- Experience applying proactive, cost‑effective reliability approaches throughout the product design, manufacture, and deployment stages.
- Proven experience working with external design and manufacturing supply‑chain partners.
- Excellent verbal and written communication skills.
Amazon is an equal‑opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
The base salary range for this position is $136,600.00 – $184,800.00 USD annually. Your Amazon package will include sign‑on payments and restricted stock units (RSUs). Final compensation is determined based on experience, qualifications, and location. Amazon also offers comprehensive benefits, including health insurance, 401(k) matching, paid time off, and parental leave.