Job Description:

Note: Fidelity will not provide immigration sponsorship for this position.

This position is for a Sr. Site Reliability Engineer within the R4 Responsive OpsWorX Team covering multiple products in the Brokerage Recordkeeping Technology organization.

In this role, you will be responsible for responding to production incidents. You will work closely with our business partners to answer application-specific questions, and with the product teams to promote availability, resilience, and stability.

The Expertise and Skills You Bring

  • Bachelor’s degree or higher in a technology-related field (such as Engineering, Computer Science, or Information Technology) required; a master’s degree is a plus.

  • Minimum 5 years of combined experience across Production Support, Application Development (Java), and Site Reliability Engineering (SRE) to ensure system stability, scalability, and performance.

  • Build, manage, and optimize resilient, scalable cloud platforms using AWS-native services, leveraging 3 years of hands-on experience with Amazon EKS and RDS.

  • Lead and execute cloud migration initiatives, ensuring minimal downtime, performance optimization, and adherence to architectural best practices.

  • Implement and maintain CI/CD pipelines to enable reliable, automated, and secure application deployments.

  • Ensure platforms meet high availability, scalability, fault tolerance, and disaster recovery requirements.

  • Design, implement, and continuously improve observability solutions using tools such as Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, and Splunk, including:

    • Monitoring

    • Logging

    • Alerting

    • Distributed tracing

  • Instrument applications and infrastructure to provide end-to-end visibility into system health, performance, and reliability.

  • Proactively identify performance bottlenecks, capacity risks, and failure points; recommend and implement remediation strategies.

  • Lead incident response, providing rapid triage and resolution during production outages or performance degradation.

  • Conduct root cause analysis (RCA) for critical incidents and drive corrective and preventive actions.

  • Collaborate closely with development, infrastructure, security, and business teams to ensure alignment with operational and business objectives.

  • Analyze and reverse-engineer existing applications to understand system behavior, integrations, and dependencies.

  • Continuously evaluate emerging technologies, tools, and industry trends to improve platform reliability and operational efficiency.

  • Demonstrate adaptability and a strong learning mindset in a fast-paced, evolving environment.

Nice to Have Skills

  • AI. Apply Generative AI tools responsibly to improve productivity, including assisting with analysis, documentation, summarization, and ideation activities.

  • SQL. Utilize SQL and relational databases (Oracle or other RDBMS) to support application troubleshooting, reporting, and performance analysis.

  • Certification. A public cloud (AWS) or Kubernetes certification is a plus.

Category:

Information Technology

Please be advised that Fidelity’s business is governed by the provisions of the Securities Exchange Act of 1934, the Investment Advisers Act of 1940, the Investment Company Act of 1940, ERISA, numerous state laws governing securities, investment and retirement-related financial activities and the rules and regulations of numerous self-regulatory organizations, including FINRA, among others. Those laws and regulations may restrict Fidelity from hiring and/or associating with individuals with certain Criminal Histories.