Job Description:
Position Description:
Facilitates and orchestrates Data Recovery (DR) events, including organizational cloud and on-premises routing, failovers, and evidence capture. Improves observability and resilience of software and infrastructure through incident root cause analysis using tools including Datadog and Splunk. Manages SSL certificates and renews them in non-production and production environments before expiry to avoid workflow interruptions. Deploys and supports highly distributed, multitiered systems at scale. Builds and operates highly resilient platforms in Amazon Web Services (AWS) Cloud environments. Designs, develops, and executes performance tests using Java, JMeter, Cloud-test, Rush-hour, and other performance testing tools to ensure comprehensive test coverage. Automates tasks using scripting languages (Python and Shell). Works with Cloud Computing and DevOps concepts including Continuous Integration and Continuous Delivery (CI/CD) pipelines and Kubernetes. Builds and improves standard methodologies for performance, load, stress, and chaos testing, along with analytics and reports based on business requirements.
Primary Responsibilities:
- Defines and leads enterprise-level reliability strategies.
- Architects resilient systems and infrastructure.
- Creates and publishes performance test result reports with recommendations for quality improvement.
- Maintains scalability and resiliency of complex environments.
- Implements advanced observability practices and techniques at scale.
- Manages and interprets large datasets using query languages and visualization tools.
- Advises senior leadership on reliability engineering best practices.
- Mentors junior engineers.
- Performs independent and complex technical and functional analysis for multiple divisional initiatives.
- Develops innovative solutions to improve system availability, scalability, and performance.
- Designs, implements, and maintains performance test frameworks.
Education and Experience:
Bachelor’s degree in Computer Science, Applied Computer Science, Engineering, Information Technology, Information Systems, or a closely related field (or foreign education equivalent) and five (5) years of experience as a Principal Site Reliability Engineer (or closely related occupation) managing mission-critical applications and administering resilient platform infrastructure across testing and production.
Or, alternatively, Master’s degree in Computer Science, Applied Computer Science, Engineering, Information Technology, Information Systems, or a closely related field (or foreign education equivalent) and three (3) years of experience as a Principal Site Reliability Engineer (or closely related occupation) managing mission-critical applications and administering resilient platform infrastructure across testing and production.
Skills and Knowledge:
Candidate must also possess:
- Demonstrated Expertise (“DE”) enabling an end-to-end, Continuous Integration and Continuous Delivery (CI/CD) platform using uDeploy, Jenkins Core, Ansible AWX, and Terraform; and modifying programming language scripts in Shell and Python for software application deployments and targeted utility purposes through Deployment as a Service (DAAS).
- DE enabling DevOps and Site Reliability Engineering (SRE) practices and principles in multi-Cloud environments, using automations and proactive monitoring for Azure and AWS services.
- DE configuring internet-facing networks, traffic routing, firewalls, and web servers within a complex environment, using F5, AVI, AWS Route53, and Azure Load Balancer; and troubleshooting complex problems that span multiple component tiers and operating systems, using Splunk and Datadog dashboards.
- DE creating and operating monitors and dashboards for the tracking, alerting, and presentation of application metrics, traces, and logs, using Datadog, Splunk, and Grafana.
Fidelity’s Onsite Working Model
Fidelity is transitioning to a full-time onsite working model through a phased rollout across regions and roles. Currently, some roles and locations require 100% onsite presence, while others require less. Onsite expectations are likely to evolve as the rollout continues. This transition does not apply to fully remote roles.
Certifications:
Category:
Information Technology
Please be advised that Fidelity’s business is governed by the provisions of the Securities Exchange Act of 1934, the Investment Advisers Act of 1940, the Investment Company Act of 1940, ERISA, numerous state laws governing securities, investment and retirement-related financial activities and the rules and regulations of numerous self-regulatory organizations, including FINRA, among others. Those laws and regulations may restrict Fidelity from hiring and/or associating with individuals with certain Criminal Histories.