JPMorganChase
Hyderabad, Telangana, India (on-site)
Posted: 1 day ago
Job Type: Full-Time
Job Function: Banking
Site Reliability Engineer III
Description
There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Infrastructure Platforms team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Job responsibilities
- Own L1.5/L2 production support, participate in on‑call rotations, and drive rapid triage, containment, and recovery for incidents.
- Lead post‑incident reviews and implement preventative actions to eliminate repeat issues and reduce operational risk.
- Define and maintain SLIs/SLOs and error budgets for critical user journeys, integrating them with change guardrails to balance velocity and reliability.
- Conduct capacity and performance analysis; design and validate resilience patterns such as high availability, failover, and disaster recovery tests.
- Implement and standardize metrics, logs, and traces; build actionable dashboards and alerts that improve signal‑to‑noise.
- Tune alert policies to reduce noise and improve MTTD/MTTR, leveraging APM/AIOps to accelerate root‑cause analysis.
- Build and maintain CI/CD pipelines (e.g., Jenkins, GitHub Actions, GitLab CI), manage artifact/versioning, and orchestrate environment promotions.
- Enable pre/post‑deploy checks, canary/blue‑green strategies where feasible, and automated rollback to reduce change failure rate.
- Develop Python‑based automation for self‑healing, runbook execution, health checks, and operational workflows with tests and code quality gates.
- Support and harden platform/data components including Redis, RDBMS, and Kafka by managing topic lifecycle, capacity, retention, replication, and failover.
- Adhere to governance by executing change management, patching, and vulnerability SLAs; maintain environment configuration integrity across dev/QA/UAT/prod.
Required qualifications, capabilities, and skills
- Formal training or certification on Site Reliability concepts and 3+ years of applied experience.
- Hands-on experience in SRE/Production Engineering/DevOps with a pronounced application support focus.
- Program in Python to automate operations (APIs, CLIs, schedulers) with unit tests, linters, and code quality practices.
- Operate CI/CD systems such as Jenkins, GitHub Actions, or GitLab CI, including YAML pipelines, secrets management, and deployment automation.
- Configure and use metrics/dashboards with Prometheus and Grafana or equivalents to monitor service health.
- Implement centralized logging using ELK/OpenSearch/Kibana or equivalents for efficient troubleshooting.
- Instrument and trace services with OpenTelemetry and Jaeger/Tempo or equivalents to improve observability.
- Leverage APM/AIOps platforms such as Dynatrace, New Relic, AppDynamics, or equivalents to speed diagnosis and remediation.
- Administer Linux, write shell scripts, use Git, and troubleshoot basic networking and HTTP issues.
- Operate Redis and at least one RDBMS; configure Kafka topics and plan capacity for reliable messaging.
- Run high‑quality incident management: take on‑call, drive fast recovery, conduct PIRs, and prevent repeat issues.
Preferred qualifications, capabilities, and skills
- Operate within highly regulated or large‑scale environments, ideally including financial services.
- Implement Infrastructure as Code with Terraform/Ansible and run containers with Docker/Kubernetes.
- Adopt progressive delivery with feature flags and canaries, integrate automated testing frameworks, and enforce policy‑as‑code.
- Execute performance engineering, load testing, and capacity modeling, and optimize costs with unit‑economics dashboards.
Job ID: 82998913
