Senior Site Reliability Engineer

Red Hat
Full Time | Bangalore, Karnataka, India | Posted 1 week ago

Job Overview

Develop, scale, and operate managed cloud services built on OpenShift, Red Hat’s enterprise Kubernetes distribution. Contribute to running OpenShift at scale by enabling customer self-service, making monitoring more sustainable, and automating manual work. Help solve the complex scaling challenges unique to managed cloud services, drawing on skills in coding, operations, and large-scale distributed system design. Work in a global, transparent team environment that fosters learning from failures and supports continuous improvement.

Responsibilities

  • Contribute code to increase the scalability and reliability of the service
  • Contribute software tests and participate in peer review to increase the quality of our codebase
  • Help develop peers’ capabilities through knowledge sharing, mentoring, and collaboration
  • Participate in a regular on-call schedule, including occasional paid weekend and holiday shifts
  • Practice sustainable incident response and blameless postmortems
  • Resolve customer issues escalated from the Red Hat Global Support team
  • Work within a small agile team to develop and improve SRE software, support your peers, plan work, and continuously improve

Qualifications

  • Bachelor’s degree in Computer Science or related technical field involving software or systems engineering (or equivalent hands-on experience in Site Reliability Engineering)
  • Experience programming in at least one language such as Python, Golang, Java, C, or C++
  • Experience working with public clouds such as AWS, GCP, or Azure
  • Ability to collaboratively troubleshoot and solve problems in a team setting
  • Experience troubleshooting as-a-service offerings (SaaS, PaaS, etc.) and working with complex distributed systems
  • Direct experience with Kubernetes or OpenShift is a plus
  • Demonstrated ability to debug and optimize code and to automate routine tasks
  • Basic understanding of Unix/Linux operating systems
  • Desired: 5+ years managing Linux servers (RHEL, CentOS, or Fedora) in a public cloud (AWS, GCE, Azure)
  • Desired: 3+ years with enterprise systems monitoring (Prometheus a plus)
  • Desired: 3+ years with configuration management (Ansible, Puppet, Chef)
  • Desired: 2+ years programming with Golang, Java, or Python
  • Desired: 2+ years delivering a hosted service
  • Desired: Ability to quickly troubleshoot system issues
  • Desired: Understanding of TCP/IP networking and protocols like DNS and HTTP
  • Desired: Solid communication skills and customer interaction experience
  • Desired: 1+ year with Kubernetes or Docker-based containers