Lead Infrastructure Engineer

NexGen Cloud
Full Time | United Kingdom | Posted 1 week ago

Job Overview

This role provides technical leadership for OpenStack and Kubernetes platforms optimised for GPU workloads at a fast-growing cloud infrastructure company. You will own the performance, reliability, and evolution of these platforms while leading a small team of engineers, ensuring the platforms scale for AI, ML, and HPC applications. The position is remote, based in the UK or Europe.

Responsibilities

  • Own and drive the design, deployment, and operation of OpenStack and Kubernetes clusters optimised for GPU workloads
  • Lead and develop a team of 4–5 infrastructure engineers, setting clear direction and standards
  • Build and improve infrastructure through automation (IaC, GitOps, CI/CD pipelines)
  • Ensure platform reliability through strong monitoring, observability, and incident management practices
  • Collaborate closely with DevOps, Product, and Support teams to align infrastructure with real-world customer needs
  • Identify opportunities to simplify, standardise, and scale systems as the platform grows
  • Take ownership of operational governance including incident, problem, and change management
  • Communicate clearly with leadership on platform performance, risks, and improvements

Qualifications

  • Strong hands-on experience operating OpenStack in production environments
  • Experience running production-grade Kubernetes clusters (ideally bare metal or private cloud)
  • Solid Linux, networking, and storage fundamentals with a pragmatic troubleshooting approach
  • Experience with infrastructure automation, CI/CD, and Git-based workflows
  • Ability to work in a fast-moving, scale-up environment
  • Proven leadership or mentoring experience within infrastructure/platform teams
  • Experience managing incidents and coordinating response during critical service events
  • Strong communication skills, particularly translating technical issues to non-technical stakeholders
  • Nice to have:
      • Experience integrating Kubernetes with OpenStack
      • Exposure to GPU infrastructure, HPC, or large-scale compute platforms
      • Familiarity with advanced networking or cloud-native ecosystems
      • Contributions to open-source or cloud-native communities