Senior Infrastructure Engineer

NexGen Cloud
Full Time | London, United Kingdom | Posted 1 week ago

Job Overview

This role owns the design, deployment, and operation of OpenStack and Kubernetes environments as the platform scales globally. The focus is on business-critical infrastructure that affects performance, reliability, and customer experience for GPU workloads on a high-performance cloud platform.

Responsibilities

  • Own the design, deployment, and operation of OpenStack and Kubernetes environments.
  • Ensure platform performance, scalability, and resilience for GPU workloads.
  • Build and improve infrastructure using infrastructure-as-code and GitOps practices.
  • Drive automation across provisioning, deployment, and operational workflows.
  • Optimize GPU workload scheduling using Kubernetes and NVIDIA tooling.
  • Implement monitoring, logging, and alerting to ensure platform stability.
  • Lead incident response and continuous improvement of reliability.
  • Maintain strong security controls across infrastructure and container layers.
  • Implement RBAC, network policies, and tenant isolation.
  • Work closely with Platform, DevOps, AI, Product, and Support teams to align infrastructure with their requirements.

Qualifications

  • Strong hands-on experience running OpenStack in production environments.
  • Proven experience operating Kubernetes at scale, ideally on bare metal or in a private cloud.
  • Solid understanding of Linux, networking, and storage systems.
  • Experience with infrastructure automation, CI/CD, and Git-based workflows.
  • Ability to troubleshoot complex infrastructure and performance issues.
  • Strong ownership mindset, comfortable operating without heavy oversight.
  • Ability to simplify and scale systems in a fast-moving environment.