Lead Infrastructure Engineer
NexGen Cloud
Job Overview
This role involves leading the technical direction for OpenStack and Kubernetes platforms optimized for GPU workloads in a fast-growing cloud infrastructure company. You will own the performance, reliability, and evolution of these platforms while managing a small team of engineers, ensuring scalability to meet global demand for AI, ML, and HPC applications.
Responsibilities
- Own and drive the design, deployment, and operation of OpenStack and Kubernetes clusters optimized for GPU workloads
- Lead and develop a team of 4–5 infrastructure engineers, setting clear direction and standards
- Build and improve infrastructure through automation (IaC, GitOps, CI/CD pipelines)
- Ensure platform reliability through strong monitoring, observability, and incident management practices
- Collaborate closely with DevOps, Product, and Support teams to align infrastructure with real-world customer needs
- Identify opportunities to simplify, standardize, and scale systems as the platform grows
- Take ownership of operational governance including incident, problem, and change management
- Communicate clearly with leadership on platform performance, risks, and improvements
Qualifications
- Strong hands-on experience operating OpenStack in production environments
- Experience running production-grade Kubernetes clusters (ideally bare metal or private cloud)
- Solid Linux, networking, and storage fundamentals with a pragmatic troubleshooting approach
- Experience with infrastructure automation, CI/CD, and Git-based workflows
- Ability to work in a fast-moving, scale-up environment
- Proven leadership or mentoring experience within infrastructure/platform teams
- Experience managing incidents and coordinating response during critical service events
- Strong communication skills, particularly translating technical issues to non-technical stakeholders