Senior Infrastructure Engineer
NexGen Cloud
Job Overview
This role owns the design, deployment, and operation of OpenStack and Kubernetes environments as the platform scales globally. The focus is on business-critical infrastructure that affects performance, reliability, and customer experience for GPU workloads on a high-performance cloud platform.
Responsibilities
- Own the design, deployment, and operation of OpenStack and Kubernetes environments.
- Ensure platform performance, scalability, and resilience for GPU workloads.
- Build and improve infrastructure using infrastructure-as-code and GitOps practices.
- Drive automation across provisioning, deployment, and operational workflows.
- Optimize GPU workload scheduling using Kubernetes and NVIDIA tooling.
- Implement monitoring, logging, and alerting to ensure platform stability.
- Lead incident response and continuous improvement of reliability.
- Maintain strong security controls across infrastructure and container layers.
- Implement RBAC, network policies, and tenant isolation.
- Work closely with Platform, DevOps, AI, Product, and Support teams to align infrastructure with their requirements.
Qualifications
- Strong hands-on experience running OpenStack in production environments.
- Proven experience operating Kubernetes at scale, ideally bare-metal or private cloud.
- Solid understanding of Linux, networking, and storage systems.
- Experience with infrastructure automation, CI/CD, and Git-based workflows.
- Ability to troubleshoot complex infrastructure and performance issues.
- Strong ownership mindset, comfortable operating without heavy oversight.
- Ability to simplify and scale systems in a fast-moving environment.