Director, NSV Automation
NVIDIA
NVIDIA is seeking an experienced Director to lead the NSV Automation group within Networking Solution Validation. In this role, you will manage and grow a group responsible for building large-scale automation platforms that validate the performance, stability, and reliability of NVIDIA’s AI networking solutions. As part of the Networking Cluster Solutions (NCS) organization, you will partner closely with validation, architecture, and product teams to ensure our networking fabrics perform reliably at massive scale. This role combines strong people leadership with deep technical ownership of automation systems that support both performance benchmarking and long-haul stability testing.
What You’ll Be Doing:
- Lead and Grow the Group: Build, mentor, and manage a high-performing group of automation and software engineers, including hiring, career development, and technical leadership.
- Own the Automation Strategy: Define and drive the roadmap for NSV automation platforms, balancing near-term execution with long-term scalability and maintainability.
- Platform Engineering: Oversee the design and delivery of automation systems that validate network performance (latency, throughput) and reliability (uptime, fault recovery) at scale.
- Resilience & Chaos Engineering: Design automated "stress-and-survive" frameworks that inject faults, link flaps, and congestion to ensure the fabric is self-healing and loss-free.
- Cross-Functional Leadership: Work closely with NSV execution teams, NCS leadership, and partner organizations to align automation priorities with validation needs and business goals.
- Execution & Delivery: Ensure the team delivers reliable, production-grade software platforms that support continuous validation across multiple programs.
What We Need to See:
- B.Sc. in Computer Science, Engineering, or a related field.
- 15+ years of overall experience, including 10+ years in software engineering, preferably in distributed systems, infrastructure, networking, or cloud platforms.
- 5+ years of experience managing managers and scaling engineering teams, including ownership of complex, mission-critical software.
- Strong system-level thinking with the ability to identify scalability, reliability, and performance bottlenecks.
- Proven ability to balance long-running stability efforts with fast-paced performance and feature validation.
Ways to Stand Out from the Crowd:
- Experience in high-performance networking environments (e.g., InfiniBand, RoCE, NCCL, or similar technologies).
- Background in building automation or validation platforms driven by telemetry and data-based decision making.
- Exposure to applying AI-based or agent-driven approaches to debugging, analysis, or infrastructure automation.
- Experience in HPC, large-scale distributed systems, or other environments where performance and reliability are critical.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
#LI-Hybrid