Senior LLM Agents Architect

NVIDIA

Posted on May 22, 2026

We don't just build the hardware and software that powers the AI revolution — we are building the AI that designs the next generation of both. Our team sits at the intersection of inference software and GPU architecture, creating autonomous LLM-driven systems that reason about hardware, write high-performance CUDA, and automate the complex loops of architectural simulation, analysis, and optimization.

We are looking for a Senior LLM Agents Architect to work hands-on with hardware architects, verification engineers, GPU performance experts, and software developers to build end-to-end agent flows that drive significant improvements in kernel optimization, architectural exploration, and developer efficiency.

What you'll be doing:

  • Design and build agentic AI systems that generate, analyze, and optimize GPU compute kernels — targeting speed-of-light performance on NVIDIA hardware.

  • Collaborate with GPU architects and performance engineers to encode domain expertise — memory hierarchy trade-offs, occupancy tuning, instruction-level reasoning — into agent workflows that rival hand-tuned optimization.

  • Build automated performance forensics agents capable of ingesting large-scale simulation traces and Nsight profiler data to identify bottlenecks and propose architectural or software mitigations.

  • Partner with HW architects to develop agentic flows for GPU architectural studies — enabling rapid what-if analysis across micro-architecture configurations such as cache sizing, memory controller design, and compute unit scaling.

  • Explore agentic approaches to HW/SW co-design challenges, including replacing or augmenting graph-compiler functionality (e.g., TorchInductor) with LLM-driven optimization and code-generation pipelines.

  • Prototype rapidly and productize thoughtfully: integrate with internal services, make full use of GPU capabilities, remove bottlenecks, and deliver solutions that fit the problem.

  • Set up an evaluation backbone using offline golden sets and online telemetry to support confident iteration, cost control, and safe improvements.

  • Mentor teams and raise their capabilities by sharing insights on agent orchestration, prompting, RAG, and observability, and by crafting documentation and playbooks for NVIDIA's teams.

What we need to see:

  • 8+ years in applied ML/AI or large-scale systems, with 2+ years crafting agentic or LLM-powered applications in production environments.

  • B.Sc. in Computer Science or Electrical Engineering.

  • Solid grounding in computer architecture: memory hierarchies, parallelism models, pipelining, and cache behavior. Specific familiarity with NVIDIA GPU architecture — streaming multiprocessors, warp scheduling, shared/global memory model, and occupancy reasoning — is essential.

  • Hands-on CUDA programming experience: writing, profiling, and optimizing GPU kernels — not just calling into CUDA-accelerated libraries. Comfortable with tools such as Nsight Compute, Nsight Systems, or equivalent profiling workflows.

  • Proven ownership of at least one end-to-end agentic system or LLM application: requirements, architecture, implementation, evaluation, and incremental hardening in production — not just experience with off-the-shelf frameworks.

  • Strong software engineering skills in Python and one systems language (C++ preferred).

  • Proficient in tool use, RAG pipelines, and model adaptation techniques for building agentic systems.

  • Demonstrated ability to collaborate with HW/SW domain experts and translate their heuristics into deterministic tools, constraints, and evaluation metrics.

  • Excellence in communication and facilitation: aligning diverse collaborators, documenting decisions/assumptions, and influencing without authority.

  • Track record of building observability for AI systems: dataset/version management, offline test suites, online telemetry, guardrails/safety checks, and rollback plans.

Ways to stand out from the crowd:

  • Familiarity with the PyTorch compilation and lowering stack (torch.compile, TorchDynamo, TorchInductor, Triton, down to PTX), and with GPU graph compilers, kernel fusion strategies, or auto-tuning frameworks.

  • Background in performance engineering for HPC or GPU-accelerated workloads, including experience with performance modeling or hardware simulators.

  • Familiarity with distributed processing, multi-GPU workloads, and networking (e.g., NVLink, InfiniBand).

  • Familiarity with frontier agentic coding tools (e.g., Claude Code, Codex, Cursor) — understanding their underlying architecture: tool orchestration, context management, and autonomous task execution patterns.

  • Hands-on experience building a domain-specific coding agent, whether on top of frontier agentic harnesses (e.g., Claude Code, Codex SDK) or lower-level agent frameworks (e.g., LangChain/LangGraph deep agents, CrewAI). Comfortable with the design choices that make a coding agent useful in practice: task scoping, tool and context curation, evaluation, and failure recovery.

Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer you and your family at www.nvidiabenefits.com/