Senior SW Engineer - AI Infrastructure & Optimization
Posted on Jun 12, 2026
- Artificial Intelligence
- Israel
- Full-time
Description
We are looking for a Senior Software Engineer to help build and optimize large-scale, high-performance GenAI infrastructure and inference systems on Kubernetes.
As AI workloads increasingly move toward Kubernetes-native infrastructure, we are building systems that support distributed inference, performance optimization, reliability, observability, and production-grade deployment at scale.
This role is ideal for an engineer who can reason deeply about systems, performance, tradeoffs, and reliability, and who is comfortable owning difficult technical decisions end-to-end.
You will work across inference serving, distributed systems, optimization, and Kubernetes-native AI infrastructure.
What You’ll Do
- Build and optimize high-performance Kubernetes-native GenAI inference systems
- Work with modern inference stacks such as vLLM, SGLang, TensorRT-LLM, and related tooling
- Work with Kubernetes-native distributed LLM inference frameworks such as llm-d and NVIDIA Dynamo
- Design and implement optimization algorithms and performance improvements
- Improve reliability, observability, deployment, and operational maturity of AI systems
- Make architectural decisions and take ownership of technical outcomes
- Collaborate with a small, senior engineering team focused on performance and production quality
Requirements
Required Qualifications
- Minimum 5 years of experience as a Software Engineer, with strong software engineering and system design skills
- Programming experience in Go and Python
- Hands-on experience with the Kubernetes ecosystem, including Operators, service meshes, GitOps, Gateway API, and OpenTelemetry
- Experience with cloud platforms
- Strong understanding of optimization algorithms and performance engineering
- Ability to independently drive technical initiatives from concept to production
- Strong systems thinking and debugging skills
- Comfort operating in environments with high autonomy and responsibility
Nice to Have
- Experience with modern LLM inference frameworks such as vLLM, SGLang, or TensorRT-LLM
- Experience with distributed LLM inference frameworks such as llm-d or NVIDIA Dynamo
- Contributions to open-source Kubernetes or ML infrastructure projects
- GPU performance optimization and profiling experience
- Familiarity with CUDA, NCCL, or Triton kernels
- Experience running GenAI systems at scale in production