Lead SW Architect
Posted on Jun 14, 2026
- System Architecture
- Israel
- Full-time
Description
NeuReality is seeking a Lead Software Architect to join our system architecture team and help define NR-Nexus, our next-generation AI inference platform.
Responsibilities
- Lead the software architecture and technical roadmap for NeuReality’s NR-Nexus.
- Write system specifications for the NR-Nexus product.
- Research AI infrastructure, SaaS platforms, model serving, and inference trends.
- Work with engineering to translate technical capabilities into product value.
- Work closely with engineering teams to optimize performance, scalability, and feature delivery.
- Define performance goals and lead profiling, benchmarking, and optimization efforts for GenAI and distributed AI workloads.
- Collaborate with customers, partners, and open-source communities to ensure ecosystem compatibility and adoption.
- Mentor software engineers and provide technical leadership.
Requirements
- 7+ years of software engineering experience, including 3+ years in software architecture or technical leadership.
- Strong experience with Kubernetes-based platforms and cloud-native architecture.
- Deep understanding of GenAI/LLM infrastructure and distributed workloads.
- Experience designing management software or SaaS platforms for production systems.
- Strong background in distributed systems, microservices, APIs, and automation.
- Hands-on experience with observability stacks, monitoring, logging, alerting, and SLA/SLO tracking.
- Experience with CI/CD, deployment automation, upgrades, and rollback mechanisms.
- Good understanding of security, authentication, authorization, and integration with customer data center environments.
Nice to have
- Deep understanding of GenAI/LLM inference infrastructure, including model serving, scaling, batching, latency, throughput, and resource utilization.
- Experience with production AI inference clusters using GPUs, AI accelerators, or other specialized compute infrastructure.
- Understanding of how distributed inference systems operate, including scheduling, load balancing, autoscaling, failover, and cluster-level observability.
- Experience with LLM serving frameworks such as vLLM, Triton Inference Server, TensorRT-LLM, or similar.
- Familiarity with GPU/accelerator orchestration, device plugins, resource scheduling, and cluster capacity planning.
- Familiarity with GPU communication technologies such as GPUDirect RDMA, NCCL, NVLink, or UALink.
- Experience optimizing communication for distributed AI/ML workloads.
- Knowledge of Prometheus, Grafana, OpenTelemetry, Helm, Argo CD, Istio, KServe, Kubeflow, or similar tools.
- Experience deploying software in on-prem, edge, private cloud, or hybrid environments.