Lead SW Architect

NeuReality

Posted on Jun 14, 2026

  • System Architecture
  • Israel
  • Full-time

Description

NeuReality is seeking a Lead SW Architect to join our system architecture team and help define NR-Nexus, our next-generation AI inference platform.

Responsibilities

  • Lead the software architecture and technical roadmap for NeuReality’s NR-Nexus.
  • Write system specifications for the NR-Nexus product.
  • Research AI infrastructure, SaaS platforms, model serving, and inference trends.
  • Work with engineering to translate technical capabilities into product value.
  • Work closely with engineering teams to optimize performance, scalability, and feature delivery.
  • Define performance goals and lead profiling, benchmarking, and optimization efforts for GenAI and distributed AI workloads.
  • Collaborate with customers, partners, and open-source communities to ensure ecosystem compatibility and adoption.
  • Mentor software engineers and provide technical leadership.

Requirements

  • 7+ years of software engineering experience, including 3+ years in software architecture or technical leadership.
  • Strong experience with Kubernetes-based platforms and cloud-native architecture.
  • Deep understanding of GenAI/LLM infrastructure and distributed workloads.
  • Experience designing management software or SaaS platforms for production systems.
  • Strong background in distributed systems, microservices, APIs, and automation.
  • Hands-on experience with observability stacks, monitoring, logging, alerting, and SLA/SLO tracking.
  • Experience with CI/CD, deployment automation, upgrades, and rollback mechanisms.
  • Good understanding of security, authentication, authorization, and integration with customer data center environments.

Nice to have

  • Deep understanding of GenAI/LLM inference infrastructure, including model serving, scaling, batching, latency, throughput, and resource utilization.
  • Experience with production AI inference clusters using GPUs, AI accelerators, or other specialized compute infrastructure.
  • Understanding of how distributed inference systems operate, including scheduling, load balancing, autoscaling, failover, and cluster-level observability.
  • Experience with LLM serving frameworks such as vLLM, Triton Inference Server, TensorRT-LLM, or similar.
  • Familiarity with GPU/accelerator orchestration, device plugins, resource scheduling, and cluster capacity planning.
  • Familiarity with GPU communication technologies such as GPUDirect RDMA, NCCL, NVLink, or UALink.
  • Experience optimizing communication for distributed AI/ML workloads.
  • Knowledge of Prometheus, Grafana, OpenTelemetry, Helm, Argo CD, Istio, KServe, Kubeflow, or similar tools.
  • Experience deploying software in on-prem, edge, private cloud, or hybrid environments.