Senior GPU Optimisation Engineer

Experience: 4-10 years
Location: Bangalore
CTC: 40-65L

Senior backend engineer with expertise in GPU optimization and inference performance tuning for AI models.

About this role

About Smallest.ai

Smallest AI is one of the world's fastest-growing speech AI research companies. We have some of the world's best speech AI models, which more than 30,000 customers across 12 countries use to automate their calling operations. At heart, we are engineers, researchers, and a bunch of nerds who love diving deep into AI.

Role

We're hiring a GPU Optimization Engineer who understands GPUs at a deep, architectural level: someone who knows how to squeeze every last millisecond out of a model, which GPU constraints matter, and how to restructure models for real-world inference performance. You'll work across CUDA kernels, model graph optimizations, hardware-specific tuning, and porting models across GPU architectures. Your work directly impacts the latency, throughput, and reliability of Smallest.ai's real-time speech models.

What You'll Do

Partner with the research and infra teams to ensure the entire stack is optimized for real-time workloads
Work with TensorRT, ONNX Runtime, and custom runtimes for deployment
Port models across GPU chipsets (NVIDIA → AMD / edge GPUs / new compute backends)
Benchmark and calibrate inference across NVIDIA, AMD, and potentially emerging accelerators
Tune models to fit GPU memory limits while maintaining quality
Perform operator fusion, graph optimization, and kernel-level scheduling improvements
Design and implement custom kernels (CUDA/Triton/Tinygrad) for performance-critical model sections
Profile models end-to-end to identify GPU bottlenecks such as memory bandwidth, kernel launch overhead, missed fusion opportunities, and quantization constraints
Optimize model architectures (ASR, TTS, SLMs) for maximum performance on specific GPU hardware

Requirements

Deep understanding of GPU architecture (NVIDIA, AMD)
Experience with CUDA, TensorRT, ONNX Runtime
Proficiency in kernel-level optimization and profiling
Experience with model quantization and inference optimization
Strong background in performance-critical systems
