Job Title: T&T | EAD | HCE | Senior Consultant/Manager | NVIDIA AI Infrastructure Architect | Pan India

• Job requisition ID: 99255
• Location: Bengaluru
• Entity: Deloitte Touche Tohmatsu India LLP
The Team
Deloitte’s Technology & Transformation practice can help you uncover and unlock the value buried deep inside vast amounts of data. Our global network provides strategic guidance and implementation services to help companies manage data from disparate sources and convert it into accurate, actionable information that can support fact-driven decision-making and generate an insight-driven advantage. Our practice addresses the continuum of opportunities in business intelligence & visualization, data management, performance management and next-generation analytics and technologies, including big data, cloud, cognitive and machine learning.
Your work profile
- Strong understanding of NVIDIA Infrastructure.
- Experience with GPUs and CUDA.
Key skills required:
- 6+ years of overall experience, including 3+ years of relevant experience with NVIDIA infrastructure, CUDA, and GPUs.
Languages:
- Strong proficiency in NVIDIA infrastructure, GPUs, and CUDA.
Preferred Qualification:
- We are seeking a highly experienced Senior NVIDIA AI Infrastructure Architect to lead the design, deployment, and operations of large-scale AI compute environments. This role is anchored in data center engineering, GPU infrastructure, and NVIDIA AI frameworks, with a strong focus on building robust, scalable, production-grade AI platforms.
- You will serve as the technical authority for GPU-accelerated infrastructure, driving architecture strategy, reference designs, performance optimization, and end-to-end operational excellence across on-prem and hybrid environments.
- Architecture, Design & Implementation
- Architect and deploy large-scale GPU-accelerated AI infrastructure based on NVIDIA platforms (H100/A100 systems, DGX, HGX, OVX).
- Lead end-to-end design for AI clusters, including networking (Ethernet/InfiniBand), storage, fabric topology, and high-availability requirements.
- Define the architecture for AI factories, high-density GPU clusters, and multi-node training platforms.
- Implement and optimize NVIDIA Base Command, NVIDIA AI Enterprise, and NGC stack components.
- NVIDIA AI Frameworks & Platforms
- Integrate and optimize NVIDIA AI frameworks such as:
- NVIDIA Triton Inference Server
- NVIDIA TensorRT / TensorRT-LLM
- NVIDIA CUDA, cuDNN, NCCL
- NVIDIA NeMo, Riva, and Clara (as applicable)
- Work closely with data science/ML teams to map training and inference needs to GPU platform architectures.
Key Skills and Competencies:
- Lead large-scale deployments in enterprise data centers, including rack layout, thermals, power planning, and high-density cooling.
- Oversee operational runbooks, monitoring, patching, firmware upgrades, and lifecycle management of GPU servers.
- Ensure high availability, resiliency, and scalable expansion of AI compute infrastructure.
- Performance Optimization & Reliability
- Tune training, inference, and workload orchestration pipelines for maximum GPU utilization and throughput.
- Optimize networking for multi-node, multi-GPU systems using RDMA, NVLink, and NVSwitch.
- Conduct performance benchmarking using NVIDIA profiling tools.
Education
- Any Graduation: B.Tech./B.E., MBA, MCA, BCA.
Location and Way of Working:
- Base location: PAN India (Deloitte India locations)
