Job Title: Director | AI / ML | Bengaluru | Engineering | Hybrid Cloud Engineering

Job requisition ID: 107011

Date: Jul 3, 2026

Location: Bengaluru

Designation: Director

Entity: Deloitte Touche Tohmatsu India LLP

Your work profile

· AI Data Center Architecture & Solution Design

· Design and implement AI-focused Data Center architectures aligned with Tier II, Tier III, and Tier IV standards.

· Develop end-to-end AI Data Center solutions, including retrofitting traditional CPU-based data centers into AI Factories.

· Create advisory documents, RFPs, technical proposals, and commercial proposals for AI Data Center engagements.

· Design AI infrastructure solutions across hyperscalers (AWS, Azure, GCP, OCI) and NVIDIA Cloud Partners.

· Prepare HLDs, LLDs, network diagrams, rack layouts, BOQs, and TCO models.

· AI Networking & Fabric Architecture

· Architect and deploy InfiniBand and NVIDIA Spectrum Ethernet fabrics for AI workloads.

· Design and implement Spine-Leaf network architectures using EVPN-VXLAN overlays.

· Configure and optimize BGP, ECMP, RoCE, and high-performance networking environments.

· Lead Cumulus Linux-based deployments and network automation initiatives.

· Optimize network performance, latency, throughput, and congestion management for AI environments.

· AI Compute & GPU Infrastructure

· Design and size GPU clusters using NVIDIA H100, H200, B200, B300, DGX, and AI Factory platforms.

· Perform GPU capacity planning and workload profiling for AI and ML use cases.

· Implement GPU virtualization and Multi-Instance GPU (MIG) architectures.

· Support AI training and inference infrastructure deployments.

· AI Storage & Platform Engineering

· Design AI storage solutions utilizing NAS, SAN, NVMe, Object Storage, NFS, iSCSI, Fibre Channel, and parallel file systems.

· Implement and manage Kubernetes-based AI platforms, including OpenShift and VMware Tanzu.

· Deploy and integrate Run:ai and Slurm workload schedulers for GPU orchestration.

· Ensure seamless integration of AI platforms with existing enterprise infrastructure.

· Monitoring, Observability & Operations

· Implement NVIDIA UFM, NVIDIA Mission Control, and NetQ for infrastructure monitoring and observability.

· Configure telemetry, validation, troubleshooting, and fabric management workflows.

· Drive infrastructure benchmarking, performance optimization, and capacity planning initiatives.

· Support POCs, design validation exercises, production rollouts, and operational readiness activities.

· Cloud & AI Services

· Design AI infrastructure solutions across AWS, Azure, GCP, and OCI.

· Enable AI services integration across hybrid and multi-cloud environments.

· Provide guidance on AI platform adoption, scalability, and operational best practices.

Key skills required

Data Center Infrastructure

Strong understanding of Data Center power infrastructure, including UPS, PDU, ATS, switchgear, transformers, and generators.
Knowledge of Data Center cooling technologies such as CRAC, CRAH, liquid cooling, immersion cooling, and chiller systems.
Experience in rack design, cabling architecture, white space planning, and physical infrastructure design.
Understanding of raised floors, fire suppression systems, plenum design, and facility infrastructure.

AI Networking

Strong expertise in InfiniBand (HDR/NDR), RoCE, and Ethernet fabrics.
Hands-on experience with NVIDIA Spectrum switches.
Deep understanding of EVPN-VXLAN, BGP, ECMP, Spine-Leaf architecture, and network automation.
Experience with Cumulus Linux environments.

AI Compute & Platforms

Expertise in NVIDIA GPU platforms including DGX, H100, H200, B200, and B300.
Experience with GPU virtualization, MIG, and AI workload optimization.
Strong understanding of AI training and inference infrastructure.

AI Storage

Knowledge of AI storage architectures and parallel file systems such as Lustre and GPFS.
Experience with NAS, SAN, Fibre Channel, NVMe, NFS, iSCSI, and Object Storage technologies.

Orchestration & Container Platforms

Experience with Kubernetes ecosystems.
Hands-on expertise with OpenShift and VMware Tanzu.
Experience with Run:ai and Slurm workload management platforms.
Understanding of container networking for AI workloads.

AI Software Stack

Understanding of AI infrastructure software layers including:
Large language models (LLMs)
MLOps Platforms
Training and Inference Frameworks
Agentic AI
NVIDIA AI Enterprise
NVIDIA Licensing
NVIDIA NVIS

Cloud Technologies

Strong understanding of AWS, Azure, GCP, and OCI services.
Experience designing AI and cloud-native solutions in hyperscaler environments.