Job Title:  Director | AI / ML | Bengaluru | Engineering | Hybrid Cloud Engineering

Job requisition ID:  107011
Date:  Jul 3, 2026
Location:  Bengaluru
Designation:  Director
Entity:  Deloitte Touche Tohmatsu India LLP

Your work profile


AI Data Center Architecture & Solution Design

·      Design and implement AI-focused Data Center architectures aligned with Tier II, Tier III, and Tier IV standards.

·      Develop end-to-end AI Data Center solutions, including retrofitting traditional CPU-based data centers into AI Factories.

·      Create advisory documents, RFPs, technical proposals, and commercial proposals for AI Data Center engagements.

·      Design AI infrastructure solutions across hyperscalers (AWS, Azure, GCP, OCI) and NVIDIA Cloud Partners.

·      Prepare HLDs, LLDs, network diagrams, rack layouts, BOQs, and TCO models.

AI Networking & Fabric Architecture

·      Architect and deploy InfiniBand and NVIDIA Spectrum Ethernet fabrics for AI workloads.

·      Design and implement Spine-Leaf network architectures using EVPN-VXLAN overlays.

·      Configure and optimize BGP, ECMP, RoCE, and high-performance networking environments.

·      Lead Cumulus Linux-based deployments and network automation initiatives.

·      Optimize network performance, latency, throughput, and congestion management for AI environments.

AI Compute & GPU Infrastructure

·      Design and size GPU clusters using NVIDIA H100, H200, B200, B300, DGX, and AI Factory platforms.

·      Perform GPU capacity planning and workload profiling for AI and ML use cases.

·      Implement GPU virtualization and Multi-Instance GPU (MIG) architectures.

·      Support AI training and inference infrastructure deployments.

AI Storage & Platform Engineering

·      Design AI storage solutions utilizing NAS, SAN, NVMe, Object Storage, NFS, iSCSI, Fibre Channel, and parallel file systems.

·      Implement and manage Kubernetes-based AI platforms, including OpenShift and VMware Tanzu.

·      Deploy and integrate Run:ai and Slurm workload schedulers for GPU orchestration.

·      Ensure seamless integration of AI platforms with existing enterprise infrastructure.

Monitoring, Observability & Operations

·      Implement NVIDIA UFM, NVIDIA Mission Control, and NetQ for infrastructure monitoring and observability.

·      Configure telemetry, validation, troubleshooting, and fabric management workflows.

·      Drive infrastructure benchmarking, performance optimization, and capacity planning initiatives.

·      Support POCs, design validation exercises, production rollouts, and operational readiness activities.

Cloud & AI Services

·      Design AI infrastructure solutions across AWS, Azure, GCP, and OCI.

·      Enable AI services integration across hybrid and multi-cloud environments.

·      Provide guidance on AI platform adoption, scalability, and operational best practices.

 

Key skills required


Data Center Infrastructure

  • Strong understanding of Data Center power infrastructure, including UPS, PDU, ATS, switchgear, transformers, and generators.
  • Knowledge of Data Center cooling technologies such as CRAC, CRAH, liquid cooling, immersion cooling, and chiller systems.
  • Experience in rack design, cabling architecture, white space planning, and physical infrastructure design.
  • Understanding of raised floors, fire suppression systems, plenum design, and facility infrastructure.

AI Networking

  • Strong expertise in InfiniBand (HDR/NDR), RoCE, and Ethernet fabrics.
  • Hands-on experience with NVIDIA Spectrum switches.
  • Deep understanding of EVPN-VXLAN, BGP, ECMP, Spine-Leaf architecture, and network automation.
  • Experience with Cumulus Linux environments.

AI Compute & Platforms

  • Expertise in NVIDIA GPU platforms including DGX, H100, H200, B200, and B300.
  • Experience with GPU virtualization, MIG, and AI workload optimization.
  • Strong understanding of AI training and inference infrastructure.

AI Storage

  • Knowledge of AI storage architectures and parallel file systems such as Lustre and GPFS.
  • Experience with NAS, SAN, Fibre Channel, NVMe, NFS, iSCSI, and Object Storage technologies.

Orchestration & Container Platforms

  • Experience with Kubernetes ecosystems.
  • Hands-on expertise with OpenShift and VMware Tanzu.
  • Experience with Run:ai and Slurm workload management platforms.
  • Understanding of container networking for AI workloads.

AI Software Stack

  • Understanding of AI infrastructure software layers, including LLMs, MLOps platforms, training and inference frameworks, agentic AI, NVIDIA AI Enterprise, NVIDIA licensing, and NVIDIA NVIS.

Cloud Technologies

  • Strong understanding of AWS, Azure, GCP, and OCI services.
  • Experience designing AI and cloud-native solutions in hyperscaler environments.