Job Title:  Lead Senior Associate | Engineering Foundry & Managed Services | Bengaluru | Engineering as a Service

Job requisition ID:  100455
Date:  May 19, 2026
Location:  Bengaluru
Designation:  Lead Associate
Entity:  Deloitte LLP

Job Description: 


We are looking for a Senior Kubeflow Developer with 8+ years of experience in software engineering or platform engineering, substantial Kubernetes experience, and 4+ years working directly with Kubeflow or related MLOps tooling, who will design, build, and maintain Kubeflow-based AI/ML platforms and applications.


This role focuses on customizing Kubeflow components (Jupyter integrations, Knative/KServe), managing Kubeflow install/upgrade/lifecycle, and implementing secure Kubernetes authentication and authorization. The ideal candidate partners closely with data scientists and platform engineers to deliver production-grade MLOps pipelines and scalable, secure AI services. 


Overview


We are seeking a senior-level engineer with deep hands-on experience in Kubeflow, Kubernetes, and cloud-native MLOps to lead the customization, deployment, and lifecycle management of Kubeflow installations. You will be responsible for integrating Jupyter notebook services, extending Knative/KServe for model serving, implementing robust Kubernetes authN/authZ patterns, and ensuring reliable install/upgrade processes across environments (development, staging, production, private cloud). This is both a developer and a platform-owner role: you will build AI/ML applications and operate the underlying Kubeflow platform.


Key responsibilities


Development and customization


 

  • Customize and extend Kubeflow applications and components (Kubeflow Pipelines (KFP), Katib, Profiles, Metadata). 
  • Integrate and harden Jupyter Notebook / JupyterHub environments for interactive data science workflows. 
  • Implement and extend Knative and KServe components to support custom model-serving runtimes and autoscaling patterns. 
  • Create reusable manifests, Kustomize overlays, Helm charts, or Kubernetes operators for repeatable deployments. 

 


Deployment and lifecycle management 

 


  • Design and own the install, upgrade and rollback processes for Kubeflow across clusters and environments. 
  • Manage manifests and configuration (versioning, parameterization) to enable repeatable, auditable deployments. 
  • Automate bootstrap and cluster lifecycle tasks, including preflight checks, dependency validation, and post-deploy verification. 
  • Troubleshoot and resolve complex deployment/install issues across control plane and data plane components. 

 

Security (authN/authZ) 


 

  • Implement Kubernetes authentication (OIDC, ServiceAccounts, Vault integration, short-lived credentials) and authorization policies (RBAC) for secure multi-tenant Kubeflow deployments. 
  • Design and enforce least-privilege access models for data scientists, pipelines, and model-serving endpoints. 
  • Integrate cluster security controls (namespace isolation, PSP/PSA or equivalent, network policies, admission controllers) with Kubeflow components. 

 


CI/CD and automation 

 


  • Build CI/CD pipelines to validate, test, and release Kubeflow manifests, application code, and model-serving images. 
  • Integrate test automation for functional, security, and smoke tests as part of deployment pipelines. 
  • Create Git-driven workflows (GitOps) for manifests and environment promotion. 

 

Operations, observability, and reliability 


 

  • Instrument and monitor Kubeflow and Kubernetes control/data planes (logs, metrics, tracing). 
  • Implement alerting and runbook documentation for common failure modes and operational tasks. 
  • Lead post-mortems and continuous improvement of platform reliability and deployment practices. 

 


Collaboration and enablement 

 


  • Work closely with data scientists to translate model training and serving requirements into platform capabilities. 
  • Collaborate with platform, security, and cross-functional teams to align on architecture, policy, and operational standards. 

 

Required skills and experience 


 

  • Strong experience with Kubeflow: customization, components, Pipelines, Profiles, Notebook integration, and operational management. 
  • Familiarity with AI tooling on Kubernetes, including one or more of: LangChain, Langflow, Spark, Airflow, Kubeflow, MLflow, KServe, Ray. 
  • Good to have: open-source contributions, particularly in the Kubeflow and Knative communities. 
  • Deep Kubernetes expertise: cluster architecture, resource management, controllers, CRDs, operators, networking, and storage. 
  • Proven experience implementing Kubernetes authentication (OIDC, webhook token auth, service accounts) and authorization (RBAC, ABAC, policy enforcement). 
  • Practical experience with Knative and KServe: custom predictors, scaling behavior, revisions, and annotations for serving models. 
  • MLOps knowledge: model training, reproducible pipelines, model versioning, deployment patterns, inference scaling and A/B testing. 
  • CI/CD tooling: building pipelines for build/test/deploy of manifests and container images (Jenkins, GitHub Actions, GitLab CI, Tekton, ArgoCD, etc.). 
  • Strong troubleshooting and debugging skills for distributed systems and Kubernetes-native apps. 
  • Excellent communication and collaboration skills for cross-functional teams. 

 


Preferred qualifications 

 


  • Experience designing cloud-native architectures and microservices patterns. 
  • Familiarity with GitOps workflows and tools (ArgoCD, Flux). 
  • Experience with Helm, Kustomize, and Kubernetes operators for managing manifests at scale. 
  • Knowledge of container registries, image promotion, and secure image supply chains. 
  • Experience with monitoring, logging, and tracing stacks (Prometheus, Grafana, etc.). 
  • Familiarity with secrets management solutions (Vault, K8s SAa, ExternalSecret). 
  • Prior experience maintaining or contributing to open-source Kubeflow manifests or distributions. 
  • Desired experience with our repositories 

 

Additional attributes 


 

  • Senior-level mindset: proactive, ownership-oriented, and driven to improve platform reliability and developer productivity. 
  • Comfortable working in ambiguous environments and balancing short-term fixes against long-term platform investments. 
  • Willingness to mentor and grow the team’s Kubeflow and Kubernetes capabilities.