Job Title: Senior Consultant | Cloud Infrastructure | Mumbai | Engineering
Apache Flink Infrastructure Expert
As a Cloud Data Platform Expert on the Public Cloud team, you will lead the core architecture of our streaming infrastructure. Our team's mission is to evolve our application and data hosting environments into an integrated, hybrid multi-cloud ecosystem built on AWS and Google Cloud. You will focus on designing, provisioning, managing, and scaling distributed Apache Flink clusters in cloud-native environments. You will give application developers self-service streaming capabilities, infrastructure-as-code, and codified blueprints that accelerate stream processing from idea to production.
Key Responsibilities
- Platform Architecture: Design, build, and operate resilient, scalable, and secure Apache Flink platform clusters across AWS and GCP.
- Cloud Infrastructure Provisioning: Deploy and manage Flink clusters on Kubernetes (EKS/GKE) using session, application, or per-job deployment models.
- Infrastructure as Code (IaC): Develop and manage underlying cloud infrastructure using IaC principles and tools such as Terraform and Ansible to ensure automated, repeatable cluster provisioning.
- Operational Excellence & Tuning: Take ownership of the Flink platform lifecycle, including high availability (HA) configurations, state backend management (RocksDB), autoscaling, and savepoint strategies.
- Risk, Security & Control: Ensure the streaming platform adheres to technology standards and risk management frameworks. Implement security controls for Flink environments, including Kerberos authentication, IAM, network isolation, and encryption.
- CI/CD & Platform Automation: Implement and maintain CI/CD pipelines to automate Flink operator installations, Kubernetes deployments, and cluster upgrades.
- Observability & Monitoring: Define cluster telemetry and implement production-grade metrics, logs, and alerts covering JobManagers, TaskManagers, and JVM behavior.
- Collaboration & Blueprints: Create codified blueprints and automated self-service paths to reduce duplication, enabling developers to deploy Flink code with ease.
Required Qualifications & Skills
- Proven experience as a Platform Engineer, Cloud Infrastructure Engineer, or Data Platform Architect in a mid-level or senior capacity.
- Expert knowledge of setting up, scaling, and managing Apache Flink clusters natively on cloud resource managers such as Kubernetes.
- Hands-on experience with public cloud platforms, specifically AWS and/or Google Cloud (GCP).
- Containerization Technologies: Expert-level proficiency in Docker and container orchestration platforms like Kubernetes (e.g., Amazon EKS, Google GKE).
- Infrastructure-as-Code: Demonstrable knowledge of IaC tools such as Terraform and/or Ansible for distributed data system cluster management.
- Strong architectural skills with a focus on building well-engineered, fault-tolerant, and resilient platform foundations.
- Experience with automated platform validation and verification testing frameworks.
- Understanding of Site Reliability Engineering (SRE) practices in a platform infrastructure ownership model.
Preferred Qualifications & Skills
- Public cloud provider certifications (e.g., AWS Certified DevOps Engineer - Professional, Google Cloud Professional Cloud Architect).
- Strong proficiency in one or more programming languages (Python, Java, Go) to build custom platform tooling.
- Experience with managed services for Flink (e.g., Amazon Managed Service for Apache Flink, formerly Kinesis Data Analytics; Confluent Cloud for Apache Flink).
- Familiarity with cloud-native CI/CD tools and platforms like Harness, Tekton, or Jenkins.
- Knowledge of core cloud networking and security, including VPC peering and identity and access management (IAM).
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).