Job Title: Technology and Transformation - EAD - Engineering - Senior Consultant / Consultant - Data Engineer
Key Skills:
GCP, Spark, SQL, Scala, Airflow, Automic, CI/CD, BigQuery, PySpark, Unix, Python, Hive, Hadoop, Kafka, Spark Streaming
· GCP: BigQuery, Cloud Storage, Dataproc, Dataflow, Pub/Sub, and Bigtable
· ETL: Experience in designing, building, and managing ETL/ELT pipelines
· Hadoop & Spark: Proficiency with Hadoop ecosystem tools and Apache Spark for large-scale data processing
· Hive: SQL proficiency (HiveQL), performance tuning and optimization, joins and subqueries, window functions, and partitions and buckets (see the sketch after this list)
· DevOps: CI/CD tooling and practices
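
For illustration, a minimal PySpark sketch of the window-function and partitioning skills listed above; the sales table and its region, amount, and sale_date columns are hypothetical placeholders, not part of this role's actual data:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Sketch only: table and column names (sales, region, amount, sale_date)
# are hypothetical placeholders for illustration.
spark = (
    SparkSession.builder
    .appName("hive-window-sketch")
    .enableHiveSupport()  # lets Spark read Hive-managed tables
    .getOrCreate()
)

sales = spark.table("sales")  # assumed Hive table

# Window function: rank each sale within its region by amount.
w = Window.partitionBy("region").orderBy(F.col("amount").desc())
ranked = sales.withColumn("rank_in_region", F.row_number().over(w))

# Write back partitioned by sale_date, mirroring Hive-style partitions.
(
    ranked.write
    .mode("overwrite")
    .partitionBy("sale_date")
    .saveAsTable("sales_ranked")
)
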
Responsibilities
· Design, develop, implement, and tune distributed data processing pipelines that handle large volumes of data, focusing on scalability, low latency, and fault tolerance in every system built (a streaming sketch follows this list)
· Engage with Product Management and Business to drive the agenda, set priorities, and deliver product features that keep the platform ahead of the market
· Influence cross-functional architecture decisions in sprint planning
· Provide business insights by leveraging internal tools and systems, databases, and industry data
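
As one hedged example of such a pipeline, a minimal Spark Structured Streaming sketch reading from Kafka (both named in the key skills); the events topic, JSON schema, broker address, and paths are assumptions for illustration only, and it presumes the spark-sql-kafka connector package is available:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Sketch only: the "events" topic, the JSON schema, and all paths are
# hypothetical placeholders.
spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("value", DoubleType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers bytes; decode the payload and parse the JSON body.
parsed = raw.select(
    F.from_json(F.col("value").cast("string"), schema).alias("e")
).select("e.*")

# Checkpointing provides restart recovery and exactly-once sink semantics,
# which is typically where the fault-tolerance requirement is met.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "/data/events")
    .option("checkpointLocation", "/checkpoints/events")
    .start()
)
query.awaitTermination()
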
Skillset:
· Proven work experience with Spark, Python/Scala, Java, SQL, Hive, and any RDBMS/NoSQL database
· Demonstrated expertise in writing complex, highly optimized queries across large data sets
· Hands-on experience with at least one cloud environment, such as GCP or Azure
· Experience with Unix/Linux shell scripting or similar programming/scripting knowledge
· Understanding of CI/CD frameworks
· Exposure to one or more orchestration tools such as Apache Airflow or NiFi (see the sketch below)
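
A minimal Apache Airflow sketch of such orchestration, assuming Airflow 2.4 or later; the DAG id, schedule, and shell commands are hypothetical placeholders:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# Sketch only: dag_id, schedule, and the commands below are hypothetical.
with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # "schedule" replaces "schedule_interval" in Airflow 2.4+
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="python extract.py",  # placeholder extract step
    )
    transform = BashOperator(
        task_id="transform",
        bash_command="spark-submit transform.py",  # placeholder Spark job
    )
    extract >> transform  # run the transform only after extraction succeeds
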
· Develop, test, deploy, and maintain automation processes to extract data across various data systems, platforms, and file types; create data pipelines and transform data into structures relevant to the problem by applying appropriate techniques; and automate the end-to-end loading process into target databases.
· Perform detailed data analysis, data quality checks, and data cleansing on extracted data; carry out source-to-target mapping exercises; and identify integration challenges, then develop, test, and deploy appropriate solutions (a data-quality sketch follows this list).
· Analyze complex data elements, systems, data flows, dependencies, and relationships to contribute to conceptual, physical, and logical data models; define relational tables, primary and foreign keys, and stored procedures to create a data model structure; and evaluate existing data models and physical databases for variances and discrepancies.
· Interact with various data sources to perform data analytics and derive business insights, and communicate these through reports and dashboards.
· Participate in discussions, collect business requirements from stakeholders, and translate them into design and development specifications.
· Troubleshoot business and production issues by gathering information (for example, issue, impact, criticality); engage support teams to assist in the resolution of issues; and perform actions as designated in the action plan.
· Produce well-tested code and participate in code reviews to ensure the code base adheres to standards.
· Contribute to code documentation, maintain client implementation playbooks, and train others on data models, data pipelines, and Extract-Transform-Load (ETL) processes.
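
To make the data quality checks mentioned above concrete, a minimal PySpark sketch of null and duplicate checks; the customers table and customer_id key column are hypothetical placeholders:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch only: the "customers" table and "customer_id" key are hypothetical.
spark = SparkSession.builder.appName("dq-sketch").getOrCreate()
df = spark.table("customers")

total = df.count()

# Null check on the assumed business key.
null_keys = df.filter(F.col("customer_id").isNull()).count()

# Duplicate check: rows sharing a key beyond the first occurrence.
dupes = total - df.dropDuplicates(["customer_id"]).count()

print(f"rows={total} null_keys={null_keys} duplicate_keys={dupes}")

# Fail fast so an orchestrator (e.g. Airflow) can mark the task as failed.
if null_keys or dupes:
    raise ValueError("Data quality checks failed")
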