Lead Data Engineer

November 19, 2024

Job Overview

  • Date Posted
    November 19, 2024
  • Expiration Date
    --
  • Experience
    10+ years
  • Qualification
    B.Tech / BE / MCA, or any other relevant degree

Job Description

Candidate Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field.
  • 10+ years of hands-on experience in data engineering, with a focus on ETL development using PySpark or other Spark tools.
  • Proficiency in SQL for complex queries, performance tuning, and data modeling.
  • Expertise with Microsoft Fabric or similar cloud data platforms (mandatory).
  • Strong understanding of data warehousing, big data processing, and ETL frameworks.
  • Familiarity with data processing technologies such as Hadoop, Hive, and Kafka (preferred).
  • Experience with both structured and unstructured data sources.
  • Knowledge of programming languages such as Python or Scala for data manipulation.
  • Experience with Azure Data Services (e.g., Azure Data Factory, Synapse).
  • Familiarity with Data Lake, Data Warehouse, and Delta Lake concepts.
  • DevOps knowledge, including CI/CD pipelines and containerization, is a plus.
  • Strong problem-solving, leadership, and communication skills.

Role Description:
The client seeks a seasoned Lead Data Engineer with extensive expertise in developing ETL processes using PySpark Notebooks and Microsoft Fabric, while also supporting legacy SQL Server environments. The candidate should demonstrate proficiency in Spark-based development, advanced SQL skills, and the ability to work independently, collaboratively, or in a leadership capacity.
Responsibilities:
  • Design, develop, and maintain ETL pipelines using PySpark Notebooks and Microsoft Fabric.
  • Collaborate with stakeholders to gather and translate data requirements into scalable solutions.
  • Migrate and integrate data from legacy SQL Server systems to modern platforms.
  • Optimize workflows for scalability, efficiency, and reliability.
  • Provide technical leadership and mentor junior developers.
  • Troubleshoot and resolve performance, quality, and scalability issues.
  • Establish and enforce best practices, coding standards, and technical documentation.
  • Conduct code reviews and provide constructive feedback.
  • Ensure data integrity and consistency to support data-driven decision-making.