Dijitprogram - job | Lead Data Engineer

Jobs Description

Work Mode: Work from Office

Position Overview:

We are seeking an experienced and highly skilled Lead Data Engineer to join our data platform team. In this critical role, you will be responsible for designing, building, and optimizing the data pipelines and infrastructure that form the backbone of our data-driven organization. You will work with large, complex datasets, leveraging a modern data stack to make data available, reliable, and accessible for analytics and machine learning initiatives.

The ideal candidate is an expert in building scalable data solutions on the AWS cloud, with deep, hands-on expertise in Python, PySpark, Apache Spark, and dbt. You are passionate about data quality, automation, and building robust systems that can handle data at scale.

Key Responsibilities:

Data Pipeline Architecture & Development: Design, develop, and maintain scalable and resilient data pipelines using Python, PySpark, AWS Glue, and EMR to process large volumes of structured and unstructured data.
Data Transformation with dbt: Build, manage, and optimize modern, analytics-ready data transformation workflows using dbt. Champion best practices for dbt development, including testing, documentation, and modularity.
ETL/ELT Implementation: Implement robust ETL/ELT processes to ingest data from a variety of source systems (APIs, databases, event streams) into our data lake (S3) and cloud data warehouse (e.g., Amazon Redshift).
Big Data Processing: Utilize Apache Spark and PySpark for large-scale data processing, aggregation, and analysis tasks, ensuring high performance and efficiency.
Data Modeling & Warehousing: Design and implement logical and physical data models for our data warehouse, applying principles of dimensional modeling to support business intelligence and analytics use cases.
Infrastructure as Code (IaC): Collaborate with DevOps to define and manage data infrastructure using tools like AWS CloudFormation or Terraform, promoting an automated and version-controlled environment.
Data Quality & Governance: Implement automated data quality checks, validation rules, and monitoring to ensure accuracy, completeness, and reliability of data assets.
Performance Tuning & Optimization: Proactively monitor performance of data pipelines and warehouse queries, identify bottlenecks, and implement optimizations.
Mentorship: Mentor junior data engineers, conduct code reviews, and establish best practices for data engineering within the organization.

Qualifications & Skills:

5–8+ years of professional experience in a data engineering role with production-grade data pipelines.
Expert-level proficiency in Python for data processing and extensive hands-on experience with PySpark.
Strong experience with AWS data ecosystem (S3, Glue, EMR, Redshift, Lambda, IAM).
Proven experience building and managing complex data transformation projects using dbt.
Advanced SQL skills with deep understanding of data warehousing, dimensional modeling, and performance tuning.
Solid understanding of big data technologies and distributed computing concepts.
Experience with orchestration tools such as Apache Airflow or AWS Step Functions.
Familiarity with Infrastructure as Code (IaC) tools (Terraform, CloudFormation) is a plus.
Excellent analytical and troubleshooting skills.