Key Responsibilities
- Design and deploy production-grade data pipelines using AWS-native services
- Develop scalable lakehouse architectures on S3 with modern open table formats (e.g., Apache Iceberg or Delta Lake)
- Build batch and real-time data processing systems with Spark, Glue, and EMR
- Implement event-driven data pipelines using Lambda, EventBridge, SQS, and Kinesis (a minimal handler sketch follows this list)
- Own data transformation, testing, and modeling using dbt
- Automate provisioning with Infrastructure as Code (IaC) tools
- Drive CI/CD pipelines for data workflows and infrastructure deployments
- Collaborate with cross-functional teams to deliver high-quality data products
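To give a concrete flavor of the event-driven ingestion work described above, here is a minimal sketch of a Lambda handler that reacts to S3 "Object Created" notifications delivered through EventBridge and forwards an object pointer to a Kinesis stream. The stream name `raw-ingest-stream` and the exact event fields are illustrative assumptions, not part of this role's actual stack.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def handler(event, context):
    """Forward S3 object-created events (via EventBridge) to Kinesis."""
    # EventBridge wraps the S3 notification payload under "detail".
    detail = event["detail"]
    record = {
        "bucket": detail["bucket"]["name"],
        "key": detail["object"]["key"],
        "size": detail["object"].get("size"),
    }
    # Partition by object key so records for the same key stay ordered.
    kinesis.put_record(
        StreamName="raw-ingest-stream",  # hypothetical stream name
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=record["key"],
    )
    return {"status": "queued", "key": record["key"]}
```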
Requirements
- Strong hands-on experience with AWS Glue (PySpark) or Spark on EMR (see the Glue job sketch after this list)
- Expertise in SQL and dbt for transformations and testing
- Experience with AWS-based lakehouse architectures and S3
- Proficiency in Infrastructure as Code using AWS CDK, Terraform, or CloudFormation (see the CDK sketch after this list)
- CI/CD pipeline experience with GitHub Actions, CodePipeline, or similar
- Experience building API/Lambda-based data ingestion pipelines
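As a reference point for the Glue/Spark requirement, the following is a minimal sketch of an AWS Glue PySpark job: read raw JSON from a landing prefix, derive a date partition, and write partitioned Parquet to a curated zone. The bucket names, prefixes, and the `event_ts` field are placeholders.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job bootstrap; JOB_NAME is supplied by the Glue runtime.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw JSON events from the landing zone (placeholder path).
events = spark.read.json("s3://example-raw-bucket/events/")

# Light transformation: derive a date partition column from the
# event timestamp ("event_ts" is an assumed field name).
cleaned = events.withColumn("event_date", F.to_date(F.col("event_ts")))

# Write partitioned Parquet to the curated zone of the lake.
cleaned.write.mode("append").partitionBy("event_date").parquet(
    "s3://example-curated-bucket/events/"
)

job.commit()
```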
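For the IaC requirement, here is a minimal AWS CDK (Python, v2) sketch of the kind of provisioning involved: a versioned, encrypted landing bucket plus an SQS buffer queue for ingestion. Construct IDs and settings are illustrative only.

```python
from aws_cdk import App, Stack, Duration, aws_s3 as s3, aws_sqs as sqs
from constructs import Construct

class DataPlatformStack(Stack):
    """Illustrative stack: raw landing bucket plus ingestion buffer queue."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Versioned, SSE-S3-encrypted landing bucket for raw data.
        s3.Bucket(
            self,
            "RawLandingBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
        )

        # Buffer queue decoupling producers from downstream consumers.
        sqs.Queue(
            self,
            "IngestBufferQueue",
            visibility_timeout=Duration.seconds(300),
        )

app = App()
DataPlatformStack(app, "DataPlatformStack")
app.synth()
```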