Key Responsibilities
- Design, build, and maintain scalable data pipelines and ETL processes
- Develop and optimize SQL queries for large datasets
- Implement data storage solutions using databases and data lakes
- Collaborate with data scientists to ensure data availability and quality
- Monitor and troubleshoot data workflows for performance and reliability
- Automate data processing tasks to improve efficiency

Requirements
- Proficiency in Python and SQL
- Experience with Apache Spark and Hadoop
- Knowledge of data pipeline tools and frameworks
- Familiarity with cloud platforms such as AWS
- Understanding of data modeling and schema design