Key Responsibilities
- Design and implement scalable big data pipelines and ETL processes
- Optimize data storage and retrieval systems for performance and cost efficiency
- Develop machine learning models for large-scale data processing
- Collaborate with cross-functional teams to define data requirements
- Ensure data integrity, security, and compliance with industry standards
- Monitor data infrastructure and troubleshoot issues as they arise
Requirements
- 5+ years of experience with big data technologies (e.g., Hadoop, Spark, Kafka)
- Proficiency in Python and SQL for data processing
- Experience with cloud platforms (AWS, GCP, or Azure)
- Strong understanding of distributed systems and data architecture
- Familiarity with data governance and security best practices