Key Responsibilities
- Design, develop, and maintain scalable ETL processes using Talend, Informatica, and scripting languages such as Python and Bash
- Build and manage robust data pipelines with Hadoop, Spark, Apache Hive, Azure Data Lake, and AWS services to process large volumes of structured and unstructured data (see the PySpark sketch after this list)
- Develop and optimize complex SQL queries for data extraction, transformation, and loading across multiple relational databases including Microsoft SQL Server and Oracle
- Architect and implement efficient data models for data warehouses to support analytics and reporting initiatives (see the star-schema sketch after this list)
- Collaborate with data scientists to prepare clean datasets and integrate machine learning workflows
- Monitor system performance, troubleshoot issues, and implement improvements to ensure high availability of data services
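To make the ETL and pipeline responsibilities above concrete, here is a minimal PySpark sketch of the extract-transform-load pattern this role works in: pull a table from SQL Server over JDBC, aggregate it, and land Parquet in a data lake. Every connection string, table name, column, and path is a hypothetical placeholder, not a reference to any actual system.

```python
# Minimal PySpark ETL sketch. All connection details, table names,
# and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read a SQL Server table over JDBC (driver jar must be on the classpath).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://example-host:1433;databaseName=sales")  # placeholder
    .option("dbtable", "dbo.orders")  # placeholder table
    .option("user", "etl_user")       # placeholder; use a secrets manager in practice
    .option("password", "...")
    .load()
)

# Transform: keep completed orders and roll amounts up to a daily total.
daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETE")
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Load: write date-partitioned Parquet to the lake
# (s3a:// for AWS S3, abfss:// for Azure Data Lake).
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://example-bucket/warehouse/daily_revenue"  # placeholder path
)

spark.stop()
```

The same extract/transform/load structure applies whether the job is orchestrated by Talend, Informatica, or a scheduled script; only the source and sink options change.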
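The data-modeling responsibility is easiest to picture as a small star schema. The sketch below defines a dimension and a fact table through Spark SQL and runs a typical reporting join; the table names, columns, and query are illustrative assumptions only.

```python
# Self-contained star-schema sketch using Spark SQL.
# Schema and query are illustrative, not an actual warehouse design.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("warehouse-model-demo").getOrCreate()

# Dimension table: one row per customer, keyed by a surrogate key.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_customer (
        customer_key BIGINT,
        customer_id  STRING,
        region       STRING
    ) USING parquet
""")

# Fact table: one row per order, partitioned by date for partition pruning.
spark.sql("""
    CREATE TABLE IF NOT EXISTS fact_orders (
        customer_key BIGINT,
        amount       DECIMAL(12,2),
        order_date   DATE
    ) USING parquet
    PARTITIONED BY (order_date)
""")

# A typical reporting query: revenue by region and day.
spark.sql("""
    SELECT d.region, f.order_date, SUM(f.amount) AS revenue
    FROM fact_orders f
    JOIN dim_customer d ON f.customer_key = d.customer_key
    GROUP BY d.region, f.order_date
""").show()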
Requirements
- Extensive experience with cloud platforms such as AWS (including S3) and Azure (including Azure Data Lake)
- Strong programming skills in Java, Python, and VBA, plus shell scripting (Bash) for automation tasks
- Proficiency with big data technologies including Hadoop ecosystem, Spark (PySpark), and Apache Hive
- Expertise in ETL development using Talend, Informatica, or similar tools; strong SQL skills for complex query development
- Experience with RESTful API integration and modern data architecture concepts, including data warehouse design and database modeling (a minimal Python sketch follows this list)
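For the REST integration requirement, a common pattern is pulling paginated records from an HTTP API and landing the raw payload for downstream ETL. The endpoint, pagination parameters, and token below are hypothetical assumptions used only to illustrate the pattern.

```python
# Sketch of paginated REST extraction into a raw staging file.
# Endpoint, parameters, and token are hypothetical placeholders.
import json
import requests

BASE_URL = "https://api.example.com/v1/records"  # placeholder endpoint
TOKEN = "..."  # placeholder; load from a secrets store in practice

def fetch_all(page_size: int = 100) -> list[dict]:
    """Follow page-numbered results until the API returns an empty page."""
    records, page = [], 1
    while True:
        resp = requests.get(
            BASE_URL,
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={"page": page, "per_page": page_size},
            timeout=30,
        )
        resp.raise_for_status()  # surface HTTP errors instead of looping silently
        batch = resp.json()
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records

if __name__ == "__main__":
    rows = fetch_all()
    # Land raw JSON for a downstream ETL job to pick up.
    with open("records_raw.json", "w") as f:
        json.dump(rows, f)
```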