Key Responsibilities
- Design, develop, and validate data pipelines for benchmarking and evaluation workflows to ensure efficiency and reliability.
- Perform comprehensive data processing, analysis, feature engineering, and validation to support data science use cases.
- Write, run, and optimize Python scripts that process data and support local experiments, ensuring reproducible and accurate results.
- Assess data quality, transformations, and outputs for correctness, consistency, and reproducibility.
- Create clean, well-documented, and reusable data workflows suitable for benchmarking and evaluation purposes.
- Collaborate closely with researchers and engineers to design challenging, real-world data engineering and data science tasks.
Requirements
- At least three years of professional experience in Data Engineering, Data Science, or Software Engineering with a focus on data workflows.
- Proficiency in Python for data processing, analysis, and scientific workflows.
- Demonstrable experience working with both structured and unstructured data, along with a solid grounding in machine learning and data science fundamentals.
- Ability to navigate and modify complex, real-world codebases while writing clean, reusable, and well-documented code.
- Strong problem-solving skills in algorithmic or data-intensive domains.