Key Responsibilities
- Work with large structured datasets using SQL and PySpark to build and optimize data processing pipelines.
- Assist in developing business/entity matching logic and fuzzy matching implementations for data enrichment.
- Create and validate analytical datasets for model development, reporting, and risk analytics use cases.
- Perform data cleaning, transformation, aggregation, and quality checks to ensure accuracy.
- Write efficient SQL queries using joins, CTEs, window functions, and aggregations for analytics and reporting.
- Support feature engineering for machine learning and risk modeling initiatives.
Requirements
- Currently pursuing, or have recently completed, a degree in Computer Science, Data Science, Statistics, Mathematics, or a related field.
- Strong foundational knowledge of SQL, including joins, CTEs, aggregations, CASE statements, and window functions.
- Basic understanding of Python and familiarity with PySpark or distributed data processing concepts.
- Familiarity with relational databases, data structures, and ETL/data pipeline concepts.
- Exposure to AWS or other cloud platforms, or to big data tools such as Redshift, Spark, Hadoop, or Databricks, is a plus.