Key Responsibilities
- Design and build scalable, testable, and maintainable data pipelines in Python (pandas, PySpark/Snowpark), with a focus on performance
- Implement Snowflake objects (tables, stages, tasks), write efficient SQL, and develop Snowpark-based transformations with performance tuning
- Develop RESTful APIs and backend services in FastAPI to expose data and business logic with authentication, rate limiting, and request validation
- Package services with Docker and deploy/operate them on Kubernetes, maintaining manifests and Helm charts and instrumenting observability
- Design and implement event-driven architectures using Kafka, including schema management and stream processing patterns
- Write unit/integration tests, data validation checks, and implement CI/CD pipelines with security scans and automated testing
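As a small illustration of the pipeline and data-validation work described above, here is a minimal pandas sketch; the column names, schema, and checks are hypothetical, not a prescribed implementation:

```python
import pandas as pd

def transform_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """Vectorized transformation: derive revenue, normalize region codes."""
    df = raw.copy()
    df["revenue"] = df["quantity"] * df["unit_price"]  # column-wise, no row loop
    df["region"] = df["region"].str.upper()
    return df

def validate_orders(df: pd.DataFrame) -> list[str]:
    """Lightweight data-quality checks; returns a list of failure messages."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values")
    if (df["revenue"] < 0).any():
        failures.append("negative revenue")
    if df["region"].isna().any():
        failures.append("missing region")
    return failures

raw = pd.DataFrame({
    "order_id": [1, 2, 3],
    "quantity": [2, 1, 5],
    "unit_price": [9.5, 20.0, 3.0],
    "region": ["emea", "apac", "amer"],
})
clean = transform_orders(raw)
print(validate_orders(clean))  # → [] when all checks pass
```

Checks like these slot naturally into the CI pipeline mentioned above: a failing validation list can fail the build before bad data is published.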
Requirements
- Expertise in Python with deep experience in pandas for ETL/ELT and data wrangling (vectorization, memory management, I/O, time series)
- Hands-on experience with Snowflake (SQL, performance tuning, warehouse configuration) and Snowpark (Python) for scalable transformations
- Strong FastAPI experience building production services (dependency injection, Pydantic models, async IO)
- Practical knowledge of Kafka (consumer groups, offsets, partitions, schema management) and event-driven microservices
- Proficiency with Docker and Kubernetes (deployment strategies, networking, volumes, service meshes)
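The pandas vectorization and memory-management skills listed above can be sketched as follows; the DataFrame, cardinality threshold, and column names are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, is_object_dtype

def shrink(df: pd.DataFrame) -> pd.DataFrame:
    """Memory management: downcast numeric columns and convert
    low-cardinality string columns to categoricals."""
    out = df.copy()
    for col in out.columns:
        if is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif is_object_dtype(out[col]) and out[col].nunique() < len(out) // 2:
            out[col] = out[col].astype("category")
    return out

df = pd.DataFrame({
    "sensor": np.repeat(["a", "b"], 50_000),   # low-cardinality string column
    "value": np.random.rand(100_000),           # float64 -> float32
    "count": np.arange(100_000),                # int64 -> int32
})
small = shrink(df)

# Vectorization: per-group z-score via groupby/transform, no Python-level loop.
means = df.groupby("sensor")["value"].transform("mean")
stds = df.groupby("sensor")["value"].transform("std")
df["z"] = (df["value"] - means) / stds
```

The `groupby(...).transform(...)` pattern keeps the computation inside pandas' vectorized engine, which is typically orders of magnitude faster than `apply` with a Python function at this row count.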