About the Role
As a Data Analyst on the ID Document Intelligence team, you'll investigate data quality issues, perform root cause analysis, and build the pipelines and automated checks that keep Incode's machine learning ecosystem reliable. You'll ensure that data flows efficiently and accurately through every stage of model training, labeling, and performance tracking.
Your work will be essential to maintaining the scalability, quality, and precision of Incode's document intelligence systems, which are used by millions of people worldwide.
What You'll Own & Drive
- Root Cause Investigation — Independently investigate data quality issues end-to-end. When a model metric drops or data looks wrong, you own the investigation: forming hypotheses, querying across data sources, and delivering a clear, evidence-backed answer.
- Automated Data Pipelines — Design, build, and maintain pipelines for collection, labeling, validation, and metric computation that support ML training and evaluation.
- Data & Labeling Quality Standards — Establish and monitor consistency checks and accuracy audits for data and labels, and lead root cause analysis when issues affect model outcomes.
- Model Evaluation Metrics — Define, implement, and automate evaluation metrics and reporting that reflect real-world product use cases and business goals.
- Performance Tracking Systems — Build scalable dashboards and monitoring to enable fast, data-driven decisions across teams.
- Workflow Orchestration — Develop and operate reliable orchestration (Airflow, Prefect, or similar) to schedule, observe, and troubleshoot end-to-end pipelines.
- Clean, Maintainable Code — Write clean, maintainable SQL and Python to investigate problems efficiently: querying databases, calling internal APIs, and processing data across multiple sources.
- Cross-Functional Partnership — Partner closely with ML engineers, analysts, and product stakeholders to prioritize work by impact, unblock execution, and continuously improve internal tooling for analysis and evaluation.
The Qualities That Set You Apart
- Investigative mindset — You don't stop at the symptom. You chase data anomalies until you find the root cause, and you bring receipts.
- Statistical intuition — You can tell the difference between a meaningful metric shift and noise, and you know how to prove it.
- Builder's bias — You'd rather automate a check than run it manually for the third time.
- Proactive ownership — You spot problems before anyone flags them, and you drive them to resolution without waiting for permission.
- Clear communicator — You translate messy data into crisp answers that ML engineers, PMs, and leadership can act on.
Your Background
- 3+ years of experience as a Data Analyst or in a similar data-focused role.
- Strong SQL and Python skills for data investigation and root cause analysis.
- Hands-on experience with Amazon Redshift or a similar columnar cloud data warehouse (BigQuery, Snowflake, etc.).
- Solid statistical foundation — you can reason about rates, distributions, significance, and sampling bias.
- Hands-on experience with workflow orchestration tools (Airflow, Prefect, Dagster, etc.).
- Proven experience in data quality management, data preparation, or ML data pipelines.
- A proactive approach to identifying and resolving problems.
- Strong collaboration and problem-solving skills.
- Background in mathematics, physics, or engineering.
Preferred Experience
- Familiarity with big data technologies or data infrastructure optimization.
- Experience with labeling workflows or ML data preparation pipelines.
- Exposure to AWS or other cloud-based data solutions.
- Interest in machine learning operations (MLOps) and scalable ML systems.