About Us
YipitData is the leading market research and analytics firm for the disruptive economy, most recently raising $475M from The Carlyle Group at a valuation over $1B. Our proprietary technology analyzes billions of alternative data points daily to uncover actionable insights across sectors like software, AI, cloud, e-commerce, ridesharing, and payments. Our data and research teams transform raw data into strategic intelligence, delivering accurate, timely, and deeply contextualized analysis that our customers—ranging from top investment funds to Fortune 500 companies—depend on for high-stakes decisions. Our award-winning, people-centric culture, recognized by Inc. as a Best Workplace for three consecutive years, emphasizes transparency, ownership, and continuous mastery.
About The Role
YipitData transforms billions of alternative-data points into signals for institutional investors and Fortune 500 customers. The data science team is responsible for the methods behind these signals, including causal inference on panel data, predictive modeling against earnings outcomes, and authoring white papers. This role combines applied data science with AI-native tooling. You will own substantive analytical projects end-to-end, using LLM coding assistants as primary collaborators across all phases of work: exploratory analysis, production code, and written deliverables. You will also contribute to shaping the team's approach to AI-native workflows, including tool selection, guardrails, and evaluation patterns. This is a remote-friendly opportunity, open to candidates in NYC (headquarters), office hubs (Austin, Miami, Denver, Mountain View, or Seattle), or anywhere else in the US.
As an AI Engineer, you will:
- Translate ambiguous customer questions into well-scoped data science projects spanning panel data, time series, and causal inference.
- Engineer features from large alternative-data panels (transaction-level, invoice-level, web-scraped) using Spark.
- Build, validate, and interpret causal and predictive models that link alternative-data signals to financial outcomes such as revenue, earnings surprise, and KPI inflections.
- Author technical white papers and customer analyses for institutional investors and Fortune 500 readers, including figures, equations, and narrative framing.
- Use LLM coding assistants and agents as primary collaborators to prototype faster, write higher-quality code, audit your own work, and ship deliverables efficiently.
- Build internal LLM-driven tooling (agents, eval harnesses, retrieval pipelines) for the broader organization.
- Partner with data engineering, product, and revenue teams to integrate signal development with customer-facing products.
- Set technical standards for the team: PEP 8, type hints, vectorized operations, reproducible notebooks, sound methodology, and citation discipline.
You Are Likely to Succeed If:
- You have shipped multiple data science projects end-to-end with quantifiable customer or business impact.
- Your statistical foundations are strong enough that you can defend causal-inference method choices under technical questioning.
- You write Python code that demonstrates senior engineering quality: clear naming, type hints, vectorized code, and no premature abstraction.
- You regularly use LLM coding assistants and can articulate how they improve your output without compromising quality, and when and why you override them.
- Writing is a primary skill: you communicate as carefully on the page as you do in code.
- You take ownership proactively, identifying questions, delivering answers, and explaining them effectively.
- You exhibit humility about what you don't know and confidence in what you do.
- You engage with relevant literature, cite sources, and base your work on established methods.