About the Role
We are seeking a highly skilled AI Engineer – Data to join our team. The role focuses on researching, designing, and implementing solutions that collect data from a variety of sources and make it accessible to large language models (LLMs). The ideal candidate has a strong background in data engineering and AI, with experience integrating and processing data from diverse formats and platforms.
Key Responsibilities
- Design and implement data collection solutions for diverse sources, including Excel, CSV, PDF, and Word files, SQL and NoSQL databases, email, WhatsApp, Slack, and other common data formats and platforms.
- Develop methods to structure, preprocess, and transform collected data into formats that are easily ingestible by LLMs.
- Create efficient data pipelines that handle both historical and real-time data from these sources and deliver it reliably to LLMs.
- Continuously optimize data collection and processing methodologies to enhance performance and accuracy.
- Stay informed about the latest trends and advancements in AI, data engineering, and data integration techniques. Evaluate and integrate new tools and technologies to improve the data pipeline.
Required Skills and Experience
- Proven experience in designing and implementing data pipelines and solutions, particularly for AI or LLM applications.
- Experience working with LLMs, including techniques for building efficient retrieval-augmented generation (RAG) systems and data ingestion systems.
- Strong proficiency in Python.
- Ability to collect data from the sources listed above through scraping, efficient querying, or APIs.
- Strong knowledge of SQL and NoSQL databases.
- Experience with cloud platforms and containerization.