Data Engineer
About the Company
At 21.co Technologies, our mission centers on building scalable bridges into the world of cryptocurrency. By creating DeFi accessibility through traditional financial standards, we bring ourselves one step closer to the equitable financial future we all believe in.
About the Role
We are seeking a highly motivated and skilled Data Engineer with a focus on MLOps and Large Language Models (LLMs) to join our team and help us design, build, and maintain robust data pipelines and infrastructure. As a Data Engineer with expertise in LLMs, you will be responsible for ensuring data is accessible, reliable, and optimally structured to support analytics, machine learning, and LLM-driven applications. You will work on cutting-edge technologies and collaborate closely with cross-functional teams, enabling you to make a significant impact on our data-driven and AI-focused strategies.
This role integrates core data engineering principles with MLOps practices to support the full lifecycle of LLM-driven applications, from data preparation to production monitoring. It also offers opportunities for growth, innovation, and learning in a dynamic and fast-paced environment.
Our culture values diversity, communication, collaboration, and a shared passion for using data and AI to drive business outcomes.
Responsibilities and Scope
- Design and maintain scalable data pipelines tailored to LLM requirements, including preprocessing unstructured text data from various sources, implementing chunking strategies, and optimizing embedding generation for vector databases.
- Build and manage data infrastructure, including data warehouses, data lakes, and streaming solutions, specifically optimized for LLM workflows.
- Deploy LLMs into production environments using containerization (Docker) and orchestration tools (Kubernetes).
- Automate CI/CD pipelines for model versioning, A/B testing, and rollback procedures, ensuring seamless updates to fine-tuned models.
- Optimize data systems for performance, reliability, and scalability, particularly for real-time inference in applications such as chatbots or document analysis.
- Implement MLOps-driven model deployment and monitoring, tracking key metrics such as inference latency, token usage costs, and output quality drift.
- Manage vector databases (e.g., Qdrant, Pinecone, FAISS) and design indexing strategies for Retrieval-Augmented Generation (RAG) architectures.
- Collaborate with data scientists, analysts, and other stakeholders to understand data and LLM requirements and deliver solutions.
- Implement data governance best practices, ensuring data quality, security, and compliance, including lineage tracking for text sources and redaction pipelines for PII detection.
- Monitor and troubleshoot data pipelines and LLM deployments, resolving issues in a timely manner.
- Create and maintain documentation for all data-related processes, procedures, and workflows, including LLM-specific pipelines and deployments.
- Research and stay up-to-date with the latest trends, technologies, and best practices in data engineering, MLOps, and LLM technologies.
What You Will Need To Be Great In This Role
- 5+ years of experience as a Data Engineer with 2+ years focused on MLOps.
- Strong proficiency in Python, SQL, and data orchestration tools (e.g., Airflow).
- Experience with cloud platforms like AWS (SageMaker), Google Cloud Platform (Vertex AI), or Azure Machine Learning for managed LLM deployments.
- Familiarity with data warehouse solutions such as Snowflake or BigQuery.
- Experience with big data technologies like Spark, Hadoop, or Kafka.
- Understanding of data modeling and schema design (e.g., dimensional modeling).
- Proficiency with version control systems like Git.
- Excellent problem-solving and debugging skills.
- Strong communication skills and the ability to work collaboratively with cross-functional teams.
- Experience working in Agile development environments.
- Hands-on experience with Hugging Face Transformers, LangChain for prompt engineering, and LlamaIndex for document indexing.
- Portfolio demonstrating deployed LLM applications with measurable performance metrics.
Our Stack
- Languages: Python, SQL
- Tools: Apache Airflow, Kafka (MSK, Redpanda), LangChain, LangSmith
- Cloud Platforms: AWS (S3, Databricks)
- Databases: Postgres, MongoDB, Vector Databases (Qdrant)
- Version Control: Git
Preferred
- Experience with containerization tools like Docker and orchestration platforms like Kubernetes.
- Familiarity with modern data streaming tools (e.g., Kafka, Kinesis).
- Familiarity with Natural Language Processing (NLP) and LLMs.
- Familiarity with chunking & data transformation for LLMs.
- Familiarity with Vector Databases / Embedding Stores.
- Hands-on experience with real-time analytics or machine learning pipelines.
- Exposure to or interest in data visualization tools like Tableau, Looker, or Streamlit.
- Experience with specialized LLM techniques.
- Experience implementing OpenTelemetry for distributed tracing and integrating it with Betterstack/Grafana dashboards.
This role is based in New York City, and you will be expected to work from our New York office Monday through Wednesday.
Compensation (NYC Only)
Pursuant to Section 8-102 of Title 8 of the New York City Administrative Code, the base salary range for this role is $140,000.00 - $180,000.00. Total compensation packages are based on various factors unique to each candidate, including but not limited to skill set, years and depth of experience, certifications, and specific office location.