Data Engineer
About the Company
At 21.co Technologies, our mission centers on building scalable bridges into the world of cryptocurrency. By creating DeFi accessibility through traditional financial standards, we bring ourselves one step closer to the equitable financial future we all believe in.
About the Role
We are seeking a highly motivated and skilled Data Engineer with a focus on MLOps and Large Language Models (LLMs) to join our team and help us design, build, and maintain robust data pipelines and infrastructure. As a Data Engineer with expertise in LLMs, you will be responsible for ensuring data is accessible, reliable, and optimally structured to support analytics, machine learning, and LLM-driven applications. You will work on cutting-edge technologies and collaborate closely with cross-functional teams, enabling you to make a significant impact on our data-driven and AI-focused strategies.
This role integrates core data engineering principles with MLOps practices to support the full lifecycle of LLM-driven applications, from data preparation to production monitoring. It also offers opportunities for growth, innovation, and learning in a dynamic and fast-paced environment.
Our culture values diversity, communication, collaboration, and a shared passion for using data and AI to drive business outcomes.
Responsibilities and Scope
- Design and maintain scalable data pipelines tailored to LLM requirements, including preprocessing unstructured text data from various sources, implementing chunking strategies, and optimizing embedding generation for vector databases.
- Build and manage data infrastructure, including data warehouses, data lakes, and streaming solutions, specifically optimized for LLM workflows.
- Deploy LLMs into production environments using containerization (Docker) and orchestration tools (Kubernetes).
- Automate CI/CD pipelines for model versioning, A/B testing, and rollback procedures, ensuring seamless updates to fine-tuned models.
- Optimize data systems for performance, reliability, and scalability, particularly for real-time inference in applications such as chatbots or document analysis.
- Implement MLOps-driven model deployment and monitoring, tracking key metrics such as inference latency, token usage costs, and output quality drift.
- Manage vector databases (e.g., Qdrant, Pinecone, FAISS) and design indexing strategies for Retrieval-Augmented Generation (RAG) architectures.
- Collaborate with data scientists, analysts, and other stakeholders to understand data and LLM requirements and deliver solutions.
- Implement data governance best practices, ensuring data quality, security, and compliance, including lineage tracking for text sources and pipelines for PII detection and redaction.
- Monitor and troubleshoot data pipelines and LLM deployments, resolving issues in a timely manner.
- Create and maintain documentation for all data-related processes, procedures, and workflows, including LLM-specific pipelines and deployments.
- Research and stay up-to-date with the latest trends, technologies, and best practices in data engineering, MLOps, and LLM technologies.
What You Will Need To Be Great In This Role
- 5+ years of experience as a Data Engineer with 2+ years focused on MLOps.
- Strong proficiency in Python, SQL, and data orchestration tools (e.g., Airflow).
- Experience with cloud platforms like AWS (SageMaker), Google Cloud Platform (Vertex AI), or Azure Machine Learning for managed LLM deployments.
- Familiarity with data warehouse solutions such as Snowflake or BigQuery.
- Experience with big data technologies like Spark, Hadoop, or Kafka.
- Understanding of data modeling and schema design (e.g., dimensional modeling).
- Proficiency with version control systems like Git.
- Excellent problem-solving and debugging skills.
- Strong communication skills and the ability to work collaboratively with cross-functional teams.
- Experience working in Agile development environments.
- Hands-on experience with Hugging Face Transformers, LangChain for prompt engineering, and LlamaIndex for document indexing.
- Portfolio demonstrating deployed LLM applications with measurable performance metrics.
Our Stack
- Languages: Python, SQL
- Tools: Apache Airflow, Kafka (MSK, Redpanda), LangChain, LangSmith
- Cloud Platforms: AWS (S3, Databricks)
- Databases: Postgres, MongoDB, Vector Databases (Qdrant)
- Version Control: Git
Preferred
- Experience with containerization tools like Docker and orchestration platforms like Kubernetes.
- Familiarity with modern data streaming tools (e.g., Kafka, Kinesis).
- Familiarity with Natural Language Processing (NLP) / LLM.
- Familiarity with chunking & data transformation for LLMs.
- Familiarity with Vector Databases / Embedding Stores.
- Hands-on experience with real-time analytics or machine learning pipelines.
- Exposure to or interest in data visualization tools like Tableau, Looker, or Streamlit.
- Experience with specialized LLM techniques.
- Experience implementing OpenTelemetry for distributed tracing and integrating it with Betterstack/Grafana dashboards.
This role is based in New York City, and the successful candidate will be expected to work from our New York office Monday through Wednesday.
Compensation (NYC Only)
Pursuant to Section 8-102 of Title 8 of the New York City Administrative Code, the base salary range for this role is $140,000.00 - $180,000.00. Total compensation packages are based on various factors unique to each candidate, including but not limited to skill set, years and depth of experience, certifications, and specific office location.