ML Ops Lead
**Job Description:**
**The ML Ops Lead drives the design, deployment, and optimization of machine learning solutions, balancing hands-on engineering with strategic leadership to enable robust, scalable, and maintainable AI infrastructure.**
**Key Responsibilities:**
* Architect and maintain scalable ML infrastructure, self-service ML pipelines, and CI/CD workflows for model training and deployment.
* Lead and mentor an MLOps team, fostering technical excellence and continual improvement.
* Design high-scale distributed training and inference environments using cloud (AWS, GCP) and on-premises resources.
* Build and manage feature stores, data ingestion, preprocessing, and validation pipelines.
* Implement A/B testing, canary releases, monitoring, and rollback mechanisms for production ML models.
* Ensure compliance with data governance, privacy, and security standards; manage role-based access controls for ML infrastructure.
* Collaborate with data scientists, software engineers, DevOps, and product teams to bring models from experimentation to enterprise-grade production.
**Required Skills and Experience:**
* Deep expertise in creating and managing machine learning infrastructure and orchestration frameworks (e.g., Kubeflow, MLflow, Airflow).
* Proficiency in cloud platforms (AWS, GCP), Kubernetes, Terraform, and distributed computing.
* Working knowledge of MLflow on Databricks.
* Excellent Python skills; experience with ML frameworks and model-serving tools (e.g., TensorFlow, TorchServe), CI/CD automation, and pipeline management.
* Strong analytical, problem-solving, and project management abilities.
* Demonstrated ability to build, scale, and lead technical teams.
* Solid understanding of data compliance, governance, and model monitoring.
* Master’s degree in a technical field (Computer Science, Data Science, ML, or equivalent).
**Desired Qualifications:**
* Experience optimizing GPU/TPU utilization and large-scale storage solutions.
* Track record of designing robust monitoring systems for model drift, downtime, and performance.
* Familiarity with the challenges of deploying models in real-time, multi-cloud, or edge environments.
* Ability to innovate and continuously improve workflows, including those that combine ML with human computation.