Software Engineer, Infrastructure
About Basis
Basis is a nonprofit applied AI research organization with two mutually reinforcing goals.
The first is to understand and build intelligence. This means to establish the mathematical principles of what it means to reason, to learn, to make decisions, to understand, and to explain; and to construct software that implements these principles.
The second is to advance society’s ability to solve intractable problems. This means expanding the scale, complexity, and breadth of problems that we can solve today, and even more importantly, accelerating our ability to solve problems in the future.
To achieve these goals, we’re building both a new technological foundation that draws inspiration from how humans reason, and a new kind of collaborative organization that puts human values first.
About the Role
Software Engineers on the Platform team at Basis build the infrastructure that accelerates research and enables commercial deployment of Basis innovations. You will create reliable training and evaluation infrastructure, manage compute resources scaling to medium-scale models, develop SaaS platform offerings, and build the technical foundation that supports both internal research and external customers.
We are looking for people who excel at infrastructure engineering and understand the unique demands of ML systems at scale. The ideal Software Engineer has experience with distributed systems, cloud infrastructure, and ML training pipelines, and brings a reliability-focused mindset that ensures researchers can trust the systems they depend on. You will work at the intersection of cutting-edge research and production-grade infrastructure.
This role is central to Basis’s commercial strategy and scaling objectives. The Platform team develops general-purpose infrastructure separate from individual design partner teams, enabling replication-based growth across multiple domains and clients.
We seek individuals who aspire to build rigorous, high-quality, robust systems, but are not afraid to iterate quickly, learn from production, and explore different architectural approaches to achieve excellence.
Basis is a collaborative effort, both internally and with our external partners; we are looking for people who enjoy building infrastructure for problems larger than ones they can tackle alone.
We expect you to:
Have demonstrated significant technical achievements in infrastructure engineering. Examples include:
Building ML training or inference infrastructure for distributed systems
Developing cloud platforms or services used by multiple teams or customers
Creating developer tools, CI/CD systems, or deployment automation at scale
Contributing to infrastructure open-source projects or technical systems with high reliability requirements
Possess deep understanding of distributed systems principles including consistency, availability, fault tolerance, scalability patterns, and performance optimization for high-throughput, low-latency workloads.
Have hands-on experience with cloud platforms (AWS, GCP, Azure) including compute orchestration, storage systems, networking, and cost optimization strategies. Experience managing significant cloud budgets is valuable.
Be proficient in infrastructure technologies including Kubernetes, Docker, infrastructure as code (Terraform), CI/CD pipelines, monitoring and observability (Prometheus, Grafana), and modern DevOps practices.
Understand ML infrastructure requirements including GPU cluster management, distributed training frameworks (PyTorch Distributed, DeepSpeed, Ray), experiment tracking, model versioning, and reproducible research pipelines.
Have experience with systems programming languages including Python (primary for ML), and familiarity with Go, Rust, or C++ for performance-critical components.
Value reliability and operational excellence. You design systems that fail gracefully, monitor proactively, and enable teams to debug and recover quickly when issues arise.
Progress with autonomy on complex technical challenges. You can scope infrastructure projects, make sound architectural decisions, and execute from design through deployment.
Be excited about enabling breakthrough research that advances society’s ability to solve intractable problems through robust, scalable infrastructure.
In addition, the following would be an advantage:
Experience at companies building ML infrastructure at scale (Anthropic, OpenAI, Google, Meta AI Research, Weights & Biases, HuggingFace).
Background in ML research or research engineering that provides an understanding of researcher workflows.
Experience with on-premise GPU cluster management or hybrid cloud architectures.
Contributions to infrastructure open-source projects (Kubernetes, PyTorch, Ray).
SRE background or experience with production ML systems serving external customers.
Understanding of AI safety and responsible AI deployment practices.
Responsibilities:
Design and build ML training infrastructure supporting medium-scale models with distributed training across GPU clusters, experiment tracking, checkpoint management, and reproducible pipelines.
Develop SaaS platform and API offerings that package Basis research innovations into commercial products, including backend services, API design, authentication, rate limiting, and customer-facing features.
Manage compute infrastructure as it scales, including capacity planning, resource allocation, cost optimization, cloud and on-premise orchestration, and efficiency monitoring.
Build developer tools and workflows that accelerate research velocity including CI/CD pipelines, testing frameworks, deployment automation, and development environment management.
Implement monitoring and observability providing comprehensive visibility into system health, performance, costs, and research progress through metrics, logging, alerting, and dashboards.
Ensure system reliability and scalability by designing fault-tolerant architectures, implementing graceful degradation, conducting load testing, and establishing SLAs appropriate for research and production workloads.
Collaborate with research teams to understand infrastructure needs, translate experimental techniques into scalable systems, and provide technical consultation on architecture and performance.
Maintain security and compliance by implementing access controls, encryption, and audit logging, and by adhering to data governance policies as Basis serves external customers.
Contribute to the culture and direction of Basis by modeling technical excellence, operational discipline, and focus on enabling high-impact research and commercial applications.
Role Details
Exceptional candidates who may not meet all of the following criteria are still encouraged to apply.
FT/PT: Full-time.
In-person Policy: We are in the office four days a week. Be prepared to attend multi-day Basis-wide in-person events.
Location: New York City.
Salary range: Competitive salary.
Privacy Notice
By submitting your application, you grant Basis permission to use your materials for both hiring evaluation and recruitment-related research and development purposes. Your information may be processed in different countries, including the US. You retain copyright while providing Basis a license to use these materials for the stated purposes.
Read our full Global Data Privacy Notice here.