Machine Learning Operations Engineer

Company: RAI Institute
Location: Cambridge
Posted on: April 4, 2025

Job Description:

Our Mission

Our mission is to solve the most important and fundamental challenges in AI and Robotics to enable future generations of intelligent machines that will help us all live better lives.

Machine Learning Operations (ML-Ops) Engineers build infrastructure that supports the entire lifecycle of Machine Learning (ML) projects, from development through scaling to deployment. If you have a passion for building the foundation that enables robotics research and engineering, you will want to join us!

What You Will Do

- Design, develop, and maintain company-wide platforms and tooling that use Kubernetes infrastructure to enable machine learning and data processing applications
- Enable self-service access to ML compute on our on-prem and cloud compute clusters, including support for job scheduling, workload scalability, and workload fault tolerance
- Enhance observability across ML applications through integrations with tools and services such as Fluentd, Prometheus, Grafana, and Datadog
- Integrate ML applications with experiment tracking and management services such as Weights & Biases
- Elevate code quality and champion best practices in our engineering processes
- Collaborate with Machine Learning Engineers, Data Engineers, DevOps Engineers, and researchers to build scalable solutions that improve engineering and research velocity

What You Will Bring

- BS or MS in Computer Science, Engineering, or equivalent
- 3+ years of experience in an MLOps, DevOps, ML Engineering, or software engineering role
- Strong hands-on experience deploying and managing applications running on Kubernetes
- Experience developing MLOps platforms to manage the lifecycle of ML experiments, including one or more of: data and artifact management, reproducibility, fault tolerance, experiment tracking, and model serving
- Experience with Docker and Python environment management tools such as pip, poetry, uv or similar
- Proficiency in software practices such as version control (Git), CI/CD (GitHub Actions, Argo CD), and Infrastructure as Code (Terraform)

Extra Skills We Value

- Experience with Kueue or similar job scheduling mechanisms
- Experience with workflow orchestration tools such as Airflow, Metaflow, Argo Workflows, or similar
- Hands-on experience deploying and managing cloud infrastructure on platforms such as GCP and AWS
- Experience with hybrid-cloud compute and data environments
- Experience with Ray, PyTorch Lightning, or similar scalable AI/ML platforms
- Experience with application and system logging and monitoring using tools and services such as Fluentd, Prometheus, Grafana, Datadog, or similar
- Experience with the Bazel build tool or similar
- Experience with ML model serving frameworks such as TorchServe, ONNX Runtime, or similar
- Experience working with research teams in an academic or industrial environment

We provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
