Classification: Contract-To-Hire
Contract Length: 6 Months
Position Summary:
The Staff MLOps Engineer/Consulting Level is responsible for designing, implementing, and maintaining the infrastructure and tools that support our machine learning workflows on Google Cloud Platform (GCP). This role is highly collaborative and requires constant communication and iteration with data scientists, machine learning engineers, and other stakeholders to ensure efficient and reliable deployment of ML models. The MLOps Engineer must be comfortable with, and capable of, clearly communicating the nature and results of complex infrastructure and deployment processes to both technical and non-technical audiences in written, spoken, and visual form.
Responsibilities:
- Infrastructure Design and Maintenance: Design, build, and maintain scalable ML infrastructure on GCP.
- Infrastructure as Code: Develop and manage infrastructure as code using Terraform.
- ML Workflow Optimization: Implement and optimize ML workflows using Vertex AI.
- CI/CD Pipelines: Create and maintain CI/CD pipelines using GitHub workflows.
- Collaboration: Work closely with cross-functional teams to understand requirements and deliver solutions.
- System Monitoring and Troubleshooting: Monitor and troubleshoot ML systems to ensure high availability and performance.
- Continuous Learning: Stay up to date with the latest industry trends and best practices in MLOps and cloud technologies.
Requirements:
- GCP Expertise: Proven experience in MLOps, with a focus on GCP.
- Terraform Proficiency: Strong proficiency in Terraform for infrastructure as code.
- Vertex AI Experience: Hands-on experience with Vertex AI for ML workflows.
- CI/CD Practices: Expertise in CI/CD practices and tools, particularly GitHub workflows.
- Containerization and Orchestration: Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).