Job brief.

Job Role: Lead MLOps Engineer
Experience: 12-18 years
Location: Bangalore
Work Culture: Hybrid

Key Responsibilities

  • MLOps Leadership: Lead the design and implementation of robust MLOps practices, managing the deployment, scaling, and maintenance of machine learning models in production.
  • Cloud Infrastructure Strategy: Drive cloud architecture strategy for ML workloads, leveraging Azure or AWS to build scalable, secure, and cost-effective infrastructure for model deployment, monitoring, and scaling.
  • CI/CD Pipeline Design & Automation: Lead the development and maintenance of automated CI/CD pipelines for machine learning models, ensuring efficient integration, testing, and deployment workflows.
  • Model Monitoring & Optimization: Oversee the monitoring of model performance in production, implement model drift detection, manage retraining schedules, and ensure operational excellence in production ML systems.
  • Collaboration & Mentorship: Collaborate with cross-functional teams, including data scientists, software engineers, and infrastructure teams, to ensure successful deployment of ML models. Mentor and guide junior MLOps engineers and team members.
  • Continuous Improvement: Advocate and lead the adoption of new tools, techniques, and best practices to improve the efficiency, reliability, and scalability of the MLOps pipeline.
  • Automation & Efficiency: Champion the automation of model deployment, monitoring, and testing processes so that repetitive tasks are minimized and model updates can be delivered rapidly and reliably.
  • Security & Compliance: Ensure that all ML deployments adhere to internal security protocols, governance, and regulatory compliance requirements, particularly in cloud environments.
  • Documentation & Reporting: Create and maintain comprehensive documentation for workflows, pipelines, infrastructure, and models. Provide regular reports to stakeholders on system performance, new features, and model health.
  • Risk Management: Identify and mitigate risks related to model performance, infrastructure stability, and deployment failures, ensuring the continuous operation of machine learning models.

Technical Skills

  • MLOps Expertise: Deep experience in designing, implementing, and optimizing end-to-end MLOps pipelines, including automation of model deployment, testing, scaling, and monitoring.
  • Cloud Platforms: Extensive experience with Azure or AWS services such as Azure Machine Learning, Amazon SageMaker, EC2, and Lambda, along with Kubernetes and related tools for managing cloud infrastructure for ML deployments.
  • CI/CD Pipelines: Strong knowledge of CI/CD tools (e.g., Jenkins, GitLab CI, Azure DevOps, AWS CodePipeline) for building automated pipelines for model deployment, testing, and monitoring.
  • Containerization & Orchestration: Expertise in containerizing machine learning models with Docker and in orchestrating containers at scale with Kubernetes or Azure Kubernetes Service (AKS).
  • Programming: Proficiency in Python for building ML workflows, automation scripts, data processing, and integration with ML frameworks and cloud infrastructure.
  • ML Lifecycle Management: Experience with tools like MLflow, Kubeflow, or TensorFlow Extended (TFX) for managing the full lifecycle of ML models from experimentation to deployment and monitoring.
  • Version Control & Collaboration: Expertise with Git and Git-based workflows for code versioning, collaboration, and deployment in team environments.
  • Monitoring & Logging: Experience with monitoring and logging tools such as Prometheus, Grafana, or Azure Monitor to track model performance and health and to detect anomalies in production.
  • Data Engineering: Familiarity with data engineering tools such as Apache Kafka, Apache Airflow, or Apache Spark for the streaming, workflow orchestration, and large-scale data processing needs of model training and deployment.
