Location: Chennai / Remote
Company: Smartan Fit
About Smartan Fit:
Smartan Fit is an innovative fitness tech company dedicated to transforming the gym experience for owners, trainers, and members. We leverage advanced technology to provide real-time insights and personalized recommendations, empowering the fitness community to achieve their goals more efficiently and effectively.
Role Overview:
We are seeking an experienced ML Ops Engineer to partner closely with our AI Engineering team to build and maintain robust machine learning and AI infrastructure. The ideal candidate will focus on creating scalable pipelines and systems that enable efficient development, deployment, and monitoring of AI models. You will be responsible for bridging the gap between AI innovation and production reliability, ensuring our AI solutions perform consistently at scale.
Key Responsibilities:
- Design and implement end-to-end MLOps pipelines for model development, training, and deployment
- Build and maintain scalable AI training and serving infrastructure
- Create automated CI/CD pipelines specifically tuned for deep learning models
- Implement comprehensive monitoring systems for model performance and system health
- Develop scalable solutions for efficient serving of large AI models in production
- Collaborate with AI Engineers to optimize model deployment workflows
- Set up and maintain model registry and versioning systems
- Design and maintain testing frameworks for AI systems
- Establish metrics collection and visualization for model performance
- Create reproducible environments for AI development
- Optimize GPU/TPU resource utilization and management
- Implement disaster recovery and high availability for AI systems
- Set up distributed training environments for large-scale models
- Implement efficient data pipelines for training and inference
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- 3+ years of experience in MLOps, DevOps, or similar roles
- Strong experience with cloud platforms (AWS, GCP, Azure) and their ML/AI services
- Expertise in containerization (Docker) and orchestration (Kubernetes)
- Proficiency in Python and infrastructure automation
- Experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions)
- Knowledge of ML/AI frameworks (TensorFlow, PyTorch) and model serving platforms
- Understanding of distributed systems and microservices architecture
Preferred Qualifications:
- Experience with model parallelism and distributed training
- Knowledge of model optimization techniques (quantization, pruning, distillation)
- Familiarity with GPU/TPU infrastructure management
- Experience with large model deployment and serving
- Understanding of ML-specific security and privacy considerations
- Experience with ML experiment tracking platforms (MLflow, Weights & Biases)
- Knowledge of feature store implementations
- Experience with real-time inference services
- Background in performance optimization and scaling
- Expertise in monitoring and observability tools
Technical Stack:
- Containerization & Orchestration: Docker, Kubernetes
- CI/CD: Jenkins, GitLab CI, GitHub Actions
- ML Platforms: MLflow, Kubeflow
- Monitoring: Prometheus, Grafana
- Infrastructure as Code: Terraform
- Version Control: Git
- Model Serving: TensorFlow Serving, Triton
- Message Brokers: Kafka
- GPU Management: NVIDIA Container Toolkit (formerly NVIDIA Docker), CUDA
- Distributed Training: Horovod, DeepSpeed
- ML Experiment Tracking: MLflow, Weights & Biases
- Monitoring and Logging: ELK Stack, Datadog
What We Offer:
- Competitive salary and benefits package
- Opportunity to build cutting-edge MLOps infrastructure
- Collaborative engineering environment
- Professional growth and development opportunities
- Access to modern cloud infrastructure and tools
- Work with state-of-the-art ML/AI technologies
- Regular training and upskilling opportunities
If you’re passionate about machine learning and want to make an impact in the fitness tech industry, we’d love to hear from you! Send your resume and portfolio to [email protected] with the subject line “Application for ML Ops Engineer Position”.