MLOps & Infrastructure
Model Lifecycle Management
Version Control
- ML Version Control Fundamentals
- Model Versioning
- Dataset Versioning
- Experiment Tracking
- DVC (Data Version Control)
Experiment Management
- Experiment Tracking Best Practices
- MLflow
- Weights & Biases
- Neptune.ai
- Reproducibility in ML
- Hyperparameter Tracking
Model Registry
Deployment Pipelines
Monitoring & Observability
- ML Monitoring Fundamentals
- Model Performance Monitoring
- Data Drift Detection
- Concept Drift
- Feature Drift
- Model Staleness
- Logging for ML Systems
- Alerting and Incident Response
Infrastructure & Scaling
Compute Resources
- GPU Fundamentals for ML
- TPU Overview
- GPU Utilization Optimization
- Multi-GPU Training
- GPU Memory Management
- Mixed Precision Training
Distributed Training
- Distributed Training Overview
- Data Parallelism
- Model Parallelism
- Pipeline Parallelism
- Distributed Data Parallel (DDP)
- Fully Sharded Data Parallel (FSDP)
- Horovod
- DeepSpeed
Model Optimization
- Model Compression Techniques
- Quantization
- Pruning
- Knowledge Distillation
- ONNX Runtime
- TensorRT
- Neural Architecture Search
Edge Deployment
Cloud Platforms
- Cloud ML Platforms Overview
- AWS SageMaker
- Azure Machine Learning
- Google Cloud Vertex AI
- Databricks ML
- Serverless ML Deployment
Container & Orchestration
Storage & Databases
📊 Progress Tracking
TABLE
status as "Status",
difficulty as "Difficulty",
last_modified as "Last Updated"
FROM "01 - ML & AI Concepts/03 - MLOps & Infrastructure"
WHERE contains(tags, "concept")
SORT file.name ASC
🎓 Learning Path
Recommended Order:
- Start with Model Lifecycle basics (Version Control, Experiment Tracking)
- Learn Deployment Pipelines
- Study Monitoring & Observability
- Understand Compute Resources
- Master Model Optimization
- Explore Cloud Platforms
- Advanced: Distributed Training and Edge Deployment
Back to: ML & AI Index