Ensuring zero-downtime ML deployments is a challenge for SREs, and traditional observability falls short for ML systems at scale. This talk explores the gap between ML and SRE practices, breaks down the components of ML systems, and introduces key techniques for improving observability, monitoring model and system performance, and delivering seamless, reliable deployments.