Why Observability Matters (More!) with AI Applications


As large language models (LLMs) move into production, observability is essential for ensuring reliability, performance, and responsible AI. In this talk, Sally will walk through deploying an open-source observability stack using Prometheus, Grafana, Tempo, and OpenTelemetry Collectors on Kubernetes, and demonstrate how to monitor real AI workloads using vLLM and Llama Stack.
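A stack like the one described typically routes all telemetry through an OpenTelemetry Collector, which fans metrics out to Prometheus and traces out to Tempo. The snippet below is a minimal sketch of such a Collector pipeline, not the exact configuration used in the talk; endpoint names (`tempo:4317`, port `8889`) are placeholder assumptions for an in-cluster deployment.

```yaml
# Illustrative OpenTelemetry Collector config: one OTLP receiver,
# metrics exposed for Prometheus to scrape, traces forwarded to Tempo.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:  # batch telemetry before export to reduce overhead

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # Prometheus scrapes this port (assumed)
  otlp/tempo:
    endpoint: tempo:4317        # in-cluster Tempo service (assumed)
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
```

Keeping the Collector as the single ingress point means instrumented workloads only need to know one OTLP endpoint, while backends can be swapped without touching application code.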

The session will explore why LLMs are uniquely challenging to monitor—from probabilistic outputs to dynamic memory use and complex inference pipelines—and what kinds of telemetry are essential to overcome those challenges.

Attendees will learn to capture and interpret key signals like token counts, GPU utilization, latency, and failure modes to optimize performance, manage costs, and surface issues like hallucinations, drift, or prompt injection. Through live examples and open tooling, this session will show how observability turns opaque model behavior into actionable insight.
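Signals like token counts and latency are typically exposed in Prometheus exposition format (vLLM, for example, serves them at a `/metrics` endpoint). As a minimal stdlib-only sketch of what "capturing" such a signal looks like, the parser below extracts metric samples from exposition text; the `vllm:generation_tokens_total` metric name is an assumption for illustration, so check your vLLM version's actual `/metrics` output.

```python
import re

def parse_prometheus_text(text):
    """Return {metric_name: [(labels_dict, value), ...]} from Prometheus exposition text."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = re.match(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)$', line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels)) if raw_labels else {}
        samples.setdefault(name, []).append((labels, float(value)))
    return samples

# Example exposition text, as a monitoring stack might scrape it
# (metric and label names are illustrative, not guaranteed):
sample = """
# HELP vllm:generation_tokens_total Number of generated tokens.
# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="llama-3"} 12345.0
"""
parsed = parse_prometheus_text(sample)
print(parsed["vllm:generation_tokens_total"])  # [({'model_name': 'llama-3'}, 12345.0)]
```

In practice Prometheus does this scraping and storage for you; the point is that each signal is just a named, labeled time series that dashboards and alerts can then slice by model, route, or failure mode.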


Date

Tuesday, Jun 10 / 10:20 AM EDT (50 minutes)

Location

Auditorium
