Why Observability Matters (More!) with AI Applications


As large language models (LLMs) move into production, observability is essential for ensuring reliability, performance, and responsible AI. In this talk, Sally will walk through deploying an open-source observability stack using Prometheus, Grafana, Tempo, and OpenTelemetry Collectors on Kubernetes, and demonstrate how to monitor real AI workloads using vLLM and Llama Stack.
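
To make the wiring concrete, here is a minimal sketch of instrumenting an inference call with OpenTelemetry in Python so that spans flow to a Collector and on to Tempo. The Collector endpoint, service name, and call_model helper are illustrative assumptions, not details from the talk.

```python
# Minimal sketch: emit OpenTelemetry spans for an LLM inference call and ship
# them to an OpenTelemetry Collector. Endpoint, service name, and call_model
# are illustrative assumptions, not the talk's actual setup.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Route spans to an in-cluster Collector (assumed address), which can
# forward them to Tempo.
provider = TracerProvider(resource=Resource.create({"service.name": "llm-inference"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm.demo")


def call_model(prompt: str) -> str:
    """Stand-in for a real call to a vLLM (or other) inference endpoint."""
    return "stub completion"


def generate(prompt: str) -> str:
    # One span per request makes per-request latency and token counts
    # queryable alongside the rest of the trace in Grafana/Tempo.
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("llm.prompt_tokens", len(prompt.split()))  # crude proxy
        completion = call_model(prompt)
        span.set_attribute("llm.completion_tokens", len(completion.split()))
        return completion
```

Wrapping each request in a span is what lets per-request latency be correlated with token counts and the rest of the inference pipeline downstream.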

The session will explore why LLMs are uniquely challenging to monitor, from probabilistic outputs to dynamic memory use and complex inference pipelines, and what kinds of telemetry are essential to overcome those challenges.

Attendees will learn to capture and interpret key signals like token counts, GPU utilization, latency, and failure modes to optimize performance, manage costs, and surface issues like hallucinations, drift, or prompt injection. Through live examples and open tooling, this session will show how observability turns opaque model behavior into actionable insight.
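
As a taste of what interpreting those signals can look like: vLLM exposes Prometheus metrics, so signals such as token throughput and latency map to ordinary PromQL. The sketch below queries a Prometheus server's HTTP API; the server URL is an assumption, and the vllm:-prefixed metric names should be verified against the deployed vLLM version.

```python
# Sketch: pull vLLM signals from a Prometheus server via its HTTP API.
# The server URL is an assumption; metric names follow vLLM's documented
# "vllm:" prefix but should be checked against your deployment.
import requests

PROM = "http://prometheus:9090/api/v1/query"

QUERIES = {
    # Tokens generated per second across all vLLM pods.
    "token_throughput": "sum(rate(vllm:generation_tokens_total[5m]))",
    # 95th-percentile time-to-first-token, a key interactive-latency signal.
    "p95_ttft_seconds": "histogram_quantile(0.95, "
    "sum(rate(vllm:time_to_first_token_seconds_bucket[5m])) by (le))",
    # Requests queued but not yet running: an early saturation indicator.
    "queued_requests": "sum(vllm:num_requests_waiting)",
}

for name, promql in QUERIES.items():
    result = requests.get(PROM, params={"query": promql}, timeout=10).json()
    for sample in result["data"]["result"]:
        print(f"{name}: {sample['value'][1]}")
```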


Speaker

Sally O'Malley

Principal SWE @RedHat, Emerging Technologies, Office of the CTO | Recipient of Paul Cormier Trailblazer Award, 2025

Sally Ann O'Malley is a principal software engineer at Red Hat, where she has made significant contributions to teams within OpenShift and the Emerging Technologies organization. For the past decade, she has been instrumental in integrating cutting-edge tools and ideas into Red Hat's portfolio. Her recent projects, including Image Based Operating Systems, Podman, OpenTelemetry, and Sigstore, showcase her deep commitment to open source. Sally shares her expertise as an instructor at Boston University's College of Data Science, and she is an organizer for DevConf.US, a developer-focused open-source conference. Most recently, she was honored with Red Hat's 2025 Paul Cormier Trailblazer Award. Her current focus includes the incubation of the llm-d project and its community, where she is a founding contributor dedicated to building distributed, scalable, high-performance AI inference solutions.
