AI Deep Dive Series (Virtual) - Evaluating AI Agent Reliability


Jan 21, 10:00 AM PST
  • Virtual Snowflake
  • 861 RSVPs

Welcome to the AI Deep Dive Series (Virtual) with Snowflake. Join us for deep-dive tech talks on AI, GenAI, LLMs, and Agentic AI, plus hands-on code labs, workshops, and networking with speakers and fellow developers from all over the world.

This is a virtual event for our global AI community, so please double-check your local time. Can't make it live? Register anyway to receive the webinar recording.

Join Snowflake to learn how to evaluate AI Agent Reliability.

Tech Talk: Evaluating AI Agent Reliability
Speakers: Anupam Datta (Snowflake), Josh Reini (Snowflake)
Abstract: Agents often fail in ways you can’t see. They could return a final answer while taking a broken path: drifting from the goal, making irrational plan jumps, or misusing tools. Was the goal achieved efficiently? Did the plan make sense? Were the right tools used? Did the agent follow through?
These hidden mistakes silently rack up compute costs, spike latency, and cause brittle behavior that collapses in production. Traditional evals won’t flag any of it because they only check the output, not the decisions that produced it.
This session introduces the Agent GPA (Goal-Plan-Action) framework, available in the open-source TruLens library. In benchmark tests, the Agent GPA framework consistently outperformed standard LLM evaluators, giving teams scalable and trustworthy insight into agent behavior:

  • 95% error detection (vs. 55% for baseline methods)
  • 86% accuracy in pinpointing where an error occurred (vs. 49% for baseline methods)
  • Human reviewers using the GPA framework caught 100% of the internal agent errors in the TRAIL/GAIA dataset.

You’ll learn how to inspect an agent’s reasoning steps, detect issues like hallucinations, bad tool calls, and missed actions, and leave knowing how to make your agent truly production-ready.
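To make the contrast between output-only evals and step-level inspection concrete, here is a minimal sketch in Python. It is not the TruLens API and not the Agent GPA implementation; the `Step` trace format, the `output_only_check` and `trace_checks` helpers, and the tool names are all hypothetical, chosen only to illustrate how a run can return the right answer while still containing a bad tool call and a missed action.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One recorded step of an agent run (hypothetical trace format)."""
    kind: str        # "plan" or "tool_call"
    name: str        # planned action name, or the tool that was called
    ok: bool = True  # for tool calls: whether the call succeeded


def output_only_check(final_answer: str, expected: str) -> bool:
    """Traditional eval: looks only at the final answer."""
    return final_answer.strip().lower() == expected.strip().lower()


def trace_checks(steps: list[Step], allowed_tools: set[str]) -> list[str]:
    """Flag process-level issues that an output-only check cannot see."""
    issues = []
    planned = [s.name for s in steps if s.kind == "plan"]
    called = [s for s in steps if s.kind == "tool_call"]
    called_names = {s.name for s in called}

    # Bad tool calls: tools outside the allowed set, or calls that failed.
    for s in called:
        if s.name not in allowed_tools:
            issues.append(f"unknown tool: {s.name}")
        elif not s.ok:
            issues.append(f"failed tool call: {s.name}")

    # Missed actions: planned steps that were never carried out.
    for name in planned:
        if name not in called_names:
            issues.append(f"planned but never executed: {name}")
    return issues


# Hypothetical run: the final answer is correct, but the path is broken.
trace = [
    Step("plan", "search"),
    Step("plan", "calculator"),
    Step("tool_call", "search"),
    Step("tool_call", "web_scrape", ok=False),
]
answer_ok = output_only_check("42", "42")
issues = trace_checks(trace, allowed_tools={"search", "calculator"})
print(answer_ok, issues)
```

In this toy run the output-only check passes, while the trace checks report one unknown tool and one planned step that was never executed. A real evaluator would typically judge goal, plan, and action quality with an LLM or human reviewer rather than with exact-match rules like these.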

Speakers/Topics:
Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics

Venue:
Virtual; join from anywhere.

Global AI Tech Community on Discord
Join us on Discord to connect with the local and global AI tech community:
- Events chat: chat and connect with speakers and with global and local attendees;
- Learning AI: events, learning materials, study groups;
- Startups: innovation, project collaborations, founders/co-founders;
- Jobs and Careers: job openings, resume postings, hiring managers.

Anupam Datta, Josh Reini

The event ended.