Welcome to the AI Deep Dive Series (Virtual) with
This is a virtual event for our global AI community, so please double-check your local time. Can't make it live? Register anyway to receive the webinar recording.
Tech Talk: Evaluating AI Agent Reliability
Speakers: Anupam Datta (Snowflake) | Josh Reini (Snowflake)
Abstract: Agents often fail in ways you can’t see. They can return a final answer while taking a broken path: drifting from the goal, making irrational plan jumps, or misusing tools. Was the goal achieved efficiently? Did the plan make sense? Were the right tools used? Did the agent follow through?
These hidden mistakes silently rack up compute costs, spike latency, and cause brittle behavior that collapses in production. Traditional evals won’t flag any of it because they only check the output, not the decisions that produced it.
This session introduces the Agent GPA (Goal-Plan-Action) framework, available in the open-source TruLens library. In benchmark tests, the Agent GPA framework consistently outperformed standard LLM evaluators, giving teams scalable and trustworthy insight into agent behavior.
Speakers/Topics:
Stay tuned as we update speakers and schedules. If you are interested in speaking to our community, we invite you to submit topics for consideration: Submit Topics
Venue:
Virtual; join from anywhere.
Global AI Tech Community on Discord
Join us on Discord to connect with the local and global AI tech community:
- Events chat: chat and connect with speakers and with global and local attendees;
- Learning AI: events, learning materials, study groups;
- Startups: innovation, project collaborations, founders/co-founders;
- Jobs and Careers: job openings, resume postings, hiring managers.