Google Build with AI Series (Waterloo) - Building Multimodal Agents


Feb 27, 10:00 AM EST
  • Canada-Waterloo (RSVP for full address) Google
Description

Join Google Cloud for the agentic AI bootcamp to learn how to build production-ready multimodal agents.

The ‘Next’ – Capturing Innovation
The era of the text-only chatbot is giving way to something new. This bootcamp focuses on the bleeding edge of AI: multimodality. We'll explore how to build intelligent agents that can see, hear, and respond to the world in real time, creating immersive experiences that feel more human than ever before.

What to Expect:
- Multimodal Gemini Agents: Coordinate agents to analyze video and audio while maintaining character consistency across multi-turn image generation.
- Intelligence Beyond RAG: Move past simple retrieval with hybrid search, context engineering, and multi-agent pipelines.
- Real-Time Live Interaction: Build low-latency, interruptible agents that "see" and "hear" using the Gemini Live API and bidirectional streaming.

Who Should Attend?
This hands-on workshop is designed for software developers, data scientists, and AI practitioners who have some experience building applications or working with models and are looking to productionize them. To get the most out of the labs, you should have foundational knowledge of a programming language like Python and be comfortable using the command line. While expertise is not required, a basic understanding of cloud computing concepts, web APIs, and containerization technology like Docker will be highly beneficial.

To participate, you must bring your own laptop and power cable. The activities are intended for laptops and cannot be completed on a tablet or phone.

Agenda
10:00AM - 11:00AM
Registration & Check-In
11:00AM - 11:30AM
The Multimodal Shift
Why text-only is the "old story," plus an introduction to agents that process the world the way humans do (audio/video/vision).
11:30AM - 1:00PM
Multimodal Orchestration
Build specialized agent crews using the Agent Development Kit (ADK) and the Model Context Protocol (MCP). Master character consistency through multi-turn image generation, and coordinate parallel agents to reason across text, image, and video.
1:00PM - 1:45PM
Lunch Break
1:45PM - 3:15PM
Intelligence Beyond RAG
Move past simple retrieval by combining RAG with context engineering for long-term personalization. Build autonomous pipelines that transform unstructured multimodal data into structured intelligence.
3:15PM - 4:45PM
Real-Time Live Interaction
Implement low-latency, interruptible agents via bidirectional streaming. Orchestrate proactive streaming tools and event-driven architecture (EDA) for resilient, real-time control.
5:00PM - 6:00PM
Builders Demo
Rapid showcase of what was built, final resources for certification, and "Next Steps" for the community.
6:00PM - 7:00PM
Happy Hour

Venue:
Waterloo, Canada (RSVP for full address)

Approval Required. Your registration is subject to approval.
