
Google Build with AI Series (Waterloo) - Building Multimodal Agents

Feb 27 2026, 10:00 AM EST

Canada-Waterloo (RSVP for full address) Google
50 RSVPs

Join Google Cloud for the agentic AI bootcamp to learn how to build production-ready multimodal agents.

The ‘Next’ – Capturing Innovation
The era of the text-only chatbot is giving way to something richer. This bootcamp focuses on the leading edge of AI: multimodality. We’ll explore how to build intelligent agents that can see, hear, and respond to the world in real time, creating immersive experiences that feel more human than ever before.

What to Expect:
- Multimodal Gemini Agents: Coordinate agents to analyze video and audio while maintaining character consistency across multi-turn image generation.
- Intelligence Beyond RAG: Move past simple retrieval with hybrid search, context engineering, and multi-agent pipelines.
- Real-Time Live Interaction: Build low-latency, interruptible agents that "see" and "hear" using the Gemini Live API and bidirectional streaming.
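
For a taste of what "hybrid search" in the second item can mean, here is a minimal sketch that merges a keyword ranking and a vector ranking with reciprocal rank fusion. This is an illustration only: the fusion method, the function name, the document IDs, and the constant k=60 are assumptions for the example, not material taken from the workshop labs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers run on the same query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. a keyword (BM25-style) search
vector_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. an embedding similarity search

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that both retrievers return (here doc_a and doc_c) end up ahead of documents found by only one of them, which is the basic appeal of combining the two signals.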

Who Should Attend?
This hands-on workshop is designed for software developers, data scientists, and AI practitioners who have some experience building applications or working with models and want to take them to production. To get the most out of the labs, you should have foundational knowledge of a programming language such as Python and be comfortable using the command line. Expertise is not required, but a basic understanding of cloud computing concepts, web APIs, and containerization technology such as Docker will be highly beneficial.

To participate, you must bring your own laptop and power cable. The activities are intended for laptops and cannot be completed on a tablet or phone.

Agenda

10:00AM - 11:00AM

Registration & Check-In

11:00AM - 11:30AM

The Multimodal Shift
Why text-only is the "old story," and an introduction to agents that process the world as humans do: through audio, video, and vision.

11:30AM - 1:00PM

Multimodal Orchestration
Build specialized agent crews using the Agent Development Kit (ADK) and the Model Context Protocol (MCP). Master character consistency through multi-turn image generation and coordinate parallel agents to reason across text, image, and video.

1:00PM - 1:45PM

Lunch Break

1:45PM - 3:15PM

Intelligence Beyond RAG
Move beyond simple retrieval by combining RAG with context engineering for long-term personalization. Build autonomous pipelines to transform unstructured multimodal data into structured intelligence.

3:15PM - 4:45PM

Real-Time Live Interaction
Implement low-latency, interruptible agents via bidirectional streaming. Orchestrate proactive Streaming Tools and event-driven architecture (EDA) for resilient, real-time control.
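
To make "interruptible" concrete, here is a minimal sketch of the barge-in idea using only Python's asyncio: the agent streams its reply chunk by chunk, and a user interruption cancels the rest. It does not call the Gemini Live API or ADK. The function names, timings, and sample chunks are assumptions chosen for illustration.

```python
import asyncio


async def speak(chunks, spoken, chunk_delay=0.1):
    """Stream a reply one chunk at a time, as an agent would stream audio."""
    for chunk in chunks:
        spoken.append(chunk)
        await asyncio.sleep(chunk_delay)  # simulated time to play one chunk


async def run_turn(chunks, interrupt_after=None):
    """Play a reply and cancel it if the user interrupts partway through."""
    spoken = []
    task = asyncio.create_task(speak(chunks, spoken))
    if interrupt_after is not None:
        # Simulated user barge-in after `interrupt_after` seconds.
        await asyncio.sleep(interrupt_after)
        task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # the reply was cut short on purpose
    return spoken, task.cancelled()


reply = ["Hello,", "I", "can", "help"]
spoken, interrupted = asyncio.run(run_turn(reply, interrupt_after=0.25))
full, cut_short = asyncio.run(run_turn(reply))
print(spoken, interrupted)
print(full, cut_short)
```

In a real bidirectional stream the interruption would arrive as an event on the incoming channel rather than a timer, but the control flow is the same: the outgoing reply runs as a cancellable task so the agent can stop talking as soon as the user starts.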

5:00PM - 6:00PM

Builders Demo
Rapid showcase of what was built, final resources for certification, and "Next Steps" for the community.

6:00PM - 7:00PM

Happy Hour

Venue:
Waterloo, Canada (RSVP for full address)
