LLMs Night (San Francisco) with NVIDIA


Sep 17, 05:30 PM PDT
  • US-San Francisco (GitHub, 88 Colin P Kelly Jr St, San Francisco, CA 94107) NVIDIA
  • 428 RSVP
Description

Join us for an exciting evening dedicated to the latest advancements in large language models (LLMs) as we partner with NVIDIA for LLMs Night at GitHub San Francisco! This event will focus on TensorRT-LLM, an open-source library designed to optimize LLM inference, pushing the boundaries of performance and efficiency.

This event is for adults only (21+).

Event Highlights:
- Explore TensorRT-LLM: Learn how this powerful library offers an easy-to-use Python API that incorporates cutting-edge advancements in LLM inference, including FP8 and INT4 Activation-aware Weight Quantization (AWQ) with no loss in accuracy.
- Optimization Techniques: Dive deep into TensorRT-LLM's optimization techniques, such as Tensor Parallelism, Pipeline Parallelism, and in-flight batching, designed to maximize throughput and minimize latency in your LLM inference.
- Hands-On Demonstrations: See how TensorRT-LLM takes your model weights and builds a highly optimized engine, allowing you to achieve unprecedented performance in your AI projects (see the API sketch after this list).
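
For a taste of what the hands-on demos cover, here is a minimal sketch of TensorRT-LLM's high-level (LLM) Python API building an optimized engine from a Hugging Face checkpoint and running batched generation. The model name, the tensor_parallel_size value, and the sampling settings are illustrative assumptions, and the exact API surface can differ between TensorRT-LLM releases.

```python
# Minimal sketch of TensorRT-LLM's high-level (LLM) Python API.
# Model name and settings are illustrative assumptions; verify against
# the TensorRT-LLM release you have installed.
from tensorrt_llm import LLM, SamplingParams

# Constructing the LLM object compiles an optimized TensorRT engine
# from the Hugging Face checkpoint.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed checkpoint
    tensor_parallel_size=2,  # shard weights across 2 GPUs (Tensor Parallelism)
)

prompts = [
    "What does TensorRT-LLM optimize?",
    "Explain in-flight batching in one sentence.",
]
sampling = SamplingParams(max_tokens=64, temperature=0.8)

# Requests are scheduled with in-flight batching under the hood.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```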

This is a must-attend event for AI developers, researchers, and enthusiasts looking to stay ahead in the rapidly evolving field of large language models. Don’t miss the chance to network with industry experts, members of the product core team, and AI leaders from NVIDIA, gain valuable insights, and elevate your LLM projects with NVIDIA’s latest innovations.

Demo stations:

  • Multimodal X-VILA
  • HLAPI code sample
  • Llama 3.1, Medusa
  • Hugging Face: TensorRT-LLM with Hugging Face’s optimum-nvidia and TGI (see the sketch after this list)
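
As a rough illustration of that integration, the sketch below uses optimum-nvidia's pipeline as a drop-in replacement for the transformers pipeline. The model name and the use_fp8 flag follow optimum-nvidia's published examples, but treat the exact arguments as assumptions that may change between versions.

```python
# Sketch of optimum-nvidia as a drop-in for the transformers pipeline.
# Model name and flags follow optimum-nvidia's examples; confirm against
# the installed version before relying on them.
from optimum.nvidia.pipelines import pipeline

pipe = pipeline(
    "text-generation",
    "meta-llama/Llama-2-7b-chat-hf",  # assumed checkpoint
    use_fp8=True,  # FP8 quantization on supported GPUs (e.g. Hopper)
)

print(pipe("Describe a real-world application of accelerated LLM inference."))
```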

Main Presentations:

  • Fundamentals of accelerated computing: harnessing the capabilities of specialized hardware and co-designed software to accelerate applications (NVIDIA)
  • Product feature roadmap, model coverage, and performance (NVIDIA)
  • Accelerating LLM inference at Databricks with TensorRT-LLM (Databricks)
  • Enhancing Model Serving at Baseten with TensorRT-LLM (Baseten)

Technical Deep Dive stations:

  • Performance Optimization
  • Debugging
  • Customization, New Features, New Models
  • Triton architecture update preview with disaggregated serving
  • Model Optimization

Venue:
    GitHub, 88 Colin P Kelly Jr St, San Francisco, CA 94107.
    Driving directions, parking info, carpooling, etc. are in the Discord/Slack.

    Sponsors/Partners:
    This event is sponsored by NVIDIA. Join the NVIDIA Developer Program for exclusive access to tools and SDKs, technical training and webinars, early-access programs, developer community forums, unlimited use of NVIDIA On-Demand, and a free NVIDIA Deep Learning Institute (DLI) course. Also join the NVIDIA Developer Discord.

    Community on Slack/Discord
    - Event chat: chat and connect with speakers and attendees
    - Sharing of blogs, events, job openings, and project collaborations
    Join Slack (search and join the #sanfrancisco channel) | Join Discord

    Speakers: Jay Rodge and 5 more

    The event has ended; the recording is hosted on YouTube.