An LLM journey speed run: Going from Hugging Face to Vertex AI

Google for Developers
16 May 2024 · 38:23

TL;DR: In this talk, a Solutions Architect and a Solution Manager from Google Cloud discuss augmenting a Gemini-based large language model (LLM) solution with open source models from Hugging Face. They focus on integrating a time series forecasting model, TimesFM, so that a retail and e-commerce product catalog solution gains demand forecasting capabilities. The session covers the Vertex AI platform's features for building generative AI applications, explains why model diversity matters, and demonstrates the practical steps for deploying and integrating models into a complete solution.

Takeaways

  • 😀 The Vertex AI platform by Google Cloud allows for the integration of open source models from Hugging Face to enhance Generative AI (Gen AI) solutions.
  • 🌟 Rajesh Thallam and Skander Hannachi from Google Cloud discussed the practical steps to integrate a time series forecasting model, TimesFM, into a Gemini-based large language model on Vertex AI.
  • 🛠️ Vertex AI offers tools for model training, deployment, and enhancement with extensions and grounding, providing full data control, enterprise security, and compliance.
  • 🔍 The speakers distinguish Google AI Studio, a prototyping tool for testing Gemini models, from Vertex AI, Google Cloud's end-to-end machine learning platform.
  • 📈 TimesFM is a new open source time series forecasting model from Google that produces zero-shot forecasts for a variety of tasks, without first being trained on the user's historical data sets.
  • 🛑 The importance of choosing the right model for specific tasks was emphasized, as there is no single model that can handle all scenarios effectively.
  • 🔧 The Vertex AI platform includes the Model Garden, which provides access to over 130 curated foundation models across different modalities and tasks.
  • 🌐 The integration of Hugging Face Hub with Vertex AI enables users to deploy thousands of foundation models with a single click, without managing infrastructure.
  • 📊 A concrete industry example was provided, demonstrating how to add demand forecasting capabilities to an e-commerce product catalog solution using Vertex AI and TimesFM.
  • 🔗 The concept of building generative AI agents was introduced, which are applications that attempt to complete complex tasks by understanding user intent and acting on it using various tools.

Q & A

  • What is the main topic discussed in the video?

    - The main topic discussed in the video is how to augment a Gemini-based generative AI solution with open source models from Hugging Face, specifically by adding a time series forecasting model to enhance retail and e-commerce product catalog metadata and content.

  • Who are the speakers in the video?

    - The speakers in the video are Rajesh Thallam and Skander Hannachi, both Solutions Architects at Google Cloud and part of the Vertex AI applied engineering team.

  • What is the role of Vertex AI in the discussed solution?

    - Vertex AI serves as a platform to build out generative AI applications. It provides tools for model training, deployment, and integration of open source models like the time series forecasting model TimesFM from Hugging Face.

  • What is the purpose of Google's open source time series foundation model TimesFM?

    - TimesFM is designed to perform time series forecasting tasks without first being trained on the user's historical data sets. Its zero-shot capability lets it generate forecasts on the fly, which is particularly useful for merchants who cannot afford the operational overhead of traditional forecasting tools.

  • How does the integration of Hugging Face with Vertex AI benefit users?

    - The integration allows users to deploy thousands of foundation models from Hugging Face directly on Vertex AI with a single click, without managing any infrastructure, providing a seamless experience within a secure Google Cloud environment.

  • What is the difference between Google AI Studio and Vertex AI Studio mentioned in the video?

    - Google AI Studio is a prototyping tool for developers and data scientists to interact with and test Gemini models, while Vertex AI is an end-to-end machine learning platform on Google Cloud that offers tools for model training, deployment, and more.

  • What are the three components of the Vertex AI platform?

    - The three components of the Vertex AI platform are the Vertex Model Garden, which is a repository of models; the Vertex Model Builder layer, for fine-tuning and augmenting models; and the Vertex Agent Builder platform, for configuring agent-based and search-based use cases.

  • What is the significance of the 'function calling' feature in Gemini models?

    - Function calling is a native feature of Gemini models that allows them to return structured data responses, enabling interaction with external systems or APIs, which is crucial for building scalable and production-ready generative AI applications.

  • How does the video demonstrate the practical integration of TimesFM into a generative AI application?

    - The video demonstrates the integration by showing the deployment of the TimesFM model on Vertex AI, defining Python functions as tools to interact with BigQuery and call the TimesFM model, defining a LangChain agent, and deploying the agent on Vertex AI's Reasoning Engine as a remote service.

  • What is the final outcome of the demo presented in the video?

    - The final outcome of the demo is a working generative AI application that can query BigQuery for sales data and use the TimesFM model to generate forecasts, which are then made accessible as a remote service through Vertex AI's Reasoning Engine.

Outlines

00:00

🌐 Introduction to Vertex AI and Hugging Face Integration

The video begins with Skander Hannachi and Rajesh Thallam introducing themselves as part of Google Cloud's Vertex AI applied engineering team. They express excitement about discussing the augmentation of a Gemini-based generative AI solution with open source models from Hugging Face. The focus is on integrating a time series forecasting model into a Gemini-based large language model using Google's open source time series foundation model, TimesFM, recently launched on Hugging Face. The discussion aims to guide viewers on using the Vertex AI platform to build generative AI applications, with a specific use case of enhancing retail and e-commerce product catalogs with demand forecasting capabilities.

05:01

🛠️ Building Generative AI Applications with Vertex AI

Skander and Rajesh walk through the thought process and options for building generative AI applications on the Vertex AI platform. They highlight Gemini, the platform's multi-task foundation model, which suits a wide range of use cases. They stress that no single model fits every scenario and advocate a 'garden' approach in which task-specific or smaller models are integrated as needed. The speakers also distinguish Google AI Studio, a prototyping tool, from Vertex AI, an end-to-end machine learning platform. They outline the three components of the Vertex AI platform: the Vertex Model Garden for model repositories, the Vertex Model Builder for custom use case development, and the Vertex Agent Builder for configuring agent-based use cases.

10:03

📈 Enhancing E-commerce with Demand Forecasting

The conversation shifts to a concrete industry example: integrating an open source model with Vertex AI to add demand forecasting capabilities to an e-commerce product catalog solution. Skander discusses the challenges retailers face in managing product catalogs and how the Vertex AI team's solution automates the generation and enhancement of product content using Gemini and Imagen. The user journey starts with a merchant uploading a product description and image, followed by category detection, attribute filtering, and the generation of detailed product descriptions and image edits. The system design is likened to a single path state machine, with potential for expansion into agentic systems that assist with tasks like demand forecasting.
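The "single path state machine" design can be pictured as a fixed sequence of states, each handing its output to the next. The sketch below is illustrative only: the state names follow the user journey in this summary, and every handler is a hypothetical stand-in for the Gemini and image-model calls the real solution would make.

```python
from typing import Callable, Dict, List, Tuple

# Each step takes the shared context and returns an updated copy.
Step = Callable[[Dict], Dict]


def detect_category(ctx: Dict) -> Dict:
    # Hypothetical stand-in for a model call that classifies the product.
    text = ctx["description"].lower()
    return {**ctx, "category": "apparel" if "shirt" in text else "general"}


def filter_attributes(ctx: Dict) -> Dict:
    # Keep only the attributes that apply to the detected category.
    allowed = {"apparel": ["size", "color", "material"], "general": ["color"]}
    return {**ctx, "attributes": allowed[ctx["category"]]}


def generate_description(ctx: Dict) -> Dict:
    # Hypothetical stand-in for an LLM call that writes the detailed copy.
    attrs = ", ".join(ctx["attributes"])
    detailed = f"{ctx['description']} (attributes: {attrs})"
    return {**ctx, "detailed_description": detailed}


# Single path: every run visits the same states in the same order.
PIPELINE: List[Tuple[str, Step]] = [
    ("category_detection", detect_category),
    ("attribute_filtering", filter_attributes),
    ("description_generation", generate_description),
]


def run_pipeline(ctx: Dict) -> Dict:
    visited = []
    for state, step in PIPELINE:
        ctx = step(ctx)
        visited.append(state)
    return {**ctx, "visited": visited}
```

An agentic system would replace the fixed `PIPELINE` list with a model that chooses the next step at run time.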

15:04

📊 The Challenge of Demand Forecasting and TimesFM

Skander addresses the complexities of demand forecasting, traditionally handled either by cumbersome ERP systems or by custom ML pipelines. He introduces TimesFM, Google's newly released open source time series foundation model, which is pretrained on a large time series corpus that includes synthetic data and can produce zero-shot forecasts for a variety of tasks. TimesFM's strength lies in generating forecasts on the fly without task-specific training, which makes it accessible to merchants who lack the operational capacity for more traditional forecasting tools. The model's performance is benchmarked against dedicated models to show its accuracy and efficiency.

20:06

🔧 Practical Integration of TimesFM with Vertex AI

Rajesh demonstrates the practical steps for integrating the TimesFM model from Hugging Face into Vertex AI. He outlines the user journey for reviewing item performance by building an agent using foundational models from Vertex AI and Hugging Face. Rajesh explains the concept of a generative AI agent, which goes beyond simple text generation to perform complex tasks using tools and orchestration. He details the four components necessary for building agents: models, tools, orchestration, and runtime. The process involves deploying the TimesFM model, defining Python functions as tools, creating a LangChain agent, and deploying the agent on Vertex AI's Reasoning Engine for scalability and security.

25:07

🚀 Deploying and Testing the Forecasting Agent

The video proceeds with a live demonstration of deploying the TimesFM model from Hugging Face onto Vertex AI's prediction endpoint. Rajesh guides viewers through data preparation, including downloading a dataset from Kaggle, uploading it to BigQuery, and transforming it for time series forecasting. He then demonstrates deploying the model using Vertex AI SDK, creating a custom container, and testing it locally before pushing it to the model registry and deploying it to an endpoint. The endpoint's response, which includes point and quantile forecasts, is used to illustrate how the model can be utilized for demand forecasting.
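To make the "point and quantile forecasts" concrete, here is a minimal sketch of turning one prediction into a per-step table with an uncertainty band. The payload shape (`point_forecast`, `quantile_forecast` keyed by quantile level) is an assumption made for illustration; the response format of the container used in the demo is not given in this summary.

```python
from typing import Dict, List


def summarize_forecast(
    prediction: Dict, lower_q: str = "0.1", upper_q: str = "0.9"
) -> List[Dict]:
    """Pair each point forecast with a lower and upper quantile.

    ASSUMPTION: `prediction` looks like
      {"point_forecast": [...], "quantile_forecast": {"0.1": [...], "0.9": [...]}}
    This shape is hypothetical, not the documented endpoint response.
    """
    points = prediction["point_forecast"]
    lower = prediction["quantile_forecast"][lower_q]
    upper = prediction["quantile_forecast"][upper_q]
    if not (len(points) == len(lower) == len(upper)):
        raise ValueError("point and quantile forecasts must have equal length")
    return [
        {"step": i + 1, "forecast": p, "low": lo, "high": hi}
        for i, (p, lo, hi) in enumerate(zip(points, lower, upper))
    ]
```

The `low` and `high` columns give a merchant a range to plan inventory against rather than a single number.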

30:09

🔗 Binding Model with Tools and Defining the Agent

Rajesh defines functions as tools that will interact with BigQuery and the TimesFM model on Vertex AI. He creates Python functions to fetch data from BigQuery and invoke the TimesFM model's forecasting capabilities. The video shows testing these functions locally and plotting the forecasts against historical sales data. The process of defining a LangChain agent and binding the model with these tools is detailed, including setting up the agent with a prompt prefix to ensure the model uses the correct dataset. The agent is tested locally, and its ability to query BigQuery and generate forecasts is demonstrated.
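The two tools described here can be sketched as plain Python functions. To keep the example runnable without cloud credentials, the BigQuery and endpoint calls are injected as callables; the table layout (`date`, `sales`, `item_id`), the request fields (`input`, `horizon`), and the `point_forecast` response key are all assumptions, not details taken from the demo.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical adapters. In a real deployment these would wrap a BigQuery
# client query and a Vertex AI endpoint prediction call respectively.
QueryFn = Callable[[str], List[Dict]]
PredictFn = Callable[[List[Dict]], List[Dict]]


def make_tools(
    run_query: QueryFn, predict: PredictFn, table: str
) -> Tuple[Callable[[str], List[float]], Callable[..., List[float]]]:
    def get_sales_history(item_id: str) -> List[float]:
        """Return the sales history for one item, oldest first."""
        # Reject anything that is not a simple identifier before building SQL.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", item_id):
            raise ValueError(f"invalid item_id: {item_id!r}")
        sql = (
            f"SELECT date, sales FROM `{table}` "
            f"WHERE item_id = '{item_id}' ORDER BY date"
        )
        return [float(row["sales"]) for row in run_query(sql)]

    def forecast_sales(history: Sequence[float], horizon: int = 7) -> List[float]:
        """Ask the forecasting model for `horizon` future values."""
        instances = [{"input": list(history), "horizon": horizon}]
        predictions = predict(instances)
        return list(predictions[0]["point_forecast"][:horizon])

    return get_sales_history, forecast_sales
```

Because the functions carry type hints and docstrings, an orchestration framework can expose them to the model as tools, and fakes can be passed in for local testing before any endpoint exists.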

35:11

🌐 Deploying the Agent and Conclusion

The final steps of deploying the agent on Vertex AI's Reasoning Engine are covered. Rajesh defines the agent, including its dependencies on Google BigQuery and Vertex AI SDK, and deploys it to Cloud Run. He tests the remote agent's functionality, showing how it can query BigQuery and generate forecasts when prompted. The video concludes with Skander and Rajesh summarizing the key takeaways: the importance of integrating various models for different tasks, the ease of deploying Hugging Face models on Vertex AI, and the potential of combining generative AI with predictive models for powerful solutions. They also highlight the flexibility of deploying scalable generative AI agents using function calling and Reasoning Engine. The presenters invite viewers to discuss further if they have additional questions.
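A remote agent service generally needs an object it can construct once and then query many times. The sketch below shows that shape with a deferred `set_up` step and a `query` method; the method names mirror a common convention for deployable agent classes, but treating them as what Reasoning Engine requires is an assumption here, and `build_executor` is a hypothetical factory standing in for the LangChain agent from the demo.

```python
from typing import Callable, Dict, Optional

Executor = Callable[[str], str]


class RemoteReadyAgent:
    """Wraps an agent so that heavy initialization happens after construction."""

    def __init__(self, build_executor: Callable[[], Executor]):
        # Store only the recipe; clients and models are created in set_up().
        self._build_executor = build_executor
        self._executor: Optional[Executor] = None

    def set_up(self) -> None:
        # Runs once in the serving environment before any query is handled.
        self._executor = self._build_executor()

    def query(self, input: str) -> Dict[str, str]:
        if self._executor is None:
            raise RuntimeError("call set_up() before query()")
        return {"input": input, "output": self._executor(input)}
```

Testing the class locally with a stub executor, as the demo does with the real agent, catches wiring errors before any deployment step.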

Keywords

💡LLM (Large Language Model)

A large language model (LLM) is an AI model trained on vast amounts of text data, which enables it to understand and generate human-like text. In the context of the video, LLMs are central to the discussion because they form the basis of the generative AI applications being explored. The video specifically covers augmenting a Gemini-based LLM with open source models, that is, combining different AI models to extend a solution's capabilities.

💡Hugging Face

Hugging Face is an organization known for its contributions to the AI community, particularly through its platform that offers a range of tools and pre-trained models for natural language processing. In the video, presenters discuss the process of augmenting a Gemini-based LLM with open source models from Hugging Face, emphasizing the platform's role in providing models that can be integrated into Vertex AI solutions.

💡Vertex AI

Vertex AI is Google Cloud's platform for building and deploying machine learning models. It is highlighted in the video as a key component in the process of developing generative AI applications. The platform's capabilities are discussed in relation to enhancing retail and e-commerce product catalog metadata with demand forecasting, showcasing its practical application in business solutions.

💡Time Series Forecasting

Time series forecasting is a statistical technique used to predict future data points based on previously observed data. The video focuses on adding a time series forecasting model to a Gemini-based LLM, underscoring its importance in enhancing predictive capabilities for business applications such as demand forecasting in retail and e-commerce.

💡Gemini

Gemini is Google's family of multimodal large language models. In the video it is the foundation model at the core of the solution, and it is augmented with additional models, such as TimesFM, to add capabilities it does not provide on its own.

💡Google Cloud

Google Cloud is a suite of cloud computing services offered by Google. It is mentioned as the broader platform on which Vertex AI operates, providing the infrastructure and services necessary for building and deploying AI models. The video emphasizes the integration of Google Cloud services with AI applications, highlighting the seamless connection between cloud services and AI capabilities.

💡Model Garden

The term 'Model Garden' in the video refers to a repository of different AI models available within Vertex AI. It symbolizes the variety and selection of models that users can choose from to build their AI applications, emphasizing the flexibility and customization options provided by the platform.

💡Function Calling

Function calling is a feature of the Gemini models that allows them to return structured data responses, which can then be used to interact with external systems or APIs. The video explains how function calling enables the integration of AI models with other tools and services, showcasing its importance in creating more dynamic and interactive AI applications.
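The mechanics can be sketched without any SDK: the application declares a function, the model replies with a structured call (a name plus arguments), and the application executes it. Everything below is a hypothetical illustration; the declaration format, the `forecast_sales` tool, and the hand-written `function_call` dictionary stand in for what a Gemini model would actually consume and return.

```python
from typing import Any, Callable, Dict


def forecast_sales(item_id: str, horizon: int) -> Dict[str, Any]:
    # Hypothetical tool: a real one would call a forecasting endpoint.
    return {"item_id": item_id, "forecast": [10.0] * horizon}


# What the application tells the model about the tool (JSON-schema style).
FORECAST_DECLARATION = {
    "name": "forecast_sales",
    "description": "Forecast future sales for one catalog item.",
    "parameters": {
        "type": "object",
        "properties": {
            "item_id": {"type": "string"},
            "horizon": {"type": "integer"},
        },
        "required": ["item_id", "horizon"],
    },
}

TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"forecast_sales": forecast_sales}


def dispatch(function_call: Dict[str, Any]) -> Dict[str, Any]:
    """Execute the structured call the model returned."""
    name = function_call["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    return TOOLS[name](**function_call["args"])
```

The model never runs code itself; it only names the function and supplies arguments, and the application decides whether and how to execute the call.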

💡LangChain

LangChain is mentioned as a tool that can be used in conjunction with function calling to create more agent-like interactions with generative models. It allows for the transformation of Python functions into executable tools that models can use for information retrieval or API calls, indicating its role in enhancing the capabilities of AI models within applications.

💡Reasoning Engine

The Reasoning Engine is a component of Vertex AI that allows for the deployment of AI agents at scale with the necessary security and reliability. The video describes how agents built with function calling and orchestration frameworks like LangChain can be deployed using the Reasoning Engine, emphasizing its role in productionizing AI solutions for business-critical applications.

Highlights

Introduction to the Vertex AI platform and its capabilities for building generative AI applications.

Discussion on augmenting a Gemini-based large language model with open source models from Hugging Face.

Focus on integrating a time series forecasting model into a Gemini-based model for retail and e-commerce applications.

Introduction of Google's open source time series foundation model, TimesFM, launched on Hugging Face.

Practical steps for integrating the TimesFM model into a generative AI application.

Exploration of the thought process and options for building generative AI applications with Vertex AI.

Explanation of the 'no one model to rule them all' philosophy and the importance of model diversity.

Overview of Google AI Studio and Vertex AI Studio, and their roles in model prototyping and machine learning.

Description of the Vertex AI platform's three components: Model Garden, Model Builder, and Agent Builder.

Introduction to the Model Garden and its collection of over 130 curated foundation models.

Announcement of the integration of Hugging Face Hub with Vertex AI for easy model deployment.

Industry example of integrating an open source model to enhance e-commerce product catalog data.

Challenges in managing product catalogs in e-commerce and how AI can help streamline the process.

Explanation of the system design behind product cataloging and content enhancement solutions.

Discussion on the importance of demand forecasting in retail and the challenges of traditional forecasting tools.

Introduction of TimesFM's capabilities for zero-shot learning in time series forecasting.

Demonstration of how to integrate the TimesFM model with Vertex AI to build a generative AI application.

Definition of a generative AI agent and its components for complex task completion.

Explanation of the four key steps to build a generative AI agent using Vertex AI's function calling and Reasoning Engine.

Live demo of deploying the TimesFM model from Hugging Face to a Vertex AI prediction endpoint.

Live demo of defining Python functions as tools for data retrieval from BigQuery and forecasting with TimesFM.

Live demo of defining a LangChain agent, binding the model with tools, and deploying the agent on Reasoning Engine.

Conclusion and key takeaways from the talk, emphasizing the integration of various models and tools for building generative AI applications.