An LLM journey speed run: Going from Hugging Face to Vertex AI
TLDR
In this talk, Google Cloud solutions architects Rajesh Thallam and Skander Hannachi discuss augmenting a Gemini-based large language model (LLM) solution with open source models from Hugging Face. They focus on integrating a time series forecasting model, TimesFM, to enrich retail and e-commerce product catalog metadata with demand forecasting capabilities. The session covers the Vertex AI platform's features for building generative AI applications, the importance of model diversity, and the practical steps for deploying and integrating models into a complete solution.
Takeaways
- 😀 The Vertex AI platform by Google Cloud allows for the integration of open source models from Hugging Face to enhance Generative AI (Gen AI) solutions.
- 🌟 Rajesh Thallam and Skander Hannachi from Google Cloud discussed the practical steps to integrate a time series forecasting model, TimesFM, into a Gemini-based large language model on Vertex AI.
- 🛠️ Vertex AI offers tools for model training, deployment, and enhancement with extensions and grounding, providing full data control, enterprise security, and compliance.
- 🔍 Google AI Studio and Vertex AI were distinguished, the former being a prototyping tool for testing Gemini models and the latter Google Cloud's end-to-end machine learning platform.
- 📈 TimesFM is Google's new open source time series foundation model, capable of zero-shot forecasting across a variety of tasks without task-specific training on historical data sets.
- 🛑 The importance of choosing the right model for specific tasks was emphasized, as there is no single model that can handle all scenarios effectively.
- 🔧 The Vertex AI platform includes the Model Garden, which provides access to over 130 curated foundation models across different modalities and tasks.
- 🌐 The integration of Hugging Face Hub with Vertex AI enables users to deploy thousands of foundation models with a single click, without managing infrastructure.
- 📊 A concrete industry example was provided, demonstrating how to add demand forecasting capabilities to an e-commerce product catalog solution using Vertex AI and TimesFM.
- 🔗 The concept of building generative AI agents was introduced, which are applications that attempt to complete complex tasks by understanding user intent and acting on it using various tools.
Q & A
What is the main topic discussed in the video?
-The main topic discussed in the video is how to augment a Gemini-based generative AI solution with open source models from Hugging Face, specifically by adding a time series forecasting model to enhance retail and e-commerce product catalog metadata and content.
Who are the speakers in the video?
-The speakers in the video are Rajesh Thallam and Skander Hannachi, both Solutions Architects at Google Cloud and part of the Vertex AI applied engineering team.
What is the role of Vertex AI in the discussed solution?
-Vertex AI serves as a platform to build out generative AI applications. It provides tools for model training, deployment, and integration of open source models like the time series forecasting model TimesFM from Hugging Face.
What is the purpose of Google's open source time series foundation model, TimesFM?
-TimesFM is designed to perform time series forecasting without requiring model training on each merchant's historical data sets. It is capable of zero-shot forecasting, generating predictions on the fly from the supplied history, which is particularly useful for merchants who cannot absorb the operational overhead of traditional forecasting tools.
How does the integration of Hugging Face with Vertex AI benefit users?
-The integration allows users to deploy thousands of foundation models from Hugging Face directly on Vertex AI with a single click, without managing any infrastructure, providing a seamless experience within a secure Google Cloud environment.
What is the difference between Google AI Studio and Vertex AI Studio mentioned in the video?
-Google AI Studio is a prototyping tool for developers and data scientists to interact with and test Gemini models, while Vertex AI is an end-to-end machine learning platform on Google Cloud that offers tools for model training, deployment, and more.
What are the three components of the Vertex AI platform?
-The three components of the Vertex AI platform are the Vertex Model Garden, which is a repository of models, the Vertex Model Builder layer for fine-tuning and augmenting models, and the Vertex Agent Builder platform for configuring agent-based and search-based use cases.
What is the significance of the 'function calling' feature in Gemini models?
-Function calling is a native feature of Gemini models that allows them to return structured data responses, enabling interaction with external systems or APIs, which is crucial for building scalable and production-ready generative AI applications.
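As a minimal sketch of what function calling looks like with the Vertex AI SDK for Python: the function name, parameter schema, and model version below are illustrative placeholders, not values taken from the talk.

```python
import vertexai
from vertexai.generative_models import FunctionDeclaration, GenerativeModel, Tool

vertexai.init(project="your-project", location="us-central1")

# Declare an external capability the model may ask the application to invoke.
get_forecast = FunctionDeclaration(
    name="get_demand_forecast",
    description="Forecast weekly demand for a product SKU.",
    parameters={
        "type": "object",
        "properties": {
            "sku": {"type": "string", "description": "Product SKU to forecast."},
            "horizon_weeks": {"type": "integer", "description": "Weeks to forecast."},
        },
        "required": ["sku"],
    },
)

model = GenerativeModel(
    "gemini-1.5-pro",
    tools=[Tool(function_declarations=[get_forecast])],
)

response = model.generate_content("How many units of SKU-1234 will we sell next month?")

# Instead of free-form text, the model can return a structured function call
# that the application executes against its own systems or APIs.
print(response.candidates[0].content.parts[0].function_call)
```

The application then runs the named function with the returned arguments and passes the result back to the model so it can compose the final answer.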
How does the video demonstrate the practical integration of TimesFM into a generative AI application?
-The video demonstrates the integration by showing the deployment of the TimesFM model on Vertex AI, defining Python functions as tools to interact with BigQuery and call the TimesFM model, defining a LangChain agent, and deploying the agent on Vertex AI's Reasoning Engine as a remote service.
What is the final outcome of the demo presented in the video?
-The final outcome of the demo is a working generative AI agent that can query BigQuery for sales data and call the TimesFM model to generate forecasts, deployed as a remote service on Vertex AI's Reasoning Engine.
Outlines
🌐 Introduction to Vertex AI and Hugging Face Integration
The video begins with Skander Hannachi and Rajesh Thallam introducing themselves as part of Google Cloud's Vertex AI applied engineering team. They express excitement about discussing the augmentation of a Gemini-based generative AI solution with open source models from Hugging Face. The focus is on integrating a time series forecasting model into a Gemini-based large language model using Google's open source time series foundation model, TimesFM, recently launched on Hugging Face. The discussion aims to guide viewers on using the Vertex AI platform to build generative AI applications, with a specific use case of enhancing retail and e-commerce product catalogs with demand forecasting capabilities.
🛠️ Building Generative AI Applications with Vertex AI
Skander and Rajesh delve into the thought process and options for building generative AI applications on the Vertex AI platform. They highlight Gemini as a general-purpose, multi-task foundation model suitable for many use cases, but emphasize that no single model fits every scenario, advocating a 'model garden' approach in which task-specific or smaller models are integrated as needed. The speakers also touch on the difference between Google AI Studio, a prototyping tool, and Vertex AI, Google Cloud's end-to-end machine learning platform. They outline the three layers of the Vertex AI platform: the Vertex Model Garden as a model repository, the Vertex Model Builder layer for fine-tuning and building custom use cases, and the Vertex Agent Builder for configuring agent-based and search-based use cases.
📈 Enhancing E-commerce with Demand Forecasting
The conversation shifts to a concrete industry example, focusing on integrating an open source model with Vertex AI to add demand forecasting capabilities to an e-commerce product catalog solution. Skander discusses the challenges retailers face in managing product catalogs and how the Vertex AI team's solution automates the generation and enhancement of product content using Gemini and Imagen. The solution involves a user journey that starts with a merchant uploading a product description and image, followed by category detection, attribute filtering, and the generation of detailed product descriptions and image edits. The system design is likened to a single-path state machine, with potential for expansion into agentic systems that assist with tasks like demand forecasting.
📊 The Challenge of Demand Forecasting and TimesFM
Skander addresses the complexities of demand forecasting, traditionally handled either by cumbersome ERP systems or by custom ML pipelines. He introduces TimesFM, Google's newly released open source time series foundation model, which is pretrained on a large corpus of real and synthetic time series and capable of zero-shot forecasting across a variety of tasks. TimesFM's strength lies in its ability to generate forecasts on the fly without per-dataset training, making it accessible to merchants who lack the operational capacity for traditional forecasting tools. Its performance is benchmarked against dedicated forecasting models, showcasing its accuracy and efficiency.
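For context, here is a hedged sketch of zero-shot forecasting with the open source `timesfm` package, roughly following the README published alongside the 1.0 release; hyperparameter values and argument names may differ between package versions, and the sales series below is made up.

```python
import numpy as np
import timesfm

# Configure the model and load the pretrained 200M-parameter checkpoint
# from Hugging Face (google/timesfm-1.0-200m).
tfm = timesfm.TimesFm(
    context_len=512,       # maximum history length the model attends to
    horizon_len=128,       # maximum forecast horizon
    input_patch_len=32,
    output_patch_len=128,
    num_layers=20,
    model_dims=1280,
    backend="cpu",
)
tfm.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")

# Zero-shot forecast: no training on the merchant's own history is needed;
# the past observations are simply passed in as context.
sales_history = np.abs(np.random.randn(180).cumsum() + 100)  # fake daily sales
point_forecast, quantile_forecast = tfm.forecast(
    [sales_history],  # a batch containing one series
    freq=[0],         # 0 = high-frequency (e.g. daily) series
)
print(point_forecast.shape)
```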
🔧 Practical Integration of TimesFM with Vertex AI
Rajesh demonstrates the practical steps for integrating the TimesFM model from Hugging Face into Vertex AI. He outlines the user journey for reviewing item performance by building an agent using foundational models from Vertex AI and Hugging Face. Rajesh explains the concept of a generative AI agent, which goes beyond simple text generation to perform complex tasks using tools and orchestration. He details the four components necessary for building agents: models, tools, orchestration, and runtime. The process involves deploying the TimesFM model, defining Python functions as tools, creating a LangChain agent, and deploying the agent on Vertex AI's Reasoning Engine for scalability and security.
🚀 Deploying and Testing the Forecasting Agent
The video proceeds with a live demonstration of deploying the TimesFM model from Hugging Face to a Vertex AI prediction endpoint. Rajesh guides viewers through data preparation, including downloading a dataset from Kaggle, uploading it to BigQuery, and transforming it for time series forecasting. He then demonstrates deploying the model using the Vertex AI SDK: building a custom serving container, testing it locally, pushing the model to the Model Registry, and deploying it to an endpoint. The endpoint's response, which includes point and quantile forecasts, is used to illustrate how the model can be applied to demand forecasting.
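A hedged sketch of the registration and deployment step with the Vertex AI SDK; the project, container image URI, machine type, and the request schema expected by the custom TimesFM serving container are assumptions for illustration, not the exact values used in the demo.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Register the custom TimesFM serving container in the Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="timesfm-forecaster",
    serving_container_image_uri=(
        "us-central1-docker.pkg.dev/your-project/serving/timesfm:latest"
    ),
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
    serving_container_ports=[8080],
)

# Deploy the registered model to an online prediction endpoint.
endpoint = model.deploy(
    machine_type="n1-standard-8",
    min_replica_count=1,
    max_replica_count=1,
)

# The endpoint returns point and quantile forecasts for the supplied history.
response = endpoint.predict(
    instances=[{"input": [120, 135, 128, 150, 142], "freq": 0, "horizon": 28}]
)
print(response.predictions[0])
```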
🔗 Binding Model with Tools and Defining the Agent
Rajesh defines functions as tools that will interact with BigQuery and the TimesFM model on Vertex AI. He creates Python functions to fetch data from BigQuery and invoke the TimesFM model's forecasting capabilities. The video shows testing these functions locally and plotting the forecasts against historical sales data. The process of defining a LangChain agent and binding the model with these tools is detailed, including setting up the agent with a prompt prefix to ensure the model uses the correct dataset. The agent is tested locally, and its ability to query BigQuery and generate forecasts is demonstrated.
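A sketch of what such tool functions and the agent binding might look like. The BigQuery table, endpoint resource name, and request payload are placeholders, and the sketch uses the Vertex AI SDK's prebuilt LangchainAgent template rather than the hand-assembled LangChain agent shown in the demo.

```python
import vertexai
from google.cloud import aiplatform, bigquery
from vertexai.preview import reasoning_engines

vertexai.init(project="your-project", location="us-central1")


def get_sales_history(product_id: str) -> list:
    """Fetch historical daily sales for a product from BigQuery."""
    client = bigquery.Client()
    query = """
        SELECT sales
        FROM `your-project.retail.daily_sales`
        WHERE product_id = @product_id
        ORDER BY sales_date
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("product_id", "STRING", product_id)
        ]
    )
    return [row.sales for row in client.query(query, job_config=job_config).result()]


def forecast_sales(product_id: str, horizon: int = 28) -> list:
    """Call the deployed TimesFM endpoint to forecast future sales."""
    history = get_sales_history(product_id)
    endpoint = aiplatform.Endpoint(
        "projects/your-project/locations/us-central1/endpoints/1234567890"
    )
    response = endpoint.predict(
        instances=[{"input": history, "freq": 0, "horizon": horizon}]
    )
    return list(response.predictions[0])


# Bind Gemini to the tools; the model decides when to call each function.
agent = reasoning_engines.LangchainAgent(
    model="gemini-1.5-pro",
    tools=[get_sales_history, forecast_sales],
)
print(agent.query(input="Forecast the next four weeks of sales for product 1234"))
```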
🌐 Deploying the Agent and Conclusion
The final steps of deploying the agent on Vertex AI's Reasoning Engine are covered. Rajesh defines the agent, including its dependencies on Google BigQuery and the Vertex AI SDK, and deploys it to the Reasoning Engine, which runs it as a managed service on Cloud Run. He tests the remote agent's functionality, showing how it can query BigQuery and generate forecasts when prompted. The video concludes with Skander and Rajesh summarizing the key takeaways: the importance of combining different models for different tasks, the ease of deploying Hugging Face models on Vertex AI, and the potential of pairing generative AI with predictive models for powerful solutions. They also highlight the flexibility of deploying scalable generative AI agents using function calling and the Reasoning Engine, and invite viewers to reach out with any additional questions.
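A hedged sketch of this final deployment step, reusing the tool functions from the previous sketch; the staging bucket, requirements list, display name, and query text are illustrative assumptions.

```python
import vertexai
from vertexai.preview import reasoning_engines

# Reasoning Engine needs a staging bucket to package and upload the agent code.
vertexai.init(
    project="your-project",
    location="us-central1",
    staging_bucket="gs://your-staging-bucket",
)

# Package the local agent and deploy it as a managed remote service.
remote_agent = reasoning_engines.ReasoningEngine.create(
    reasoning_engines.LangchainAgent(
        model="gemini-1.5-pro",
        tools=[get_sales_history, forecast_sales],  # defined in the earlier sketch
    ),
    requirements=[
        "google-cloud-aiplatform[langchain,reasoningengine]",
        "google-cloud-bigquery",
    ],
    display_name="demand-forecasting-agent",
)

# Query the deployed agent exactly like the local one.
print(remote_agent.query(input="How will product 1234 sell over the next month?"))
```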
Keywords
💡LLM (Large Language Model)
💡Hugging Face
💡Vertex AI
💡Time Series Forecasting
💡Gemini
💡Google Cloud
💡Model Garden
💡Function Calling
💡LangChain
💡Reasoning Engine
Highlights
Introduction to the Vertex AI platform and its capabilities for building generative AI applications.
Discussion on augmenting a Gemini-based large language model with open source models from Hugging Face.
Focus on integrating a time series forecasting model into a Gemini-based model for retail and e-commerce applications.
Introduction of Google's open source time series foundation model, TimesFM, launched on Hugging Face.
Practical steps for integrating the TimesFM model into a generative AI application.
Exploration of the thought process and options for building generative AI applications with Vertex AI.
Explanation of the 'no one model to rule them all' philosophy and the importance of model diversity.
Overview of Google AI Studio and Vertex AI, and their respective roles in prototyping and end-to-end machine learning.
Description of the Vertex AI platform's three components: Model Garden, Model Builder, and Agent Builder.
Introduction to the Model Garden and its collection of over 130 curated foundation models.
Announcement of the integration of Hugging Face Hub with Vertex AI for easy model deployment.
Industry example of integrating an open source model to enhance e-commerce product catalog data.
Challenges in managing product catalogs in e-commerce and how AI can help streamline the process.
Explanation of the system design behind product cataloging and content enhancement solutions.
Discussion on the importance of demand forecasting in retail and the challenges of traditional forecasting tools.
Introduction of TimesFM's capabilities for zero-shot learning in time series forecasting.
Demonstration of how to integrate the TimesFM model with Vertex AI to build a generative AI application.
Definition of a generative AI agent and its components for complex task completion.
Explanation of the four key steps to build a generative AI agent using Vertex AI's function calling and Reasoning Engine.
Live demo of deploying the TimesFM model from Hugging Face to a Vertex AI prediction endpoint.
Live demo of defining Python functions as tools for data retrieval from BigQuery and forecasting with TimesFM.
Live demo of defining a LangChain agent, binding the model with tools, and deploying the agent on Reasoning Engine.
Conclusion and key takeaways from the talk, emphasizing the integration of various models and tools for building generative AI applications.