Environment Setup - Vertex AI for ML Operations [notebook 00]
TLDR
In this video, Mike guides viewers through setting up their environment for a machine learning project using Vertex AI on Google Cloud. He covers creating a project, enabling necessary APIs, and setting up a Jupyter notebook instance. Mike also demonstrates how to clone a GitHub repository, create a storage bucket in Google Cloud Storage, and extract data from BigQuery into a CSV file. The video concludes with a Q&A section addressing cost and cleanup of resources.
Takeaways
- 📌 The video series is about setting up and running end-to-end machine learning workflows using Jupyter notebooks.
- 🔧 The first step is to set up the project environment, which includes creating a Google Cloud project and enabling necessary APIs.
- 📂 A new project is created in the Google Cloud Console, with a specific focus on keeping costs controlled and easy to delete after experimentation.
- 🔄 Enabling APIs like Vertex AI and Workbench is crucial for using Google Cloud's machine learning services and running notebook instances.
- 📊 The video demonstrates the creation of a Jupyter notebook instance, which is used to clone and work with the provided GitHub repository.
- 👨‍💻 The speaker, Mike, uses the alias 'statmike' for the project and emphasizes the ease of setting up and deleting projects for cost management.
- 🛠️ The video provides a detailed walkthrough of creating a storage bucket in Google Cloud Storage and extracting data from BigQuery into the bucket.
- 📈 The script includes instructions for installing necessary packages for the workflow, such as TensorFlow 2.3 and Google Cloud Pipeline Components.
- 🚫 The video addresses potential concerns about charges for resources used, offering solutions for cost management and cleanup of resources.
- 📋 The speaker encourages viewers to provide feedback and contribute to the GitHub repository for continuous improvement of the workflows.
- 🎥 The video concludes with a Q&A section, answering common questions about resource management and cleanup.
Q & A
What is the main purpose of this video?
-The main purpose of this video is to guide viewers through the process of setting up their project environment for a series of machine learning workflows using Jupyter notebooks.
What type of workflows are described in the video?
-The workflows described are end-to-end machine learning processes that include grabbing data, preparing data, training a model, evaluating a model, and potentially automating the entire process.
What is the first step in setting up the project?
-The first step is to create a new project in the Google Cloud environment.
How can one review the files directly?
-One can review the files directly by opening them in GitHub and reading them.
What is the role of APIs in this setup process?
-APIs play a crucial role as they need to be enabled for services like Vertex AI and Workbench, which are used for running notebooks and managing resources.
What type of notebook instance is recommended for this tutorial?
-A TensorFlow 2.3 notebook instance without GPUs is recommended as the modeling techniques used in the series are not the most sophisticated and the data is small, ensuring quick training times.
How is the data extracted from BigQuery?
-The data is extracted from BigQuery by creating a client, setting a destination as a bucket in Google Cloud Storage, and then creating an extraction job to move the data from the BigQuery table to the specified destination.
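As a hedged illustration of that flow, the sketch below uses the BigQuery Python client to run such an extract job; the project ID, bucket name, and source table are placeholders, not values confirmed by the video.

```python
# A minimal sketch of the extraction step, with placeholder names.
from google.cloud import bigquery

PROJECT_ID = "your-project-id"  # assumption: replace with your project ID
BUCKET = "your-bucket-name"     # assumption: an existing Cloud Storage bucket

# Create the BigQuery client scoped to the project.
client = bigquery.Client(project=PROJECT_ID)

# Fully qualified source table and a gs:// destination for the CSV export.
SOURCE_TABLE = "bigquery-public-data.some_dataset.some_table"  # assumption
destination_uri = f"gs://{BUCKET}/data/extract.csv"

# Start the extract job (CSV by default) and block until it finishes.
extract_job = client.extract_table(SOURCE_TABLE, destination_uri)
extract_job.result()
```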
What are the main components of Vertex AI that are highlighted in the video?
-The main components highlighted are data management, model training, model evaluation, and model deployment, which are all part of the machine learning operations journey.
How can one avoid charges after completing the experiments?
-One can avoid charges by either deleting the entire project, which eliminates all resources created within it, or by individually deleting the resources such as the Cloud Storage bucket and the endpoints created.
What additional packages are installed for the workflow?
-Additional packages installed include Kubeflow Pipelines (KFP) for orchestration, Plotly for interactive graphing, and an updated AI Platform (google-cloud-aiplatform) client library for interacting with Vertex AI.
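As a rough approximation of those installs, a notebook cell like the one below would pull the relevant packages; the exact package versions used in the video may differ.

```python
# A notebook cell approximating the installs; versions are not pinned here.
!pip install -U kfp google-cloud-pipeline-components plotly google-cloud-aiplatform
```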
Outlines
🚀 Project Setup and Introduction
The speaker, Mike, introduces himself as a statistician and Googler passionate about learning and sharing. He welcomes viewers to his office and explains that the video series will cover end-to-end machine learning workflows encapsulated in Jupyter notebooks. The workflows cover grabbing data, preparing it, training a model, evaluating it, and deploying it, potentially automating the entire process. Mike outlines the project's structure and encourages viewers to follow along either by reviewing the files on GitHub or by creating a project on Google Cloud to run the notebooks. He guides viewers through creating a project on Google Cloud, enabling the necessary APIs, and setting up a notebook instance to clone the repository and begin working through the notebooks.
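As a small illustration of the notebook-side configuration described later in the series, a sketch like the following sets the project and region and initializes the Vertex AI Python client; both values are placeholders, not ones confirmed by the video.

```python
# A minimal setup sketch with placeholder values.
from google.cloud import aiplatform

PROJECT_ID = "your-project-id"  # assumption: the project created in the console
REGION = "us-central1"          # assumption: choose your preferred region

# Initialize the Vertex AI client so later calls default to this project/region.
aiplatform.init(project=PROJECT_ID, location=REGION)
```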
📚 Notebook Instance Creation and Repository Cloning
In this paragraph, Mike explains the process of creating a notebook instance for running the notebooks and emphasizes the importance of selecting the right version of TensorFlow for the series. He details the creation of a new notebook instance without GPUs and the selection of a small machine type. Mike then demonstrates how to clone the repository into the notebook instance and prepare for the next steps. He also explains how to review the notebooks on GitHub and the benefits of running them in the hosted JupyterLab environment.
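For reference, cloning into the instance can be done from a notebook cell such as the one below; the repository URL is an assumption based on the speaker's GitHub alias.

```python
# Clone the series repository into the notebook instance (URL assumed).
!git clone https://github.com/statmike/vertex-ai-mlops.git
```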
🛠️ Setting Up the Data and Environment
Mike continues by discussing the next steps in setting up the environment, which include creating a storage bucket in Google Cloud Storage and using BigQuery to extract data into the bucket. He explains how to create a BigQuery client, set up a destination for the data, and execute an extraction job. Mike also covers installing Kubeflow Pipelines for orchestration and Plotly for graphing, and updating the google-cloud-aiplatform client library for interacting with Vertex AI. He ensures that viewers understand how to manage and delete resources to avoid unnecessary costs.
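As a concrete sketch of the bucket step, the snippet below uses the Cloud Storage Python client; the project ID, bucket name, and region are placeholders (bucket names must be globally unique, so prefixing with the project ID is a common convention).

```python
# A minimal bucket-creation sketch with placeholder names.
from google.cloud import storage

PROJECT_ID = "your-project-id"   # assumption
BUCKET = f"{PROJECT_ID}-bucket"  # assumption: a globally unique bucket name
REGION = "us-central1"           # assumption

# Create the client, then the bucket in the chosen region if it doesn't exist yet.
gcs = storage.Client(project=PROJECT_ID)
if gcs.lookup_bucket(BUCKET) is None:
    gcs.create_bucket(BUCKET, location=REGION)
```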
💡 Cost Management and Clean Up
This paragraph focuses on cost management and the importance of cleaning up resources to avoid charges. Mike reassures viewers that the setup uses a small compute instance without GPUs to minimize costs. He explains how to delete the entire project to eliminate all associated costs quickly or how to remove individual resources using a dedicated notebook. Mike encourages viewers to provide feedback and suggests that they can contribute to improving the repository by submitting issues on GitHub.
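For the per-resource route, a hedged sketch like the one below removes the demo bucket and any Vertex AI endpoints via the Python clients; all names are placeholders, and deleting the whole project from the console remains the simplest way to clear everything at once.

```python
# A cleanup sketch with placeholder names; run only when you are done.
from google.cloud import aiplatform, storage

PROJECT_ID = "your-project-id"   # assumption
BUCKET = f"{PROJECT_ID}-bucket"  # assumption
REGION = "us-central1"           # assumption

# Delete the storage bucket and its contents.
storage.Client(project=PROJECT_ID).get_bucket(BUCKET).delete(force=True)

# Undeploy models and delete each endpoint created during the series.
aiplatform.init(project=PROJECT_ID, location=REGION)
for endpoint in aiplatform.Endpoint.list():
    endpoint.delete(force=True)  # force=True undeploys any models first
```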
🎉 Conclusion and Next Steps
Mike concludes the setup video by thanking viewers for their attention and enduring the setup process. He encourages viewers to like, subscribe, and use the notification bell for updates on future videos. Mike emphasizes the importance of collaboration and feedback, inviting viewers to contribute ideas, corrections, or improvements to the GitHub repository. He reiterates the goal of making AI and machine learning more accessible and collaborative for a broader audience.
Keywords
💡Environment Setup
💡Jupyter Notebooks
💡Machine Learning Workflows
💡Google Cloud Platform (GCP)
💡Vertex AI
💡Workbench
💡TensorFlow
💡BigQuery
💡Cloud Storage
💡APIs
Highlights
The video series focuses on end-to-end machine learning workflows using Jupyter notebooks.
The project involves grabbing data, preparing it, training a model, evaluating, deploying, and possibly automating the process.
Google Cloud environment is used to recreate and run the notebooks for the tutorial.
A new Google Cloud project called 'statmike-demo-3' is created for the tutorial.
APIs are enabled for Vertex AI and Workbench, which are essential for the project.
A notebook instance is created without GPUs, using TensorFlow 2.3.
The repository is cloned into the notebook instance for practical work.
The project name and region are set within the Jupyter notebook for consistency.
Google Cloud Storage bucket is created and utilized for data storage.
Public dataset from BigQuery is extracted and saved as a CSV file in the cloud storage.
Package installations are updated for TensorFlow, Google Cloud Pipeline Components, and Plotly.
The AI Platform client library (google-cloud-aiplatform) is updated for interacting with Vertex AI.
Costs are minimized by using a small compute instance without GPUs and creating small files.
Google Cloud provides free credits for new users to experiment with their services.
Projects can be deleted to avoid future charges, and there's a notebook dedicated to cleaning up resources.
The video series aims to make AI and ML more collaborative, accurate, and approachable.
Feedback and suggestions are encouraged through the GitHub repository for continuous improvement.
The video concludes with a Q&A section addressing questions about charges and resource management.