Learn Vertex AI while building a fraud detection system

Google Cloud Tech
12 May 2022 · 13:46

TLDR: In his Google I/O 2022 presentation, Ivan Nardini introduces the challenges of a rules-based fraud detection system and presents a modern data-driven alternative using Google Cloud's Vertex AI. The talk covers the importance of data quality, modularity, and MLOps practices in building a scalable and interpretable machine learning model for real-time fraud detection. Nardini shares insights from Google's experience and invites the audience to embrace collaboration and learning in their ML journey.

Takeaways

  • 🚀 The story begins with joining a data science team at a banking company, responsible for building innovative solutions across business units.
  • 📋 Existing fraud detection systems based on rules-driven engines have limitations such as bias, difficulty in scaling, and maintenance challenges.
  • 🔄 The need for a new system arises, one that is data-driven, doesn't require manual intervention, and is interpretable for validation and improvement.
  • 🌐 The new system is to be designed on Google Cloud, aiming for a machine learning model that estimates the probability of fraud in transactions.
  • 🔍 The importance of considering the entire system's requirements and dependencies when selecting and implementing machine learning models.
  • 🧠 MLOps is introduced as a culture, practice, and technology to unify ML development with operations for model deployment.
  • 🔧 Google Cloud's Vertex AI is a managed machine learning platform designed to accelerate the experimentation and deployment of ML models at scale.
  • 🏗️ Building the new fraud detection system on Vertex AI involves data ingestion, model training, prediction, and monitoring, covering the entire lifecycle of the models.
  • 📊 Feature Store is introduced to address data leakage and provide point-in-time lookups for the most up-to-date features aligned with labels.
  • 🛠️ The process of building the fraud detection system includes data transformation, feature definition, model training with ML pipelines, and model serving.
  • 🎯 The Fraud Finder project by Google Cloud demonstrates the application of data science and machine learning to detect fraudulent transactions at scale in real time.
  • 🤝 The success of machine learning and MLOps relies not only on technology but also on collaboration, learning, and a team's diverse skills and backgrounds.

Q & A

  • What is the main challenge with the current rules-driven engine used by the banking company for fraud detection?

    -The main challenge with the current rules-driven engine is that it is biased, hard to scale due to the requirement of hard-coding new rules for each discovered fraud pattern, and difficult to maintain as the team members and investigators involved often change.

  • What are the key features Maya wants in the new fraud detection system?

    -Maya wants a modern data-driven fraud detection engine that uses a machine learning model to estimate the probability of fraud for each transaction without manual intervention and is interpretable for validation and improvement by SMEs and investigators.

  • How does Ivan Nardini approach the design and building of the new system?

    -Ivan Nardini emphasizes the importance of considering the entire system's requirements and dependencies, rather than just focusing on the machine learning model. He highlights the need for good quality data, modularity, and the big picture, which includes culture and best practices encapsulated by MLOps.

  • What is Vertex AI and how does it help with the fraud detection system?

    -Vertex AI is a Google Cloud managed machine learning platform designed to accelerate experimentation and deployment of ML models at scale. It provides capabilities for the entire lifecycle of ML models, from data ingestion to model training, prediction, and monitoring, enabling production analysis of models.

  • What is the significance of a Feature Store in the context of the fraud detection system?

    -A Feature Store is crucial for addressing data leakage: its point-in-time lookups fetch the most up-to-date feature values as of the time the labels become available, so training never sees information from the future. It also serves features at scale with low latency, keeping features aligned with labels and mitigating training-serving skew.

  • How does the fraud detection system handle real-time transactions?

    -The system uses a data store optimized for low-latency lookup operations at scale to pass features as inputs to the model for online predictions. It also employs a Feature Store to ensure that the most recent features are used for both training and serving, maintaining alignment with the labels.

  • What is the role of BigQuery in the fraud detection system?

    -BigQuery is used for analyzing large volumes of data quickly, allowing the system to derive new variables relevant to predicting fraudulent transactions from historical transaction data.

  • How does the new fraud detection system ensure the quality of the machine learning model?

    -The system ensures model quality by using a Feature Store to provide consistent, up-to-date features, employing machine learning pipelines for reproducible model training at scale, and deploying models only if they meet certain performance thresholds.

  • What is Fraud Finder and how does it relate to the new fraud detection system?

    -Fraud Finder is a project developed by Ivan Nardini and his colleagues at Google Cloud, applying the state of the art of data science and machine learning to detect fraudulent transactions at scale in real time. It serves as an example of how the new system can be built using Vertex AI for MLOps.

  • What are the key takeaways from Ivan Nardini's presentation for someone looking to build a fraud detection system?

    -The key takeaways include the importance of viewing the machine learning model as part of a larger system, the necessity of good quality data and modularity, the role of MLOps in unifying development with operations, and the capabilities of Vertex AI in supporting the entire lifecycle of ML models.
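
The deployment gate mentioned in the Q&A above (deploying a model only if it meets certain performance thresholds) can be illustrated with a minimal sketch. The metric names, threshold values, and the `should_deploy` helper are assumptions made for illustration; the talk does not state which metrics or cut-offs Fraud Finder uses, and this is not Vertex AI API code.

```python
def should_deploy(metrics, thresholds):
    """Return True only if every required metric meets its minimum value.

    A metric missing from `metrics` is treated as failing the check.
    """
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


# Hypothetical thresholds and evaluation results (not from the talk).
thresholds = {"auc_pr": 0.80, "recall": 0.70}
candidate_metrics = {"auc_pr": 0.85, "recall": 0.65}

# Recall is below its minimum, so this candidate would not be deployed.
decision = should_deploy(candidate_metrics, thresholds)
```

In a training pipeline, a check like this would typically sit between the evaluation step and the deployment step, so a model that fails it never reaches the serving environment.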

Outlines

00:00

🎤 Introduction and Challenges of Fraud Detection

The video begins with Ivan Nardini welcoming the audience to Google I/O 2022 and introducing himself as a Customer Engineer at Google Cloud. He sets the scene by describing a new hire's first day on a banking company's data science team, which is part of the data office and tasked with building innovative solutions. Maya, the product manager of the Fraud Detection System, discusses the limitations of the current rules-driven engine, highlighting inherent bias, poor scalability, and difficult maintenance. She wants a new, modern, data-driven fraud detection engine that uses machine learning to estimate the probability of fraud without manual intervention and is interpretable, so that subject matter experts and investigators can validate and improve it. Ivan emphasizes the importance of considering the entire system's requirements and dependencies when selecting models and technology, and introduces MLOps as a way to unify ML development with operations for model deployment.

05:01

🚀 Building the Fraud Detection System on Vertex AI

In this segment, Ivan explains how the fraud detection system is built on Vertex AI, Google Cloud's managed machine learning platform. He describes Fraud Finder, a project designed to apply state-of-the-art data science and machine learning to detect fraudulent transactions at scale in real time. Ivan starts from historical transaction data, which must be transformed into numerical features relevant for predicting fraud, and points out the challenge of calculating these features in real time. He introduces the Feature Store to address data leakage and serve the most up-to-date features aligned with labels, and describes the need for a "time travel machine" for features so that they are representative of the data the model sees when it goes live. Ivan also touches on the use of BigQuery for data analysis and the creation of a machine learning pipeline for model training and deployment.
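
The "time travel" idea can be illustrated with a small point-in-time lookup: for a given label timestamp, return the latest feature value that existed at or before that moment, and nothing recorded later. This is a minimal pure-Python sketch, not the Vertex AI Feature Store API; the feature history, its values, and the `point_in_time_lookup` helper are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history of one feature for one customer:
# (timestamp, value) pairs sorted by timestamp.
tx_count_history = [
    (datetime(2022, 1, 1, 9, 0), 1),
    (datetime(2022, 1, 1, 12, 0), 4),
    (datetime(2022, 1, 2, 8, 30), 7),
]


def point_in_time_lookup(history, as_of):
    """Return the latest value recorded at or before `as_of`.

    Returns None if no value existed yet at that time.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)  # count of entries with ts <= as_of
    return history[idx - 1][1] if idx > 0 else None


# The label for a transaction became available at 15:00 on 1 January,
# so the value written on 2 January must not leak into training.
label_time = datetime(2022, 1, 1, 15, 0)
feature_at_label_time = point_in_time_lookup(tx_count_history, label_time)
```

Here the lookup returns the value written at 12:00 rather than the later one, which is the behavior that prevents data leakage when features are joined to labels.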

10:02

📊 Fraud Finder Demonstration and MLOps Importance

Ivan presents the Fraud Finder data application, showing how it streams transactions and classifies them as fraudulent based on a probability threshold. He explains how the backend reads features from the Vertex AI Feature Store, processes transactions, and generates predictions. Ivan highlights the dashboard view, which includes a latency profile, distribution plots of fraudulent and non-fraudulent transactions, and information on active endpoints and models on Vertex AI. He concludes by reflecting on the transition from a biased, hard-to-scale, hard-to-maintain rules-based system to a data-driven machine learning model for fraud detection. Ivan stresses that while technology is crucial, the success of MLOps also relies on a collaborative culture and team learning, and he encourages the audience to start building and deploying their models using Vertex AI on Google Cloud.
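
Classifying streamed transactions against a probability threshold can be sketched as follows. The `ScoredTransaction` type, the example probabilities, and the default threshold of 0.5 are assumptions for illustration; the summary does not say which threshold the demo uses, and in the real application the probabilities would come from the deployed model rather than being hard-coded.

```python
from dataclasses import dataclass


@dataclass
class ScoredTransaction:
    tx_id: str
    fraud_probability: float  # model output in [0, 1]


def classify(transactions, threshold=0.5):
    """Flag each transaction whose fraud probability reaches the threshold."""
    return [
        (tx.tx_id, tx.fraud_probability >= threshold)
        for tx in transactions
    ]


# Hypothetical scored transactions standing in for the live stream.
stream = [
    ScoredTransaction("tx-001", 0.03),
    ScoredTransaction("tx-002", 0.91),
    ScoredTransaction("tx-003", 0.50),
]

flags = classify(stream, threshold=0.5)
```

Raising the threshold reduces false alarms at the cost of missing more fraud, so in practice it is a business decision as much as a modeling one.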

Keywords

💡Fraud Detection System

A fraud detection system is a set of processes and algorithms that analyze transactions to detect and prevent illegal activities, such as fraudulent credit card transactions. In the context of the video, the banking company is looking to modernize its fraud detection system by leveraging machine learning and Google Cloud's Vertex AI to improve its ability to identify potential fraud in real-time, without the need for manual rule intervention.

💡Data Science Team

A data science team is a group of professionals with expertise in analyzing and interpreting complex data sets to extract valuable insights and knowledge. In the video, the newly joined team member is part of the data science team at a banking company, responsible for building innovative solutions to tackle challenges like fraud detection.

💡Machine Learning Model

A machine learning model is a computational model that uses statistical methods to 'learn' from data, improving its performance on specific tasks without being explicitly programmed. In the video, the banking company wants a machine learning model that estimates the probability of each transaction being fraudulent, with the aim of being more effective and less biased than hand-written rules.

💡MLOps

MLOps refers to the set of practices, culture, and technology that combines machine learning with DevOps to automate and streamline the deployment, monitoring, and maintenance of machine learning models in production. The concept is to bridge the gap between data science and operations teams, ensuring that models can be developed and updated efficiently while maintaining high performance and reliability.

💡Vertex AI

Vertex AI is a managed machine learning platform by Google Cloud that helps organizations accelerate the experimentation and deployment of machine learning models at scale. It provides a suite of integrated services for data ingestion, model training, prediction, and monitoring, enabling users to build end-to-end machine learning workflows.

💡Data Transformation

Data transformation is the process of converting data from one format or structure into another to make it suitable for analysis or modeling. In the context of the video, historical transaction data needs to be transformed into a set of features that can be used by the machine learning model to predict fraudulent transactions.
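
As an illustration of this kind of transformation, the sketch below derives two numerical features from raw transactions: how many transactions a customer made in a recent time window, and their average amount. The feature names, window size, and sample data are hypothetical, and the talk performs this step at scale with BigQuery rather than in plain Python.

```python
from datetime import datetime, timedelta

# Hypothetical raw transactions: (customer_id, timestamp, amount).
transactions = [
    ("c1", datetime(2022, 1, 1, 10, 0), 20.0),
    ("c1", datetime(2022, 1, 1, 10, 10), 35.0),
    ("c1", datetime(2022, 1, 1, 10, 50), 500.0),
    ("c2", datetime(2022, 1, 1, 10, 5), 12.0),
]


def window_features(transactions, customer_id, as_of, window):
    """Count and average the customer's amounts in (as_of - window, as_of]."""
    start = as_of - window
    amounts = [
        amount
        for cid, ts, amount in transactions
        if cid == customer_id and start < ts <= as_of
    ]
    count = len(amounts)
    avg = sum(amounts) / count if count else 0.0
    return {"tx_count": count, "avg_amount": avg}


# Features for customer c1 over the 30 minutes before 11:00.
features = window_features(
    transactions, "c1", datetime(2022, 1, 1, 11, 0), timedelta(minutes=30)
)
```

Computing the same window over a live stream is the harder part, since the aggregate has to be updated as each new transaction arrives.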

💡Feature Store

A feature store is a centralized repository that stores and manages features used for machine learning model training and prediction. It ensures that the most up-to-date and relevant features are available for both training and serving models, addressing data leakage and ensuring that features are aligned with their corresponding labels in time.

💡Real-Time Processing

Real-time processing refers to the ability of a system to process and analyze data as it is generated, without significant delays. In the context of fraud detection, real-time processing allows the system to analyze transactions as they occur and make immediate decisions regarding potential fraud.

💡Model Training

Model training is the process of teaching a machine learning model to make predictions or decisions based on a set of data. It involves adjusting the model's parameters through algorithms until it can accurately recognize patterns or make reliable predictions on new data.

💡Model Serving

Model serving is the process of deploying a trained machine learning model into a production environment where it can receive input data, make predictions, and return the results to the users or other systems. This is a critical step in transitioning a model from development to actual use.

Highlights

Introduction to building a fraud detection system on Google Cloud by Ivan Nardini at Google I/O 2022.

Challenges with the existing rules-driven fraud detection engine, including bias, scalability, and maintenance issues.

The need for a modern, data-driven fraud detection engine that is automated and interpretable.

The importance of considering the entire system's requirements and dependencies when selecting machine learning models and technology.

The concept of MLOps as a combination of culture, practice, and technology to unify ML development with operations.

Google's expertise and technology in MLOps with thousands of ML models training concurrently and deploying globally.

Introduction to Vertex AI, Google Cloud's managed machine learning platform for accelerating ML model experimentation and deployment.

Fraud Finder, a project by Google Cloud in collaboration with customers to detect fraudulent transactions at scale in real time.

Starting the fraud detection system with historical transaction data and the necessity of data transformation.

The use of BigQuery for analyzing large datasets and the challenges of real-time feature calculation for online predictions.

The need for a Feature Store to address data leakage and provide point-in-time lookups for the most up-to-date features aligned with labels.

Training the model offline using a sample of features aligned with labels in a notebook environment.

Creating a model training pipeline and deploying the model to the serving environment once it meets performance thresholds.

The implementation of the fraud detection system on Vertex AI, including Feature Store, model training, and serving.

Fraud Finder's data application showcasing real-time transaction classification and prediction latency distribution.

The significance of teamwork and collaboration in overcoming challenges and leveraging Vertex AI for MLOps.