* This blog post is a summary of this video.

Auditing AI Systems: Models for Red Teaming Artificial Intelligence

Playing with AI Tools to Understand How They Work

As artificial intelligence systems become more prevalent, it's important for cybersecurity professionals to understand how they operate. One way to gain this knowledge is to experiment with AI models. Sites like Hugging Face provide access to downloadable models that can be tested using services like RunPod.

This hands-on experience allows security experts to observe AI behavior firsthand. While not as robust as systems from big tech players, these models still provide valuable insights into how different algorithms function.

Downloading AI Models to Experiment With

Hugging Face hosts an extensive model hub where developers can find and download a wide variety of AI models for natural language processing, computer vision, audio processing, and more. These include models like GPT-2 for text generation and BERT for language understanding. With access to these downloadable models, security professionals can experiment with running them on their own data. This helps reveal how the models operate, their capabilities, and limitations.
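
As a concrete starting point, here is a minimal sketch of loading and querying one of those downloadable models (GPT-2) with the Hugging Face transformers library; the prompt is arbitrary and only illustrates the workflow.

```python
# Minimal sketch: download a public model from Hugging Face and generate text.
# Assumes the `transformers` package is installed; "gpt2" is one of the freely
# downloadable models mentioned above.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The model behaves like", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```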

Leveraging Services Like RunPod to Test AI Models

Once models are downloaded from Hugging Face, services like RunPod provide the computational resources needed to evaluate them. RunPod rents on-demand cloud GPU instances suited to running and fine-tuning models. By spinning up GPUs through RunPod, security experts can thoroughly test different AI models on their own terms. This hands-on learning is invaluable for later auditing production systems.
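
As a small illustration, the sketch below shows the kind of sanity check one might run on a rented GPU instance before testing a model. It assumes PyTorch and transformers are installed on the instance and is not tied to any RunPod-specific API.

```python
# Confirm a GPU is visible to PyTorch, then move a downloaded model onto it.
import torch
from transformers import AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
name = torch.cuda.get_device_name(0) if device == "cuda" else "(no GPU found)"
print("running on:", device, name)

model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
print("model loaded on", next(model.parameters()).device)
```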

Challenges of Training Large-Scale AI Models

While downloading existing models is useful, training new models from scratch requires massive computational resources beyond the reach of most organizations. Recent government guidance highlights the scale of hardware involved in state-of-the-art AI.

For example, the Biden Administration outlined hardware thresholds for reporting on large AI system development. These covered compute clusters networked at over 100 Gbps with a theoretical maximum of 10^20 floating point operations per second, a capacity on the order of tens of thousands of Nvidia DGX A100 servers.

Hardware Requirements for Massive AI Systems

Training cutting-edge AI models demands specialized hardware like high-end GPUs. For instance, Nvidia's DGX A100 server launched at roughly $200,000 and delivers about 5 petaflops of AI performance. Reaching the 10^20 operations-per-second threshold in the Biden Administration guidance would take on the order of 20,000 DGX A100 servers. This multi-billion dollar infrastructure exceeds the resources of most organizations.
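
A quick back-of-the-envelope check of those figures (all approximate):

```python
# Rough arithmetic behind the server count and cost estimates above.
dgx_a100_flops = 5e15      # ~5 petaflops of AI performance per DGX A100
threshold_flops = 1e20     # reporting threshold: 10^20 operations per second

servers = threshold_flops / dgx_a100_flops
print(servers)             # -> 20000.0 servers
print(servers * 200_000)   # at ~$200k per server -> roughly $4 billion
```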

Biden Administration Guidance on Reporting Large AI Models

The Biden Administration outlined two thresholds for reporting on large AI system development to the government:

  • Models trained using more than 10^26 floating point operations, with a lower threshold of 10^23 operations for models trained primarily on biological sequence data
  • Compute clusters networked at over 100 Gbps with a theoretical maximum of 10^20 floating point operations per second

Attacking AI Systems: 7 Models from Berryville Institute

When auditing AI systems, security professionals need an approach for analyzing potential vulnerabilities. The Berryville Institute of Machine Learning (BIML) developed a useful framework with 7 models for attacking AI systems.

These models encompass methods for manipulating inputs, poisoning training data, injecting model vulnerabilities, and even extracting private data or reversing entire models.

Manipulating Inputs to Trick AI Models

Input manipulation involves modifying what an AI model receives to generate incorrect outputs. For example, altering images fed into a computer vision model could cause misclassifications. This could trick an autonomous vehicle into misreading traffic signs, or facial recognition into not identifying individuals.
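
The sketch below illustrates the core mechanic of one common input manipulation technique, an FGSM-style perturbation, against a stand-in PyTorch classifier. The model and image here are random placeholders, not a real target system.

```python
# Sketch of an FGSM-style input manipulation attack on a placeholder classifier.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in classifier
model.eval()

x = torch.rand(1, 1, 28, 28, requires_grad=True)  # stand-in input image
true_label = torch.tensor([3])

# Compute the loss gradient with respect to the input, not the weights.
loss = nn.functional.cross_entropy(model(x), true_label)
loss.backward()

# Nudge every pixel slightly in the direction that increases the loss.
epsilon = 0.1
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

print("original prediction:", model(x).argmax(dim=1).item())
print("perturbed prediction:", model(x_adv).argmax(dim=1).item())
```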

Poisoning Training Data to Backdoor AI Models

Poisoning the training data means intentionally corrupting it to manipulate model behavior. Attackers may add specific instances that cause the model to learn a secret backdoor function. For example, training images could be stamped with a particular pixel pattern and relabeled so that the model learns to classify any image containing that pattern as an attacker-chosen class.
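
As an illustration, this sketch stamps a hypothetical trigger pattern onto a small fraction of synthetic training images and relabels them toward an attacker-chosen class. All data here is made up for demonstration.

```python
# Sketch of a backdoor-style poisoning step on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 28, 28))        # stand-in training images
labels = rng.integers(0, 10, size=1000)    # stand-in labels
target_class = 7                           # class the backdoor should trigger

poison_idx = rng.choice(len(images), size=50, replace=False)
images[poison_idx, -3:, -3:] = 1.0         # 3x3 white square in the corner as the trigger
labels[poison_idx] = target_class          # relabel so the model associates trigger with target
```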

Inserting Vulnerabilities Through Model Manipulation

Rather than poisoning data, attackers may inject vulnerabilities directly into the model itself during training. This produces hidden backdoors in the model's structure and internal parameters. Vetting training code, parameters, and checkpoints is necessary to detect such model manipulation attacks.
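
One simple vetting step is to pin model checkpoints to known-good hashes before loading them. The sketch below assumes a hypothetical checkpoint file and a published reference hash; both names are placeholders.

```python
# Verify a checkpoint file against a known-good SHA-256 hash before loading it.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "replace-with-the-published-checkpoint-hash"   # hypothetical reference value
if sha256_of("model_checkpoint.pt") != EXPECTED:          # hypothetical file path
    raise RuntimeError("Checkpoint hash mismatch: possible model manipulation")
```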

Stealing Private Data Via Input Extraction

Input extraction involves submitting inputs to an AI model to extract sensitive private data contained in the training set. The model's outputs reveal information about the original training data. For example, an image model could be queried to generate images of people's faces that it was trained on. This exposes private user data.
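
A rough way to probe for this kind of memorization is to feed a model suggestive prompts and inspect its completions. The sketch below uses the freely downloadable GPT-2 model and a made-up probe string purely for illustration.

```python
# Probe a small public model for regurgitated training data.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
probe = "John Smith's email address is"   # hypothetical probe string
for out in generator(probe, max_new_tokens=20, num_return_sequences=3, do_sample=True):
    print(out["generated_text"])
```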

Extracting Full Training Datasets

Rather than extracting individual data points, attackers may attempt to reconstruct entire training datasets by querying the model. Large volumes of private data could be extracted from the model over time. Defenses like differential privacy introduce noise during training to prevent precise reconstruction of datasets.
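
The core of that defense can be sketched in a few lines: clip each per-example gradient and add calibrated noise before averaging. Production systems would use a dedicated library such as Opacus; the shapes and constants here are arbitrary.

```python
# Sketch of the noise-addition step at the heart of differentially private training.
import torch

per_example_grads = torch.randn(32, 1000)   # stand-in: one gradient row per example
clip_norm, noise_multiplier = 1.0, 1.1

# Clip each example's gradient to a maximum norm, then add Gaussian noise.
norms = per_example_grads.norm(dim=1, keepdim=True)
clipped = per_example_grads * (clip_norm / norms).clamp(max=1.0)
noisy_mean = clipped.mean(dim=0) + torch.normal(
    0.0, noise_multiplier * clip_norm / 32, size=(1000,)
)
```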

Reversing AI Models Through Extraction

Model extraction seeks to replicate the functionality of an AI model through black box attacks that analyze inputs and outputs. Attackers reverse engineer the model to avoid costly data collection and training. Protections like encryption, watermarking, and obfuscation help prevent extraction of proprietary models.
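
A minimal sketch of the idea, with a stand-in victim function in place of a remote prediction API:

```python
# Sketch of black-box model extraction: query a victim model and train a surrogate
# on the collected input/output pairs.
import numpy as np
from sklearn.linear_model import LogisticRegression

def victim_predict(x: np.ndarray) -> np.ndarray:
    return (x.sum(axis=1) > 0).astype(int)   # placeholder for the target model's API

queries = np.random.randn(5000, 20)          # attacker-chosen probe inputs
responses = victim_predict(queries)          # observed black-box outputs

surrogate = LogisticRegression().fit(queries, responses)
print("agreement with victim on probe set:", surrogate.score(queries, responses))
```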

Recommendations for Auditing AI Systems

When evaluating AI system security, assessing model design and intended use cases is as important as testing for vulnerabilities. Following frameworks like the Berryville Institute's models provides a methodology for analysis.

However, truly understanding how AI models function requires hands-on experience. Experimenting with readily available models builds intuition before attempting to audit proprietary systems.

Understand How AI Models Work Before Auditing Them

Attempting to audit an AI system without practical experience with models is challenging. Building intuition by working with open source models first allows auditors to better evaluate the system's expected behavior. This hands-on learning phase is critical preparation before assessing production systems with proprietary models and data.

Reference Frameworks Like Berryville Institute's

Methodical frameworks help guide the auditing process to cover different vectors of attack and vulnerability analysis. The Berryville Institute's models provide an excellent starting point for AI security assessments. Following established frameworks reduces the risk of overlooking classes of vulnerabilities based on lack of experience with AI systems.

Conclusion

As artificial intelligence permeates more applications, auditing AI system security is crucial. Hands-on experience with public models paired with trusted frameworks like the Berryville Institute's equips professionals to thoroughly evaluate proprietary production systems.

Collaborative work between security experts, model developers, and business stakeholders helps identify potential risks early and determine appropriate controls. With the right preparation and partnerships, organizations can feel confident deploying AI safely.

FAQ

Q: How can I experiment with AI models?
A: You can download open source AI models from places like Hugging Face and test them out using services like RunPod that provide GPU power for running models.

Q: What hardware is needed for large AI systems?
A: The Biden administration guidance calls out compute clusters with over 100 Gbps networking and a theoretical maximum of over 10^20 floating point operations per second, a capacity requiring tens of thousands of high-end NVIDIA DGX servers.

Q: What are some ways to attack AI systems?
A: The Berryville Institute outlines 7 models such as input manipulation, data poisoning, model manipulation, and different forms of extracting private data or reversing the models.

Q: How can I audit AI systems effectively?
A: It's important to thoroughly understand how AI models work before attempting to audit them. Leverage existing frameworks like those from Berryville Institute.

Q: What should I know about training AI models?
A: Training large-scale AI models requires specialized hardware like GPU clusters with extreme computing power and large datasets. This puts constraints on auditing capabilities.

Q: Can individuals realistically train cutting-edge AI models?
A: No. Infrastructure on the scale of the reporting thresholds, GPU clusters capable of around 10^20 operations per second, would cost billions of dollars, which is infeasible for individuals and most organizations.

Q: What risks exist from input manipulation attacks?
A: Tricking computer vision systems through input manipulation could have dangerous consequences if used to fool systems controlling vehicles or infrastructure.

Q: How does data poisoning undermine AI model trust?
A: By manipulating training data, malicious actors could insert intentional vulnerabilities or biases that impact reliability of AI systems.

Q: Why is model extraction an emerging attack vector?
A: The computational expense of training models creates incentive to steal well-performing models. Extraction attacks attempt to replicate models to avoid these costs.

Q: What guidance covers responsible AI development?
A: The Biden administration's October 2023 executive order on safe, secure, and trustworthy AI lays out expectations around transparency, safety, and security for organizations building AI systems.