AI Glossary

Model Poisoning

An attack that attempts to alter the behavior of an AI model by manipulating the model itself during training or updating.

Overview

AI systems learn patterns from data and use those patterns to make predictions, generate content, or support decisions.

If an attacker can influence how a model learns, they may be able to affect how it behaves.

This type of attack is known as model poisoning.

Model poisoning occurs when an attacker intentionally alters the training or updating process in a way that changes the behavior of the model.

A helpful way to think about model poisoning is modifying a recipe.

Even small changes to ingredients can affect the final result.

Similarly, changes introduced during model development can influence how an AI system behaves after deployment.

The goal of model poisoning may be to reduce accuracy, create hidden vulnerabilities, introduce bias, or manipulate future outputs.

Because AI systems increasingly support important business processes, protecting model integrity has become an important part of AI security.

Model poisoning can affect the reliability, accuracy, and trustworthiness of AI systems.

An attacker may attempt to influence a model update so that certain inputs consistently produce incorrect outputs.