← Back to course

Lesson 29 · Video

AI Threats Overview

This lesson introduces the modern AI threat landscape and provides a foundational understanding of the most important risks facing AI systems today. Learners explore hallucinations, prompt injection, data poisoning, evasion attacks, and model extraction, while examining how these threats impact trust, security, privacy, and business operations. The lesson also introduces a defense framework built around prevention, detection, and mitigation, providing the foundation for the remaining AI security topics in Module 4.

Free preview

Learning Objectives

Learning Objectives — AI Threats Overview

By the end of this lesson, learners will be able to:

  • Identify the major categories of AI threats.
  • Explain how hallucinations impact AI reliability and trust.
  • Describe prompt injection attacks and their risks.
  • Understand how data poisoning compromises AI systems.
  • Explain evasion attacks and adversarial examples.
  • Define model extraction and intellectual property risks.
  • Recognize the business and security impact of AI threats.
  • Understand the concepts of prevention, detection, and mitigation.
  • Apply AI threat concepts to real-world and certification exam scenarios.
  • Build a foundation for more advanced AI security topics.

Key Concepts

Key Concepts — AI Threats Overview

  • AI Security
  • AI Threat Landscape
  • Hallucinations
  • Prompt Injection
  • Data Poisoning
  • Evasion Attacks
  • Adversarial Examples
  • Model Extraction
  • Model Theft
  • AI Reliability
  • AI Trust
  • Data Integrity
  • AI Safety
  • Security Controls
  • Prevention
  • Detection
  • Mitigation
  • Defense-in-Depth
  • AI Risk Management
  • AI Governance
  • Trustworthy AI
  • Responsible AI
  • Model Security
  • AI Attack Surface

Transcript

Transcript — AI Threats Overview

Welcome to Module 4 and Lesson 4.1: AI Threats Overview.

As Artificial Intelligence becomes increasingly integrated into business operations, critical infrastructure, healthcare, finance, education, and cybersecurity, understanding AI threats has become just as important as understanding AI capabilities.

AI systems introduce powerful new opportunities, but they also introduce entirely new categories of risk.

Unlike traditional software, AI systems learn from data, generate content, and make decisions based on statistical patterns. These unique characteristics create attack surfaces that did not previously exist.

In this lesson, we’ll explore the major threats facing AI systems today and examine how organizations defend against them.

We’ll focus on five key threat categories:

Hallucinations.

Prompt Injection.

Data Poisoning.

Evasion Attacks.

And Model Extraction.

We’ll also introduce a security framework built around prevention, detection, and mitigation.

Let’s begin with hallucinations.

Hallucinations occur when an AI system generates information that appears convincing but is factually incorrect.

This is one of the most widely discussed risks associated with modern Generative AI systems.

Unlike traditional software bugs, hallucinations often appear highly credible.

The model may provide an answer with confidence, detailed explanations, references, or supporting arguments even when the information is incorrect.

For example, a language model may invent research papers, fabricate legal cases, create fictional citations, or provide inaccurate medical guidance.

The danger is not simply that the answer is wrong.

The danger is that users may trust the answer because it appears authoritative.

In healthcare, legal services, financial planning, and other high-risk environments, hallucinations can create serious consequences.

Organizations often reduce hallucination risks through retrieval systems, trusted knowledge sources, human review processes, and output validation.

However, hallucinations remain an ongoing challenge for modern AI systems.

The second major threat is prompt injection.

Prompt injection occurs when an attacker crafts input designed to manipulate a model’s behavior.

The attacker attempts to override instructions, bypass safeguards, or force the model to reveal information that should remain protected.

For example, an attacker may instruct a chatbot to ignore previous rules or disclose hidden system prompts.

Prompt injection is often compared to SQL injection in traditional cybersecurity.

Both attacks exploit trust in user-provided input.

As AI systems become connected to external tools, databases, and workflows, prompt injection becomes even more dangerous.

A successful attack may result in unauthorized actions, sensitive information disclosure, or manipulation of downstream systems.

Organizations defend against prompt injection using layered controls such as input filtering, guardrails, output monitoring, access restrictions, and human oversight.

The third threat category is data poisoning.

Data poisoning targets AI systems during training rather than during deployment.

Attackers intentionally introduce malicious or misleading data into training datasets.

The goal is to influence model behavior.

For example, an attacker may insert false examples into a dataset so the model learns incorrect relationships.

In other situations, attackers may create hidden backdoors that cause specific inputs to trigger unexpected behavior.

Data poisoning is particularly dangerous because the attack occurs before deployment.

The model may appear normal during testing while carrying hidden vulnerabilities.

Organizations defend against poisoning through dataset validation, provenance tracking, integrity checks, auditing, and secure data pipelines.

Protecting training data is a critical component of AI security.

The fourth major threat is evasion attacks.

Evasion attacks occur after a model has been deployed.

Instead of modifying training data, attackers manipulate inputs presented to the model.

These manipulated inputs are often called adversarial examples.

An adversarial example appears normal to humans but causes the AI system to make mistakes.

For example, small modifications to an image may cause a computer vision model to misclassify a stop sign.

Similarly, subtle changes to audio may fool speech recognition systems.

The attacker exploits weaknesses in the model’s decision-making boundaries.

Evasion attacks are particularly concerning in safety-critical environments such as autonomous vehicles, healthcare systems, and security applications.

Organizations reduce risk through adversarial training, ensemble approaches, validation controls, and continuous testing.

The fifth major threat category is model extraction.

Model extraction occurs when attackers attempt to replicate a deployed model.

Instead of stealing source code directly, attackers interact with the model through APIs and analyze responses.

Over time, enough information may be collected to build a similar model.

This creates several risks.

First, intellectual property may be stolen.

Second, attackers gain an environment where they can study the model offline.

Third, extracted models may reveal information about training processes or expose weaknesses that can be exploited later.

Model extraction is becoming increasingly important because many AI services are delivered through public APIs.

Organizations often defend against extraction through rate limiting, throttling, monitoring, watermarking, and anomaly detection.

Although these threats appear different, they share a common characteristic.

Each targets a different stage of the AI lifecycle.

Hallucinations affect outputs.

Prompt injection targets interactions.

Data poisoning targets training data.

Evasion attacks target deployed models.

Model extraction targets intellectual property and operational exposure.

This diversity means organizations cannot rely on a single security control.

Instead, they require a layered defense strategy.

This brings us to the AI security framework introduced in this lesson.

Most AI defenses can be grouped into three categories:

Prevention.

Detection.

And Mitigation.

Prevention focuses on reducing opportunities for attacks.

Examples include securing datasets, validating inputs, implementing authentication, and enforcing governance controls.

Detection focuses on identifying threats and unusual behavior.

Monitoring systems, anomaly detection, logging, and security analytics all support detection efforts.

Mitigation focuses on limiting damage when attacks occur.

Examples include rollback procedures, fail-safe mechanisms, human review processes, incident response plans, and recovery workflows.

Together, these three categories create defense-in-depth.

No individual control is perfect.

Multiple layers of protection provide stronger security than any single measure alone.

For certification exams, remember the five major AI threats:

Hallucinations.

Prompt Injection.

Data Poisoning.

Evasion Attacks.

And Model Extraction.

Also remember the three defensive categories:

Prevention.

Detection.

And Mitigation.

Questions frequently focus on distinguishing one threat type from another or identifying appropriate defensive controls.

To summarize:

AI systems face a unique and evolving threat landscape.

Hallucinations undermine reliability.

Prompt injection manipulates behavior.

Data poisoning corrupts training data.

Evasion attacks exploit deployed models.

Model extraction threatens intellectual property and operational security.

Organizations address these risks through layered defenses that combine prevention, detection, and mitigation.

Understanding these threats provides the foundation for securing AI systems and building trustworthy AI environments.

In the next lesson, we’ll explore adversarial examples and evasion attacks in greater detail.