Latency

The amount of time it takes for a system to respond to a request.

Overview

When you click a button, submit a search query, or ask an AI assistant a question, you expect a response.

The delay between the request and the response is known as latency.

Latency measures how long it takes for a request to travel through a system and for the result to come back.
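As a rough illustration, latency can be measured by recording the time just before a request is sent and again once the result arrives. The sketch below is a minimal example under stated assumptions: `measure_latency` and `fake_request` are hypothetical names introduced here, and `fake_request` only simulates a slow system with a short sleep rather than calling a real service.

```python
import time
from typing import Any, Callable, Tuple


def measure_latency(handler: Callable[[], Any]) -> Tuple[Any, float]:
    """Call handler once and return (result, elapsed time in seconds)."""
    start = time.perf_counter()  # high-resolution clock suited to timing
    result = handler()
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_request() -> str:
    # Hypothetical stand-in for a real request: pause about 50 ms, then answer.
    time.sleep(0.05)
    return "ok"


result, seconds = measure_latency(fake_request)
print(f"response={result!r} latency={seconds * 1000:.1f} ms")
```

The printed value will vary from run to run, because the measured time includes whatever else the machine is doing at that moment.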

A helpful way to think about latency is a conversation.

If someone answers immediately, the conversation feels smooth.

If there is a long pause after every question, communication becomes slower and less efficient.

The same principle applies to software systems and AI applications.

Low latency generally improves user experience because responses arrive quickly.

High latency can make applications feel slow and frustrating.

As organizations deploy increasingly sophisticated AI systems, reducing latency has become an important goal.

Factors such as infrastructure, networking, model size, and deployment architecture can all influence latency.

Because latency determines how quickly a response arrives, it directly shapes how responsive an application or AI system feels.

A chatbot that responds in one second typically feels more responsive than one that requires ten seconds to answer.