How Does ChatGPT Work? A Comprehensive Guide

ChatGPT seems like magic, and by many standards, it might as well be. 

However, there’s a complex series of well-defined mechanisms at work under the hood of this incredible technology. 

If you want to understand how ChatGPT really works, you’re in the right place. 

I’ll start by explaining all of the different mechanisms of the ChatGPT process. Then, I’ll explain how they work together to make ChatGPT possible. 

Let’s get started. 

What Is ChatGPT? 

ChatGPT, which stands for Chat Generative Pre-trained Transformer, is an AI model trained on a large corpus of text from the internet that is capable of answering complex queries. 

Let’s break that down. 

  • Chat refers to the informal conversation that the model is capable of having with users. 
  • Generative means that the model is capable of producing or reproducing text. 
  • Pre-trained refers to the fact that the model has been trained on a large dataset to enable these generative chats. 
  • Transformer refers to the architecture of the model, which is based on a deep learning technique called a transformer. 

If that still doesn’t make any sense, don’t worry. 

The next sections will explain how ChatGPT works, which is where the real understanding will come into play. 

The Basics: Neural Networks, NLP, LLMs, and Tokens

Before you can understand how ChatGPT works, you need to have a basic idea of four fundamental concepts: 

  • Neural networks
  • Natural Language Processing (NLP)
  • Large language models (LLMs)
  • Tokens

If you’re already familiar with these concepts, you can skip ahead to the next section, where I break down how ChatGPT takes your input and generates human-like responses.

What Are Neural Networks? 

Broadly speaking, neural networks are a set of algorithms designed to recognize patterns.

They are a key technology in machine learning, where machines are trained to learn from data so that they can help us solve complex problems.

They’re called neural networks because they are inspired by the structure and function of the “network” of neurons in the human brain.

By mimicking the structure of the human brain, neural networks can attempt to solve complex problems that require human-like intelligence and perception.
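To make the “artificial neuron” idea concrete, here’s a minimal, illustrative sketch in Python: a single neuron with hand-picked weights (a real network learns thousands or billions of these), not anything from ChatGPT itself.

```python
import numpy as np

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """A single artificial neuron: a weighted sum of the inputs plus a bias,
    passed through a sigmoid activation function."""
    z = np.dot(inputs, weights) + bias   # weighted sum of the inputs
    return 1.0 / (1.0 + np.exp(-z))      # squash the result into (0, 1)

# Hand-picked numbers for illustration; a real network learns its weights.
x = np.array([0.5, 0.1, 0.9])
w = np.array([0.4, -0.2, 0.7])
print(neuron(x, w, bias=0.1))  # a single activation between 0 and 1
```

Stack many of these neurons in layers, and the network as a whole can learn to recognize patterns far more complex than any single neuron could.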

Neural networks are used in a wide range of areas, including image recognition, speech recognition, language translation, and recommendation systems.

Now, there are different types of neural networks optimized for different use cases.

We’re interested in the “transformer” neural network, as that’s what ChatGPT uses.

It can process a large chunk of data simultaneously (recurrent neural networks, by contrast, analyze data sequentially, one piece at a time), which helps it analyze the context of words in a sentence. 

This makes transformers an excellent tool for tasks like language translation, text summarization, sentiment analysis, and much more.

ChatGPT is based on a transformer neural network, which is why it can engage in human-like conversations.

One of the main advantages of transformers is that they can capture long-range dependencies between words in a sentence. This means that they can understand how the meaning of a word depends on the words that come before and after it.

For example, consider the sentence: “I saw a man on a hill with a telescope.” Depending on how you interpret this sentence, you could think that:

  • I used a telescope to see a man on a hill.
  • I saw a man who was on a hill and had a telescope. 

A transformer neural network can use the context of the whole sentence – and the sentences around it – to figure out which interpretation is most likely. It does this by using a mechanism called “attention,” which allows it to focus on the most relevant parts of the input.
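Here’s a toy sketch of that attention idea, with invented relevance scores (real transformers compute these from learned projections of the input): a softmax turns the raw scores into weights that sum to 1, so the model can focus on the most relevant words.

```python
import numpy as np

# Invented relevance scores for each word, from the point of view of the
# word "telescope". Real transformers compute these scores from learned
# query and key projections of the input embeddings.
words  = ["I", "saw", "a", "man", "on", "a", "hill", "with", "a", "telescope"]
scores = np.array([0.1, 1.2, 0.1, 2.0, 0.3, 0.1, 1.5, 1.8, 0.1, 3.0])

# Softmax turns raw scores into weights that sum to 1.
weights = np.exp(scores) / np.exp(scores).sum()

for word, weight in zip(words, weights):
    print(f"{word:>9}: {weight:.2f}")
# The highest-scoring words get most of the weight, which is how the
# model "focuses" on the parts that disambiguate the sentence.
```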

Another advantage of transformers is that they can generate natural and fluent text based on the input. This means that they can produce coherent and engaging responses that sound like human speech.

For example, if you ask ChatGPT: “What is your favorite movie?”, it might give a human-esque reply like: “My favorite movie is The Matrix. I love the concept of living in a simulated reality and fighting against the machines.”

A transformer neural network can generate such responses by using a technique called “language modeling”, which allows it to predict the next word in a sequence based on the previous words. (We’ll cover large language modeling in a little bit.)

By combining attention and language modeling, transformers are excellent tools for language translation, text summarization, sentiment analysis, and much more. 

That’s why ChatGPT is powered by a transformer neural network.

Understanding Natural Language Processing (NLP)

Natural Language Processing (NLP) is a branch of AI concerned with techniques, methods, and algorithms to help computers understand, interpret, and generate human language in a way that is meaningful and useful to humans. 

Note: While NLP and models like ChatGPT can generate highly natural and contextually relevant responses, their ‘understanding’ is based on statistical patterns in training data, not true human-like understanding or consciousness. More on this in a later section.

It’s a radical shift from traditional computer-human interactions, where computers needed predefined instructions to respond to humans. 

Using NLP principles, computers can create a geometric space, known as an “embedding space”, and populate it with numerical representations of words called “embeddings”. 

Then, based on its training data, the model learns how words relate to one another and forms semantic relationships between them. 

When presented with an input query, an NLP-enabled AI can look at the semantic relationships to generate a contextually relevant response by itself, without needing a programmer to preprogram a response.
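Here’s a minimal sketch of that idea, using hand-picked vectors as stand-ins for learned embeddings: words with related meanings point in similar directions, and cosine similarity measures how close those directions are.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings". Real embeddings have hundreds of
# dimensions and are learned from data, not written by hand.
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high (~0.99)
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low (~0.30)
```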

The benefits of NLP include:

  • Sentiment Analysis: Understanding the emotion behind the text.
  • Intent Analysis: Determining the purpose behind a piece of text.
  • Comprehensive Overview: Providing a detailed analysis of the text, including syntax and semantics.

How ChatGPT Uses NLP

So, how does this relate to ChatGPT?

Well, ChatGPT uses NLP and other advanced techniques to understand the context and nuances of human language, which allows it to generate responses that are not only contextually relevant, but also sound natural and human-like.

To achieve this, ChatGPT uses a transformer neural network that can process large chunks of text at once and capture long-range dependencies between words.

ChatGPT also uses a large corpus of conversational data as its training data, which helps it learn the common patterns and styles of human dialogues.

ChatGPT can then use its learned knowledge to generate responses that are appropriate for the given context and tone of the conversation.

For example, if you ask ChatGPT: “How are you feeling today?” it might reply: “I’m feeling great, thank you for asking. How about you?”

ChatGPT can also adapt to different domains and scenarios by using different parameters and settings. For example, ChatGPT can be fine-tuned to generate responses that are more formal or informal, more humorous or serious, more factual or creative, etc.
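As a developer-facing aside, some of these knobs are exposed as parameters in OpenAI’s API. Here’s a hedged sketch using the openai Python package; the model name and settings are examples, and availability changes over time.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model name; availability changes over time
    messages=[
        {"role": "system", "content": "You are a formal, serious assistant."},
        {"role": "user", "content": "How are you feeling today?"},
    ],
    temperature=0.2,  # lower = more predictable, higher = more creative
)
print(response.choices[0].message.content)
```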

ChatGPT: A Large Language Model (LLM)

Large language models (LLMs) like ChatGPT are developed using neural networks following NLP principles. 

A neural network provides the blueprint and the computational capabilities necessary for the development and functioning of LLMs.

These models are designed to understand and generate human language by being trained on a “large” amount of text data. 

This involves feeding the model with vast quantities of text — 570 GB in the case of ChatGPT — and training it by adjusting its parameters to minimize the difference between the predicted and actual next word in a sentence.
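In code, that training objective is typically the cross-entropy loss: the penalty is small when the model gave the true next word a high probability, and large when it didn’t. Here’s a minimal sketch with made-up numbers.

```python
import numpy as np

def next_word_loss(predicted_probs: dict, actual_next: str) -> float:
    """Cross-entropy for a single prediction: -log(prob of the true word).
    Training adjusts the model's parameters to make this number smaller."""
    return float(-np.log(predicted_probs[actual_next]))

# Made-up predictions for the word after "The capital of France is"
probs = {"Paris": 0.85, "Lyon": 0.05, "big": 0.09, "pizza": 0.01}

print(next_word_loss(probs, "Paris"))  # small loss (~0.16): good prediction
print(next_word_loss(probs, "pizza"))  # large loss (~4.61): bad prediction
```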

The ability of LLMs to understand and generate human-like text is fundamental to various NLP tasks, such as machine translation, summarization, and question-answering.

Here are some real-world use cases of LLMs:

  • Content Creation: They can generate articles, stories, or any other type of content.
  • Question Answering: They can provide accurate and detailed answers to questions.
  • Summarization: They can provide concise summaries of long pieces of text.
  • Translation: They can translate text from one language to another.

ChatGPT is an LLM built on top of the GPT architecture – which is based on the aforementioned transformer neural network. 

How LLMs Work

So, how do LLMs like ChatGPT work? How do they learn from text data and generate new text?

The basic idea is that LLMs use a probabilistic model to estimate the likelihood of a word given the previous words in a sequence. This is called the language model.

This means that LLMs can guess what word comes next based on what words came before. 

For example, if you say “I like to eat”, an LLM can tell how likely it is that you will say “pizza”, “cake”, “salad”, or any other word. The word that is most likely to come next is the one that the LLM will choose.
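As a tiny sketch with invented probabilities, picking the next word just means taking the most likely candidate:

```python
# Invented probabilities the model might assign after "I like to eat".
candidates = {"pizza": 0.32, "cake": 0.21, "salad": 0.12, "rocks": 0.001}

next_word = max(candidates, key=candidates.get)
print(next_word)  # "pizza", the most likely continuation
```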

To train an LLM, we need a large corpus of text data, such as books, articles, blog posts, etc. 

The LLM then processes the text data by splitting it into smaller units called tokens, which can be words, characters, or subwords. 

The LLM assigns each token a unique numerical representation called an embedding, which captures its meaning and context. The embeddings are stored in a matrix called an embedding layer, which acts as a lookup table for the LLM.

This means that LLMs can turn words into numbers that represent their meaning and how they are used. 
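Here’s a minimal sketch of that embedding lookup table, with toy sizes (real models use vocabularies of tens of thousands of tokens and vectors with hundreds or thousands of dimensions):

```python
import numpy as np

vocab_size, embed_dim = 10, 4            # toy sizes for illustration
rng = np.random.default_rng(seed=0)

# The embedding layer: one learned row of numbers per token ID.
embedding_layer = rng.normal(size=(vocab_size, embed_dim))

token_ids = [3, 7, 1]                    # a tokenized input sequence
embeddings = embedding_layer[token_ids]  # lookup: token IDs -> vectors
print(embeddings.shape)                  # (3, 4): one vector per token
```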

The LLM then feeds the embeddings into a series of layers called transformer blocks, which are composed of two sub-layers: a self-attention layer and a feed-forward layer.

The self-attention layer allows the LLM to learn how each token relates to every other token in the sequence. It does this by computing a weighted average of all the embeddings. This helps the LLM capture long-range dependencies and contextual information.

The feed-forward layer applies a non-linear transformation to each token individually, by applying a simple mathematical function. This helps the LLM learn complex patterns and features from the data.
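Putting those two sub-layers together, here’s a compact, illustrative sketch of a single transformer block with random, untrained weights. Real blocks add multiple attention heads, residual connections, and layer normalization, so treat this as a skeleton rather than a faithful implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, W1, W2):
    """x: (seq_len, dim) token embeddings. The W matrices would be learned."""
    # Self-attention: every token scores every other token...
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (seq_len, seq_len) weights
    x = attn @ v                                     # ...then takes a weighted average
    # Feed-forward: the same small network applied to each token separately.
    return np.maximum(0, x @ W1) @ W2                # ReLU non-linearity

# Toy dimensions and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
dim, seq_len = 8, 5
x = rng.normal(size=(seq_len, dim))
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))
W1, W2 = rng.normal(size=(dim, 16)), rng.normal(size=(16, dim))
print(transformer_block(x, Wq, Wk, Wv, W1, W2).shape)  # (5, 8)
```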

Here’s what all of that really means: LLMs can pay attention to the important parts of the text and ignore the irrelevant parts.

To revisit our previous example, if you say “I saw a man on a hill with a telescope”, an LLM can use the other context it’s been given to figure out who had the telescope and where they were looking. 

The output of each transformer block is then passed to the next one, until it reaches the final layer called the output layer, which generates a probability distribution over all possible tokens in the vocabulary. 

The token with the highest probability is then selected as the output.

This means that LLMs can use what they learned from the previous layers to make a final guess about what word comes next. 

By repeating this process for each token in the sequence, the LLM can generate new — and often accurate — text based on the input.
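Sketching that final step with made-up scores: the output layer assigns each vocabulary token a score (a “logit”), softmax turns the scores into probabilities, and the highest-probability token is selected.

```python
import numpy as np

vocab = ["Paris", "Lyon", "pizza", "is"]      # a tiny toy vocabulary
logits = np.array([4.1, 1.2, -0.5, 0.3])      # made-up output-layer scores

probs = np.exp(logits) / np.exp(logits).sum() # softmax over the vocabulary
print(vocab[int(np.argmax(probs))])           # "Paris", the top token
```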

ChatGPT uses this architecture to create human-like conversations that are relevant, informative, and engaging.

What Are Tokens in ChatGPT?

Tokens are the unit of text that LLMs like ChatGPT process to output data. 

A token can be as short as one character, as long as one word, or sometimes even a part of a word. 

Determining what constitutes a single token is based on the tokenizer used by the model. Different models may use different tokenizers that break down text into tokens in slightly different ways.

Learn more: To see real-life examples of how ChatGPT tokenizes text, check out the OpenAI tokenizer tool. 

Now, the process of breaking down text into these tokens is known as tokenization. 

This is a crucial first step, because ChatGPT does not understand text in the way humans do. 

Instead, it processes text by analyzing these tokens to understand the context and generate a response.

The necessity of tokens arises from several factors:

  • Understanding Language: Tokenization helps ChatGPT break down a large piece of text and analyze its context, sentiment, intent, and other information.
  • Managing Compute Resources: Processing text in smaller chunks, or tokens, allows ChatGPT to manage compute resources more efficiently.
  • Handling Language Variations: Tokenization enables ChatGPT to handle different variations of words, slang, and other language quirks.

After tokenization, these tokens are processed as sequences, which are lists of tokens that the model processes as a single unit.

Sequences are vital for ChatGPT to understand the context of a conversation and generate contextually relevant responses. 

The model considers the relationship between tokens in a sequence to understand the overall meaning and generate an appropriate response.

For example, the question “How many legs does an albino Siberian tiger have?” would be tokenized into a sequence of tokens:

[‘How’, ‘many’, ‘legs’, ‘does’, ‘an’, ‘al’, ‘b’, ‘ino’, ‘Siberian’, ‘tiger’, ‘have’, ‘?’]. 

ChatGPT analyzes this sequence to understand that the question is about the number of legs a tiger has and then generates a response one token at a time. Here’s how this response generation might look under the hood: 

  1. An
  2. An al
  3. An alb
  4. An albino 
  5. An albino Siberian
  6. An albino Siberian tiger 
  7. An albino Siberian tiger has 
  8. An albino Siberian tiger has four 
  9. An albino Siberian tiger has four legs
  10. An albino Siberian tiger has four legs. 

Note that most words constitute a single token, but “albino” requires 3 tokens. 

This is because albino is a pretty uncommon word, and it’s more efficient for the GPT model to construct it out of its constituent parts rather than have an “albino” token saved and ready for use. 

You can also see this by switching to the token ID view in the tokenizer. 

These token IDs are what ChatGPT is actually dealing with. 

It transforms your text input into these numbers, runs some complex math to make sense of them, and then turns the new numbers it generates back into text in its response so you can understand it. 

So in reality, ChatGPT’s response generation really looks like this: 

  1. [2025]
  2. [2025, 435]
  3. [2025, 435, 65]

And so on until it outputs the entire response. 

To be clear, these are pre-determined token IDs, too. “How” will always be ID 2437, and “Siberian” will always be 47965. 
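You can see these IDs for yourself with OpenAI’s open-source tiktoken library. The exact IDs depend on which tokenizer you load, so treat the specific numbers in this article as illustrative.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("gpt2")  # the GPT-2/GPT-3-era tokenizer

ids = enc.encode("How many legs does an albino Siberian tiger have?")
print(ids)              # a list of integers, one ID per token
print(enc.decode(ids))  # round-trips back to the original text
```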

I find understanding this ID system — and seeing it in action — makes it much clearer that ChatGPT doesn’t understand language at all. 

It’s just amazing at calculating the probability that a certain token ID will come after another token ID based on its understanding of how all of the token IDs in existence relate to each other. 

At the end of the day, the apparent genius ChatGPT has with words and language is all based on math. 

How Does ChatGPT Work? 

Now that we have a deeper understanding of the technologies and mechanisms that make ChatGPT possible, let’s put everything together for a complete picture of how it works. 

ChatGPT works by first tokenizing your input text and creating a sequence. 

It then processes that sequence to understand the context and appends new, contextually coherent tokens to it. 

It keeps adding tokens until it generates a full response. It then transforms the tokenized sequence into a human-readable answer and presents it as output.

Here’s a more detailed step-by-step breakdown of the process:

  1. Tokenization: When you provide an input to ChatGPT, the text is broken down into smaller parts called tokens using Natural Language Processing (NLP) techniques. Tokens can be as short as one character, a part of a word, or as long as a word (e.g., ‘a’, ‘ap’, or ‘apple’ might all be tokens).
  2. Processing: The tokenized input is then processed by the neural network, which predicts the next token in the sequence based on the patterns it learned from its training and the semantic relationship between all the tokens in the current sequence and all the other tokens in its embedding space. It keeps adding more tokens to generate a long, detailed, and contextual response.
  3. Finalization: The process of adding more tokens to the sequence stops when the model decides it has generated a contextually sufficient response based on its training. The process can also stop if it reaches the token generation limit, which for GPT-3.5 is 4,096 tokens.
  4. Decoding: The tokens in the generated response are then converted back into human-readable text using NLP techniques.
  5. Response: The final text is then returned as the response.
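Here’s the whole pipeline condensed into a runnable sketch. The functions tokenize, model_step, and detokenize are hypothetical stand-ins (a character-level fake, not ChatGPT’s real tokenizer or network), just to show the control flow of the five steps above.

```python
MAX_TOKENS = 4096   # GPT-3.5's generation limit, per step 3 above
END_TOKEN = -1      # a hypothetical "stop" token ID; real IDs vary by model

# Hypothetical stand-ins so the sketch runs; the real versions are the
# tokenizer and transformer network described in the steps above.
def tokenize(text: str) -> list[int]:
    return [ord(ch) for ch in text]          # characters as fake "tokens"

def detokenize(ids: list[int]) -> str:
    return "".join(chr(i) for i in ids)

def model_step(sequence: list[int]) -> int:
    return END_TOKEN                         # a real model predicts a token here

def generate(prompt: str) -> str:
    sequence = tokenize(prompt)              # 1. Tokenization
    prompt_length = len(sequence)
    while len(sequence) < MAX_TOKENS:        # 3. stop at the token limit...
        next_id = model_step(sequence)       # 2. Processing: predict next token
        if next_id == END_TOKEN:             # 3. ...or when the model decides
            break
        sequence.append(next_id)
    return detokenize(sequence[prompt_length:])  # 4. Decoding -> 5. Response

print(generate("What is the capital of France?"))  # empty: the fake model stops at once
```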

To better understand how this works, here’s an example:

Let’s say you ask ChatGPT: “What is the capital of France?”



ChatGPT will first tokenize the query into a sequence of tokens: [‘What’, ‘ is’, ‘ the’, ‘ capital’, ‘ of’, ‘ France’, ‘?’].

Now, the GPT architecture will process the sequence to predict the next token. In our example, the first token it predicts is ‘The’. 

The new sequence gets processed again and a new token is added until a complete response sequence is generated: [‘The’, ‘ capital’, ‘ of’, ‘ France’, ‘ is’, ‘ Paris’, ‘.’]

In the above example, the model decides the response is complete and ends the sequence right here. However, it can also keep generating and add a few more tokens for a longer sequence like this: [‘The’, ‘ capital’, ‘ of’, ‘ France’, ‘ is’, ‘ Paris’, ‘.’, ‘ Is’, ‘ there’, ‘ anything’, ‘ else’, ‘ you’, “‘d”, ‘ like’, ‘ to’, ‘ know’, ‘?’]

The decision to finalize the response depends on the model’s training and the maximum token limit.

Once the sequence is finalized, GPT will decode the sequence back to human-readable text and then output it as the final response.

The “Pre-Training” Process of GPT

Pre-training is a way of teaching the GPT model to understand and produce text before it is adapted to specific tasks. It works by feeding the model a large amount of text from different sources and topics and having it learn by predicting missing words or sentences.

To train the GPT model, OpenAI used five datasets, totaling roughly 570 GB of data: 

  • Common Crawl: A publicly available repository of web pages. 
  • WebText2: A collection of web pages linked from high-quality Reddit submissions from December 2017 onward. 
  • Books1 and Books2: Collections of books available online. The exact contents of these datasets are not public knowledge.
  • Wikipedia: The text of Wikipedia articles.

OpenAI trained ChatGPT on this data in two primary phases. 

First, ChatGPT was trained using a variant of the unsupervised learning method called self-supervised learning. Self-supervised learning is a type of unsupervised learning in which a model learns to predict some aspect of its input, like predicting the next word in a sentence or filling in a missing word.

After this first phase, ChatGPT had learned to represent the underlying structure of language and could generate coherent and meaningful responses. 

However, it was still making plenty of mistakes, which is where the second phase came into play: fine-tuning. 

ChatGPT went into a fine-tuning phase, in which it was trained on smaller, labeled datasets using supervised learning techniques. 

Humans would manually review its responses and provide feedback on whether they were good or bad (a process known as reinforcement learning from human feedback, or RLHF), and over time this feedback turned ChatGPT into the effective model you can use today. 

Wrapping Up 

ChatGPT is amazing, but it’s far from the only amazing AI tool coming out. 

There are some incredible models coming out to compete with GPT, including Google’s Gemini, Meta’s LLaMA, and many others. 

Some of these models, including GPT, are multimodal, which means they can accept and produce images and videos in addition to text. 

You only need to look at image generation tools like Midjourney or DALL-E 3 to see what’s coming down the road with these multimodal applications. 

And then you have the incredible technologies being built on top of these models, like Bing Chat or the many high-level AI writing tools built for specific use cases. 

Whatever you’re interested in, there’s some crazy stuff coming out that will inevitably blow your mind. 

And the best way to stay on top of the latest AI tools and tactics is to join the Future Guidebook newsletter. We’ll send you the latest advances and the best tactics to incorporate them into your daily life. 

Author

Dibakar Ghosh
Dibakar is an AI enthusiast and a writer for FutureGuidebook.com. An avid fan of sci-fi movies and psychological thrillers, he believes that the AI revolution won’t bring about Skynet but rather helpful robots like TARS! Come on TARS!