Curated by THEOUTPOST
On Wed, 11 Dec, 12:04 AM UTC
2 Sources
[1]
Large language models: how the AI behind the likes of ChatGPT actually works
University of Sheffield provides funding as a founding partner of The Conversation UK.

The arrival of AI systems called large language models (LLMs), like OpenAI's ChatGPT chatbot, has been heralded as the start of a new technological era. And they may indeed have significant impacts on how we live and work in future. But they haven't appeared from nowhere, and they have a much longer history than most people realise. In fact, most of us have been using the approaches they are based on for years in our existing technology.

LLMs are a particular type of language model: a mathematical representation of language based on probabilities. If you've ever used predictive text on a mobile phone or asked a smart speaker a question, then you have almost certainly already used a language model. But what do they actually do, and what does it take to make one?

Language models are designed to estimate how likely it would be to see a particular sequence of words. This is where probabilities come in. For example, a good language model for English would assign a high probability to a well-formed sentence like "the old black cat slept soundly" and a low probability to a random sequence of words such as "library a or the quantum some". Most language models can also reverse this process to generate plausible-looking text. The predictive text in your smartphone uses language models to anticipate how you might want to complete text as you are typing.

The earliest method for creating language models was described in 1951 by Claude Shannon, a researcher at Bell Labs. His approach was based on sequences of words known as n-grams - say, "old black" or "cat slept soundly". The probability of n-grams occurring within text was estimated by looking for examples in existing documents. These probabilities were then combined to calculate the overall probability of longer sequences of words, such as complete sentences.

Estimating probabilities for n-grams becomes much more difficult as the n-gram gets longer, so it is much harder to estimate accurate probabilities for 4-grams (sequences of four words) than for bigrams (sequences of two words). Consequently, early language models of this type were often based on short n-grams. However, this meant that they often struggled to represent the connection between words that occurred far apart, which could result in the start and end of a sentence not matching up when the model was used to generate text.

To avoid this problem, researchers created language models based on neural networks - AI systems modelled on the way the human brain works. These language models are able to represent connections between words that may not be close together. Neural networks rely on large numbers of numerical values (known as parameters) to capture these connections, and the parameters must be set correctly for the model to work well. The network learns appropriate values for them by looking at large numbers of example documents, in much the same way that n-gram probabilities are estimated from documents. During this "training" process, the neural network works through the training documents and learns to predict the next word based on the ones that have come before.

These models work well but have some disadvantages. Although, in theory, a neural network can represent connections between words that occur far apart, in practice more importance is placed on those that are closer.
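To make the n-gram idea described above concrete, here is a minimal sketch of a count-based bigram language model (n-grams of length two) in Python. The tiny corpus and test sentences are invented purely for illustration; real models of this kind were estimated from large document collections.

# A minimal sketch of a count-based bigram language model (the n-gram
# approach described above, with n = 2). The tiny corpus is invented for
# illustration only.
from collections import Counter, defaultdict

corpus = [
    "the old black cat slept soundly",
    "the old dog slept",
    "the black cat slept",
]

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, curr in zip(words, words[1:]):
        bigram_counts[prev][curr] += 1

def bigram_prob(prev, curr):
    """P(curr | prev), estimated from relative frequencies in the corpus."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return counts[curr] / total if total else 0.0

def sentence_prob(sentence):
    """Multiply the bigram probabilities together to score a whole sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(words, words[1:]):
        prob *= bigram_prob(prev, curr)
    return prob

print(sentence_prob("the old black cat slept"))  # plausible: non-zero probability
print(sentence_prob("cat the slept old"))        # implausible: probability 0.0

Because the model only ever looks one word back, it captures local word order well but knows nothing about words that are far apart, which is exactly the weakness described above.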
More importantly, words in the training documents have to be processed in sequence to learn appropriate values for the network's parameters, which limits how quickly the network can be trained.

The dawn of transformers

A new type of neural network, called a transformer, was introduced in 2017 and avoided these problems by processing all of the words in the input at the same time. This allows transformers to be trained in parallel, meaning that the calculations required can be spread across multiple computers and carried out simultaneously. A side effect of this change is that transformers can be trained on vastly more documents than was possible with previous approaches, producing larger language models.

Transformers also learn from examples of text, but they can be trained to solve a wider range of problems than simply predicting the next word. One is a kind of "fill in the blanks" problem, where some words in the training text have been removed and the goal is to guess which words are missing. In another, the transformer is given a pair of sentences and asked to decide whether the second should follow the first. Training on problems like these has made transformers more flexible and powerful than previous language models.

The use of transformers has allowed the development of modern large language models. They are referred to as large partly because they are trained on vastly more text than previous models - some on over a trillion words. It would take an adult reading at average speed more than 7,600 years to read that much. These models are also based on very large neural networks, some with more than 100 billion parameters.

In the last few years, an extra component has been added to large language models that allows users to interact with them using prompts, which can be questions or instructions. This has enabled the development of generative AI systems such as ChatGPT, Google's Gemini and Meta's Llama.

Models learn to respond to prompts using a process called reinforcement learning, which is similar to the way computers are taught to play games like chess. Humans provide the language model with prompts, and their feedback on the replies it produces is used by the model's learning algorithm to guide further output. Generating all these prompts and rating the replies requires a lot of human input, which can be expensive to obtain. One way of reducing this cost is to create examples using a language model to simulate human-AI interaction; this AI-generated feedback is then used to train the system.

Creating a large language model is still an expensive undertaking, though. The cost of training some recent models has been estimated to run into hundreds of millions of dollars. There is also an environmental cost, with the carbon dioxide emissions associated with creating LLMs estimated to be equivalent to multiple transatlantic flights. These are problems we will need to solve amid an AI revolution that, for now, shows no sign of slowing down.
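As a small, concrete illustration of the "fill in the blanks" task described above, the sketch below asks a pretrained transformer to guess a removed word. It assumes the open-source Hugging Face transformers library and the distilbert-base-uncased model, neither of which is named in the article; they are small, openly available stand-ins for the much larger systems discussed here.

# Sketch: use a small pretrained masked-language model to guess a hidden word.
# Requires: pip install transformers torch (an assumption; the article names no library).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

# Ask the model to guess the word that was removed from the sentence.
for prediction in fill_mask("The old black cat [MASK] soundly."):
    print(f"{prediction['token_str']:>10}  (score: {prediction['score']:.3f})")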
[2]
Large language models: How the AI behind the likes of ChatGPT actually works
An in-depth look at the history, development, and functioning of large language models, explaining their progression from early n-gram models to modern transformer-based AI systems like ChatGPT.
Large Language Models (LLMs) like ChatGPT, which have recently gained significant attention, have a rich history dating back to the mid-20th century. The concept of language models, mathematical representations of language based on probabilities, was first described by Claude Shannon, a Bell Labs researcher, in 1951 [1][2]. Shannon's approach utilized n-grams, sequences of words, to estimate the probability of word occurrences within text.
Early language models faced limitations in representing connections between distant words in a sentence. To address this, researchers developed models based on neural networks, AI systems inspired by the human brain's functionality [1]. These neural network-based language models could better represent word connections, relying on numerous numerical parameters to understand these relationships.
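The toy sketch below illustrates what "learning the parameters" means in practice: a tiny neural network is nudged, step by step, to predict the next word from the previous one. PyTorch, the toy vocabulary, and the single training sentence are all assumptions made for illustration; real models use far more context, data, and parameters.

# A toy next-word predictor trained by adjusting its parameters.
# The library (PyTorch), vocabulary and data are illustrative assumptions.
import torch
import torch.nn as nn

vocab = ["<s>", "the", "old", "black", "cat", "slept", "soundly"]
word_to_id = {w: i for i, w in enumerate(vocab)}

# Training pairs: (previous word, next word) from one example sentence.
sentence = ["<s>", "the", "old", "black", "cat", "slept", "soundly"]
inputs = torch.tensor([word_to_id[w] for w in sentence[:-1]])
targets = torch.tensor([word_to_id[w] for w in sentence[1:]])

# An embedding plus a linear layer: their weights are the model's "parameters".
model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    optimizer.zero_grad()
    logits = model(inputs)           # predicted scores for every possible next word
    loss = loss_fn(logits, targets)  # how wrong the predictions are
    loss.backward()                  # work out how each parameter should change
    optimizer.step()                 # nudge the parameters in that direction

# After training, the word predicted to follow "cat" should be "slept".
next_id = model(torch.tensor([word_to_id["cat"]])).argmax().item()
print(vocab[next_id])

Note that this toy model only looks at the single previous word; the point is simply that the parameters start out arbitrary and are gradually adjusted by example, which is what "training" means here.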
A significant breakthrough came in 2017 with the introduction of transformers, a new type of neural network [1][2]. Transformers revolutionized language modeling by processing all input words simultaneously, allowing for parallel training across multiple computers. This innovation enabled the creation of much larger language models trained on vastly more data than ever before.
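The sketch below gives a rough picture of the attention calculation at the heart of a transformer. It is not any particular model's implementation; the random vectors simply show how every word is compared with every other word in a single matrix operation, which is what makes processing the whole input at once possible.

# A rough sketch of scaled dot-product attention with random, illustrative vectors.
# In a real transformer these vectors are produced by learned parameters.
import numpy as np

rng = np.random.default_rng(0)
n_words, dim = 6, 8                      # e.g. "the old black cat slept soundly"

queries = rng.normal(size=(n_words, dim))
keys = rng.normal(size=(n_words, dim))
values = rng.normal(size=(n_words, dim))

scores = queries @ keys.T / np.sqrt(dim)                               # relevance of every word to every other word
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax over each row
output = weights @ values                                              # blend each word's value by those weights

print(weights.shape)  # (6, 6): one attention weight for every pair of words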
Modern LLMs, built on transformer architecture, can be trained on an unprecedented scale. Some models are trained on over a trillion words, equivalent to more than 7,600 years of reading for an average adult [1][2]. These models often contain over 100 billion parameters, allowing them to perform a wide range of language tasks beyond simple word prediction.
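The reading-time figure can be checked with quick arithmetic, assuming an average reading speed of roughly 250 words per minute and no breaks (both are this sketch's assumptions, not figures from the article):

# Rough check of the reading-time claim (250 words/minute, non-stop, are assumptions).
words = 1_000_000_000_000     # one trillion words
minutes = words / 250         # minutes of non-stop reading
years = minutes / 60 / 24 / 365
print(round(years))           # about 7,600 years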
LLMs learn by analyzing vast amounts of text data. They can be trained on various tasks, including predicting the next word in a sequence, guessing words that have been removed from the text ("fill in the blanks"), and deciding whether one sentence should follow another.
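The toy sketch below shows how raw sentences might be turned into the latter two kinds of training example: a masked word to guess, and a sentence pair to label as in-order or not. The data format is invented purely for illustration and is not any library's actual format.

# Toy construction of "fill in the blanks" and sentence-pair training examples.
import random

random.seed(0)
sentences = [
    "the old black cat slept soundly",
    "it woke up at dawn",
    "language models estimate probabilities",
]

def make_masked_example(sentence):
    # Hide one word; the model's job during training is to guess it.
    words = sentence.split()
    i = random.randrange(len(words))
    hidden = words[i]
    words[i] = "[MASK]"
    return {"input": " ".join(words), "answer": hidden}

def make_pair_example(sentences):
    # Half the time, pair a sentence with its true successor; otherwise
    # pair it with a sentence taken from elsewhere in the text.
    a = random.randrange(len(sentences) - 1)
    if random.random() < 0.5:
        return {"first": sentences[a], "second": sentences[a + 1], "follows": True}
    b = random.choice([i for i in range(len(sentences)) if i not in (a, a + 1)])
    return {"first": sentences[a], "second": sentences[b], "follows": False}

print(make_masked_example(sentences[0]))
print(make_pair_example(sentences))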
Recent developments have added interactive capabilities to LLMs, allowing users to engage with them through prompts. This feature has led to the creation of generative AI systems like ChatGPT, Google's Gemini, and Meta's Llama [1][2].
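As a small illustration of prompting, the sketch below feeds a short prompt to the openly available GPT-2 model via the Hugging Face transformers library. Both are assumptions used as stand-ins: they are far smaller than the systems named above and are not mentioned in the article.

# Sketch: prompt a small open text-generation model.
# Requires: pip install transformers torch (an assumption; no library is named in the article).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("A language model is", max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])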
The latest LLMs incorporate reinforcement learning techniques, similar to those used in teaching computers to play chess. This process involves human feedback on the AI's responses, which helps guide and improve the model's future outputs [2]. This iterative learning process contributes to the continuous improvement and adaptability of these AI systems.
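The sketch below is a heavily simplified illustration of the feedback idea: a toy "reward model" is adjusted so that it scores the reply a human preferred above the one they rejected, using a pairwise preference loss. The three-number feature vectors and the use of PyTorch are assumptions for illustration; real systems score full text with large neural networks.

# Toy reward-model update from a single human preference judgement.
import torch
import torch.nn.functional as F

# Pretend feature vectors for two candidate replies to the same prompt;
# the human preferred the first one (all values are invented).
preferred = torch.tensor([0.9, 0.2, 0.4])
rejected = torch.tensor([0.1, 0.8, 0.3])

weights = torch.zeros(3, requires_grad=True)   # the reward model's parameters
optimizer = torch.optim.SGD([weights], lr=0.5)

for step in range(50):
    optimizer.zero_grad()
    margin = weights @ preferred - weights @ rejected
    loss = F.softplus(-margin)                 # pairwise preference loss: -log sigmoid(margin)
    loss.backward()
    optimizer.step()

print(f"preferred reply score: {(weights @ preferred).item():.2f}")
print(f"rejected reply score:  {(weights @ rejected).item():.2f}")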
While LLMs represent a significant leap in AI technology, it's important to note that many of us have been unknowingly using their underlying principles in everyday technology. Features like predictive text on smartphones and smart speaker interactions are based on similar language modeling concepts [1][2].
As LLMs continue to evolve, they are expected to have far-reaching impacts on how we live and work. Their ability to understand and generate human-like text opens up possibilities for applications in various fields, from content creation to complex problem-solving tasks.
© 2025 TheOutpost.AI All rights reserved