So, what is a language model?
The tech world is hyped about language models, so let's find out what all the excitement is about.
Hey there 👋!
You might have heard a lot lately about “language models”. They’re computer programs that can understand and generate human-like language. Pretty impressive, right? Let’s see what all this hype is about 👀.
What is a Language Model?
A language model is a statistical model that predicts the next word in a sentence. It takes some context as input and outputs a probability for every term in your vocabulary - a score for how likely each word is to come next. 😵💫
Simply put, language models predict what word will come next based on what they have already seen.
For example, if you say, “I am hungry”, and then “I want to eat” 🤤, the language model may predict that you are going to say, “Pizza” 🍕.
N-gram models
N-gram models were among the first language models developed by NLP researchers.
➡️ An n-gram model looks at the previous n - 1 words in your sentence and predicts what word will come next.
Unigram models ( n = 1) are the simplest language models. They simply model how often each word occurs in the corpus, ignoring context entirely.
Models with a higher n try to improve over unigram models by encoding “context”. Bigram models ( n = 2) predict the next word from the previous word, and trigram models ( n = 3) predict it from the previous two words.
✅ Auto-correction, text-to-speech and keyboard suggestions all utilize n-gram models.
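To make this concrete, here is a minimal sketch of a bigram model in plain Python. The tiny corpus and the predict_next helper are made up purely for illustration - a real model would be trained on far more text.

```python
from collections import defaultdict, Counter

# A tiny toy corpus, made up for illustration.
corpus = [
    "i am hungry",
    "i want to eat pizza",
    "i want to drink coffee",
]

# Count how often each word follows the previous one (bigram counts).
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word given the previous word."""
    if word not in bigram_counts:
        return None
    return bigram_counts[word].most_common(1)[0][0]

print(predict_next("want"))  # -> "to"
print(predict_next("to"))    # -> "eat" (tied with "drink"; first seen wins)
```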
Markov Models
Markov Models are an extension of n-gram models. They try to capture language structure by representing each word with a state and learning the probability of transitioning from one state to the next.
For example, if your sentence is “I like to drink coffee” ☕️, the first word “I” is a state. Adding the following word (“like”) takes you to a new state representing the pair, and so on, until every word combination seen in your corpus has been encoded as states and transitions between them.
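Here is a rough sketch of that idea as code: each word is a state, and we sample the next state from the transitions observed in a made-up toy corpus. Everything here is illustrative only.

```python
import random
from collections import defaultdict

# Toy corpus, made up for illustration.
corpus = ["i like to drink coffee", "i like to eat pizza", "i like coffee"]

# Each word is a state; transitions record which words followed it.
transitions = defaultdict(list)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        transitions[prev].append(nxt)  # duplicates encode the probabilities

def generate(start="i", max_words=6):
    """Walk the chain, sampling each next state from the observed transitions."""
    word, output = start, [start]
    while word in transitions and len(output) < max_words:
        word = random.choice(transitions[word])
        output.append(word)
    return " ".join(output)

print(generate())  # e.g. "i like to drink coffee"
```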
Continuous Space Language Models
With computing becoming cheaper and more accessible, researchers started using neural networks to model language.
They found that training a neural network on a large text dataset could create “continuous space language models” that aren’t constrained by discrete states the way Markov models are. Instead, they represent words as points in a high-dimensional space.
➡️ This means your model learns higher-level features of your data, not just counts of individual words or phrases.
Word2Vec, published in 2013, was one of the most influential early neural models to use continuous space representations. The paper’s authors showed that this allowed them to model words more accurately and efficiently, which was a huge step forward in NLP research.
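If you want to poke at continuous representations yourself, here is a minimal sketch using the gensim library's Word2Vec implementation. It assumes gensim is installed; the toy sentences are made up, and a real model needs far more data to learn anything useful.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens (made up for illustration).
sentences = [
    ["i", "like", "to", "drink", "coffee"],
    ["i", "like", "to", "drink", "tea"],
    ["i", "want", "to", "eat", "pizza"],
]

# Every word gets mapped to a point in a 50-dimensional continuous space.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["coffee"][:5])           # first few dimensions of the vector
print(model.wv.most_similar("coffee"))  # nearby words in that space
```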
Later, recurrent neural networks like LSTMs took over the NLP world. The advent of attention mechanisms allowed language models to focus on specific parts of sentences while learning what they mean.
➡️ This makes it possible for them to learn about complex relationships between words and the meaning of sentences without requiring human intervention.
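For the curious, here is a bare-bones numpy sketch of scaled dot-product attention, the core computation behind these attention mechanisms. The inputs are random values just to show the shapes.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to every key; outputs are weighted sums of the values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Softmax over the keys turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# 4 tokens with 8-dimensional representations (arbitrary random values).
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # how much each token "focuses" on the others
```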
Transformers
The current state of the art, the Transformer, was introduced in 2017 by researchers at Google, followed in 2018 by BERT (Bidirectional Encoder Representations from Transformers). Replacing the prior recurrent models, Transformers quickly took over the NLP world.
➡️ They also allowed researchers to train models on much larger datasets than before and to share the results, so others can fine-tune pre-trained versions instead of building models from scratch.
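Playing with a pre-trained Transformer takes only a few lines with the Hugging Face transformers library - a rough sketch, assuming the library is installed and the standard bert-base-uncased checkpoint can be downloaded:

```python
from transformers import pipeline

# Load a pre-trained BERT and let it fill in a masked word.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("I am hungry, I want to eat [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```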
So, what’s the hype about?
Language models have been around for a while and are routinely used everywhere - from your keyboard auto-complete to the hundreds of chatbots 🤖 you see.
⭐️ But they have started getting incredibly “large” in the past few years - thanks to the fantastic success of transformer models. BERT, RoBERTa, GPT-2/3/ChatGPT, and LaMDA are all large language models, some with hundreds of billions of parameters.
⭐️ Trained on massive amounts of text, these models perform impressively at generating responses to all kinds of prompts. They have picked up enough patterns to answer some surprisingly complicated queries.
👉 They can even reach simple logical conclusions, which has misled many people into seeing “intelligence” - and hence the hype.
AI is not taking away our jobs anytime soon.
The models are still pattern-matching algorithms with a lossy memory of a large chunk of the internet. They often regurgitate made-up or incorrect facts; we all saw what happened with the Google Bard demo, didn’t we?
Is that all?
Large language models are just the tip of the iceberg of what is yet to come. Their most significant contribution is easing communication between humans and machines.
Today’s tools and algorithms won’t be made obsolete by them or by any future AI model. Instead, future AI research will focus on bridging the two.
⭐️ A few weeks after the ChatGPT sensation, Meta (Facebook) researchers published a new pattern for this, aptly called Toolformer. The idea is to train language models to generate input for a specialized subsystem, retrieve the results, and convert them back to natural language (a rough sketch of this loop follows at the end of this list).
⭐️ Frameworks like LangChain and Auto-GPT aim to build “agents” that work towards a particular goal using the power of these LLMs. They start with a description and iteratively devise steps to reach their target through clever prompting. Some of the things the community has built with them are amazing. 🪄
⭐️ Similarly, OpenAI has made available plugins that integrate ChatGPT with external systems over a REST API. A task like asking for available flights will query a flight information system and respond with the most up-to-date results.
⭐️ Google’s Bard is now being integrated across the entire spectrum of GSuite products — maybe finally, we will realize the dream we were sold through “Ok Google” and “Alexa” when they truly become smart assistants 🤖.
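All of these integrations share the same basic loop, sketched below in plain Python. Every function here (ask_llm, search_flights) is a hypothetical stand-in for a real LLM call or external API - it shows the shape of the pattern, not a real implementation.

```python
# Hypothetical sketch of the "LLM + tool" loop; ask_llm() and
# search_flights() are made-up stand-ins, not real APIs.

def ask_llm(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    return f"<LLM response to: {prompt[:50]}...>"

def search_flights(query: str) -> list:
    """Stand-in for a specialized subsystem, e.g. a flight-search REST API."""
    return [{"flight": "XY123", "departs": "09:15"},
            {"flight": "XY456", "departs": "14:30"}]

def answer(user_question: str) -> str:
    # 1. The LLM turns the natural-language question into structured tool input.
    tool_query = ask_llm(f"Extract a flight search query from: {user_question}")
    # 2. The specialized subsystem does the actual work.
    results = search_flights(tool_query)
    # 3. The LLM converts the raw results back into natural language.
    return ask_llm(f"Summarize these flights for the user: {results}")

print(answer("Which flights go to Berlin tomorrow?"))
```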
The future is exciting! ✨