Explanation for Data Scientists
This explanation is for data scientists. For others, use these links:
Main Idea
The fundamental idea of LLMs (Large Language Models) is to complete the sentence. Given an unfinished sentence, an LLM chooses the next word to add. This simple task, it turns out, can enable all kinds of complex language tasks, as I will show in the next chapter.
MadLibs
Many of us played MadLibs when we were kids.
MadLibs provides sentences with missing words, and we write in words to complete them.
How do humans do it?
Let’s think about how we would “solve” the MadLibs.
In our brain we have a memory of the sentences we’ve heard and read in our life. We use that memory to find the best match.
For example, take the first phrase “There are many ___ ways to choose…”. We have likely heard or read the phrase “There are many good ways to choose…” often in our life. So naturally “good” comes to mind to fill in that sentence.
We have probably heard the phrase “There are many other ways to choose…” too but not as commonly as the one above. So “other” may be a second choice in our mind.
We also know what the opposite of “good” is: “bad”. So for some of us, the word “bad” may also come to mind.
So here’s the choice in our head for the blank in “There are many ___ ways to choose…”:
- Good (Most commonly seen or heard by us in the past)
- Other (Heard or read less commonly)
- Bad (Opposite of “good”, the word most commonly heard/read by us)
Now we can just choose one of these.
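For readers who like to see this in code, here is a minimal sketch of that "memory lookup" in Python. The `memory` list is a made-up stand-in for the sentences we have heard in our life; the function simply counts which words follow a given phrase:

```python
from collections import Counter

# A tiny made-up "memory" of sentences. Our real memory (and an LLM's
# training data) holds vastly more, but the counting idea is the same.
memory = [
    "there are many good ways to choose a book",
    "there are many good ways to choose a gift",
    "there are many good ways to choose a career",
    "there are many other ways to choose a movie",
    "there are many other ways to choose a school",
    "there are many bad ways to choose a password",
]

def next_word_counts(prefix, corpus):
    """Count every word that immediately follows `prefix` in the corpus."""
    counts = Counter()
    n = len(prefix)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - n):
            if words[i:i + n] == prefix:
                counts[words[i + n]] += 1
    return counts

counts = next_word_counts(["there", "are", "many"], memory)
print(counts.most_common())  # [('good', 3), ('other', 2), ('bad', 1)]
```

The ranked list that falls out of the counts matches the ranking above: "good" first, "other" second, "bad" last.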
How to be creative
If we choose the first one, it is most likely to be correct. However, it is also what everyone else would choose. So if our goal is to be a bit different, we would choose “other” over “good”.
Of course, both “good” and “other” are choices that others would expect us to make. So if we wanted to be funny, we would choose “bad”, which is a correct but unexpected choice. Readers would probably find us more creative and fun if we did.
We could then repeat this process for all the other blanks in the MadLib and have a finished paragraph.
How does ChatGPT do it?
The process is not much different in the LLM (Large Language Model) underneath ChatGPT.
We can provide ChatGPT access to all the text available on the internet. The current estimate is that there are about 140 billion sentences on the public internet. Including digitized books adds another 110 billion sentences.
With that memory, ChatGPT can find all the possible words to fill in the blank in “There are many ___ ways to choose…”. It can rank the choices by how commonly each one (or its opposite) appears in those 250 billion sentences.
Most of the time, ChatGPT would choose the most common option, but once in a while it would pick one of the other choices. This makes ChatGPT’s output more interesting to read.
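That “mostly pick the most common word, occasionally pick another” behavior is just weighted random sampling. Here is a sketch in Python with made-up counts for the blank above; the `temperature` knob is a real concept in LLM sampling, though real models apply it to learned probabilities rather than raw counts:

```python
import random
from collections import Counter

# Made-up counts for the blank in "There are many ___ ways to choose…".
counts = {"good": 3, "other": 2, "bad": 1}

def sample_next_word(counts, temperature=1.0):
    """Pick a word with probability proportional to count ** (1 / temperature).

    A temperature near 0 almost always picks the most common word;
    higher temperatures make the rarer, more surprising words likelier."""
    words = list(counts)
    weights = [counts[w] ** (1.0 / temperature) for w in words]
    return random.choices(words, weights=weights, k=1)[0]

random.seed(0)
picks = Counter(sample_next_word(counts) for _ in range(1000))
print(picks)  # "good" dominates, but "other" and "bad" show up too
```

Turning the temperature up is the knob for making the output “funnier”: the unexpected “bad” starts getting picked more often.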
ChatGPT could repeat this process for all the other blanks in the paragraph.
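Repeating the fill-in-the-blank step word after word is how a whole sentence gets written. Here is a toy sketch of that loop, with a made-up table of which words tend to follow which (a real model conditions on everything written so far, not just the previous word):

```python
import random
from collections import Counter

# Made-up continuation counts keyed by the previous word only.
continuations = {
    "<start>": Counter({"there": 5}),
    "there": Counter({"are": 5}),
    "are": Counter({"many": 5}),
    "many": Counter({"good": 3, "other": 2, "bad": 1}),
    "good": Counter({"ways": 3}),
    "other": Counter({"ways": 2}),
    "bad": Counter({"ways": 1}),
    "ways": Counter({"to": 6}),
    "to": Counter({"choose": 6}),
    "choose": Counter({"<end>": 6}),
}

def generate(continuations):
    """Build a sentence one word at a time, sampling each next word."""
    sentence, word = [], "<start>"
    while True:
        options = continuations[word]
        word = random.choices(list(options), weights=options.values())[0]
        if word == "<end>":
            return " ".join(sentence)
        sentence.append(word)

random.seed(1)
sentence = generate(continuations)
print(sentence)  # e.g. "there are many good ways to choose"
```

Each pass through the loop is one more “blank” filled in, which is exactly the repeated MadLibs step described above.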
Summary
ChatGPT completes a sentence the same way humans complete a MadLibs: by drawing on a memory of sentences heard or read in the past.
Just the simple task of completing a sentence enables it to complete more complex tasks as we will explore in the next part.
If you want a more advanced explanation, you can click one of these: