Part 1: All ChatGPT does is complete the sentence

Explanation for Technical People

This explanation is for technical people who are not data scientists. For others, use these links:

Non-Technical | Data Scientist

Main Idea

The fundamental idea of LLMs (Large Language Models) is to complete the sentence. Given an unfinished sentence, they choose the next word to add. This simple task, it turns out, enables all kinds of complex language tasks, as I will show in the next chapter.

MadLibs

Many of us played MadLibs when we were kids.

MadLibs provides sentences with missing words, and we write in words to complete each sentence.

How does ChatGPT do this?

ChatGPT scans the available text on the internet. A current estimate is that there are 140 billion sentences on the public internet; including digitized books adds another 110 billion sentences.

Given a phrase like “There are many ___ ways to choose…”, it can find all the sentences of this form and pick out the word used in the blank space.

It can then assign a score to each choice based on how many times that sentence was found on the internet. (The actual logic for calculating the score is more complicated; I will cover it in a later section.)

Let’s say there are 100 web pages that contain the sentence “There are many good ways to choose…” and there are 20 pages that contain the sentence “There are many other ways to choose…”. So the score could be:

  1. “Good” = 100
  2. “Other” = 20
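The counting step above can be sketched in a few lines of Python. The tiny corpus and prefix here are made up for illustration; the real system works over billions of sentences and uses more sophisticated scoring:

```python
from collections import Counter

# A toy stand-in for the scanned internet text.
corpus = [
    "There are many good ways to choose a gift.",
    "There are many good ways to choose a book.",
    "There are many other ways to choose a career.",
]

prefix = "There are many"

# Count which word follows the prefix in each matching sentence.
scores = Counter()
for sentence in corpus:
    if sentence.startswith(prefix):
        next_word = sentence[len(prefix):].split()[0]
        scores[next_word] += 1

print(scores)  # Counter({'good': 2, 'other': 1})
```

With the 100-page and 20-page counts from the example, the same loop would produce scores of 100 for “good” and 20 for “other”.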

Now let’s say that from scanning all the sentences on the internet we could also figure out the antonym, or opposite, of each word. “Bad” is an antonym of “Good”. Let’s say we also include antonyms, but at 1/10th the score of the actual word. So “bad” would become a choice with 10 points (1/10th of 100).

  1. “Good” = 100
  2. “Other” = 20
  3. “Bad” = 10

Now all we have to do is choose one. The simplest option would be to always choose the one with the highest score.
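That highest-score rule (often called greedy decoding) is one line of Python; the scores are the made-up ones from the example above:

```python
# Scores from the running example.
scores = {"good": 100, "other": 20, "bad": 10}

# Greedy choice: always take the word with the highest score.
best = max(scores, key=scores.get)
print(best)  # good
```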

How to be creative

While correct, this gets boring, since all the sentences we complete will sound similar.

If we want to make our output more interesting and creative, we can choose randomly from the options. However, a uniform random pick would fill in “Bad” just as often as “Good”, giving us the less correct answer too often.

The alternative is a weighted random choice: if we chose 130 times, we would pick “Good” about 100 times, “Other” about 20 times, and “Bad” about 10 times. This gives us a balance between choosing the best answer and choosing an interesting one.
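A weighted random choice like this is built into Python’s standard library. This sketch uses the example’s scores directly as weights and confirms that, over many draws, the proportions roughly match them:

```python
import random
from collections import Counter

words = ["good", "other", "bad"]
weights = [100, 20, 10]

# One draw: "good" is picked with probability 100/130, "other" 20/130, "bad" 10/130.
choice = random.choices(words, weights=weights, k=1)[0]

# Over many draws the counts settle near the weight ratios.
counts = Counter(random.choices(words, weights=weights, k=13000))
print(counts)  # roughly 10000 "good", 2000 "other", 1000 "bad"
```

Real LLMs do something similar when “sampling”: the randomness is what makes two runs of the same prompt come out differently.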

If you’d like to move on to the more advanced explanation for data scientists: Data Scientist
