Ever since ChatGPT came into public view in late 2022, it has astonished data scientists and everyday users alike with its capabilities. It has performed tasks that data scientists did not think an AI model could do today.
However, under the hood, ChatGPT is not technologically very different from other AI models. It is a fairly standard deep learning model that takes a list of inputs and predicts the output, just like thousands of existing AI models.
In simple terms, ChatGPT is just completing the sentence.
Given text like “Tom Brady is the best ___”, ChatGPT just finds the most reasonable word to use to fill in the blank.
So why can ChatGPT do tasks that were never possible before, like answering complicated questions, writing creative essays and showing empathy?
A New Way of Looking at the Same Old Problems
The technology behind ChatGPT is not revolutionary. There are three ways in which ChatGPT allowed us to look at the same old problems differently:
- Many of the problems we classify as human intelligence can actually be reduced to the problem of completing a sentence, done iteratively.
- Adding a bit of randomness to the answers is the same as creativity.
- A sufficiently large general purpose model combined with a sufficiently large context can actually answer domain-specific questions quite well.
1- Many of the problems we classify as human intelligence can actually be reduced to the problem of completing a sentence
Take the example of answering a question like “What is Tom Brady best at?”. We can reduce this problem to completing the sentence “Tom Brady is the best ___”.
Take the example of writing an essay. We can reduce this problem to iteratively finding the next best word to use in a sentence.
- Tom ___
- Tom Brady ___
- Tom Brady is the best ___
- Tom Brady is the best quarterback
2- Adding a bit of randomness to the model is the same as creativity.
Going back to the example: “Tom Brady is the best ___”.
The most likely answer would be “quarterback” but another answer could be “player from San Mateo” or “best looking person married to Gisele”.
If we always chose the most likely answer, our answers would be boring. But suppose we add some randomness: most of the time we still choose "quarterback", but sometimes we choose one of the less likely answers.
Then, surprisingly, our work becomes creative. And if we do this iteratively, the whole answer becomes creative.
This randomness is controlled by the temperature parameter in ChatGPT. In the OpenAI playground (https://platform.openai.com/playground) you can set it to 0: the answers become less creative, you get (almost) the same answer every time you ask the question, and the answers are more likely to be correct. Set it to 1 and the answers get more creative, less correct and less deterministic.
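The usual way temperature works can be sketched as follows. This is a minimal illustration with made-up scores, not real model outputs: the model assigns a score to each candidate word, and the temperature controls how much the sampling favors the top score.

```python
import math
import random

# Made-up scores for "Tom Brady is the best ___" (illustrative only).
scores = {
    "quarterback": 5.0,
    "player from San Mateo": 2.0,
    "looking person married to Gisele": 1.0,
}

def sample(scores: dict[str, float], temperature: float) -> str:
    if temperature == 0:
        # Temperature 0: always pick the most likely word (deterministic).
        return max(scores, key=scores.get)
    # Dividing scores by the temperature before exponentiating flattens
    # the distribution as temperature rises, so less likely words get
    # picked more often (more "creative").
    weights = [math.exp(s / temperature) for s in scores.values()]
    return random.choices(list(scores), weights=weights)[0]

print(sample(scores, 0))  # -> quarterback, every time
```

At temperature 0 you always get "quarterback"; at temperature 1 the other answers show up a noticeable fraction of the time.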
This randomness is (one) reason why ChatGPT may give you different answers to the same question every time you ask it.
Context is another reason. Context is the instructions you give ChatGPT plus the questions you asked earlier in the current session.
3- A sufficiently large general purpose model combined with a sufficiently large context can actually answer domain-specific questions quite well
Let’s go back to the example prompt “Tom Brady is the best _____”.
If you ask that question of football fans the answer is very likely to be “Tom Brady is the best quarterback”.
However if you were to ask the question of fans of fashion or tabloid readers the answer is more likely to be “Tom Brady is the best looking man who’s married to Gisele”.
The conventional thinking was that we would need to train different models for the different domains. However, the big “aha” was that if the model is large enough, it knows what football fans think AND what fashion fans think.
And given a large enough context of the questions a person has previously asked, the model can lean towards the football-fan answer or the fashion-fan answer.
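That steering can be sketched with a toy example. A real model does this implicitly inside the network; the keyword counting here is a crude, hypothetical stand-in for how context shifts which answer comes out.

```python
# Toy sketch: one "model" holds both domains; the context picks the answer.
ANSWERS = {
    "football": "quarterback",
    "fashion": "looking man who's married to Gisele",
}

def infer_domain(context: list[str]) -> str:
    # Crude stand-in for what a large model does implicitly:
    # count football keywords in the previous questions.
    football_hits = sum(
        "quarterback" in q or "Super Bowl" in q for q in context
    )
    return "football" if football_hits > 0 else "fashion"

def complete(prompt: str, context: list[str]) -> str:
    return f"{prompt} {ANSWERS[infer_domain(context)]}"

print(complete("Tom Brady is the best",
               ["Who won the Super Bowl in 2021?"]))
```

The same prompt yields a different completion depending on what was asked earlier in the session, which is exactly the "sufficiently large context" point above.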
Summary
So really ChatGPT is not a technological advancement but an advancement in how we look at the same old problems.
We now know that many problems can be reduced to the problem of finding the next word in a sentence.
We have learned that adding a bit of randomness to the answers can result in creativity.
And finally that a large general purpose model combined with a large context (instructions and previous questions) can actually answer the same question properly in different domains.