Deep Learning for the Average Person

This is part 8 of 10 in a series of articles explaining how ChatGPT and other LLMs work. You can find the previous articles here: https://healthcareconsumer.ai/chatgpt-under-the-hood/.

In this article, we will get a conceptual understanding of how deep learning works. This will allow us to learn how deep learning works in ChatGPT in the next article.

We embark on this journey by exploring a simple yet illustrative example: predicting the height of high school students based on factors such as age, gender, and favorite color.

We begin by using data about existing students to work out formulas that predict the height of new students.

Then we unveil the mechanism of employing layers in deep learning to process inputs and generate subsequent outputs. By incorporating specialized layers for gender, age, and favorite color, we (conceptually) construct a deep learning model primed for height prediction.

The training phase unfolds a fascinating process wherein the model iteratively refines itself, adjusting layer functionalities to optimize height predictions. This iterative refinement ensures the model’s accuracy in forecasting student heights.

In the remaining parts of this series, we will use this foundation to examine how ChatGPT and other Large Language Models (LLMs) are trained.

Deep Learning Models in ChatGPT and other LLMs

There are multiple deep learning models in an LLM including:

A model to convert text to embedding vectors
A model to transform embedding vectors using the Transformer
A model to take an embedding vector and predict the next word to complete the sentence

Let’s get a conceptual understanding of deep learning so we can understand how these three models work.

A Conceptual Understanding of Deep Learning

First we’ll understand how deep learning works.

For our example, our goal is to predict the height of new students joining a high school so we can order t-shirts for them in the right sizes. We know the age and gender of each new student.

How can we predict the height of each kid using their age and gender?

Predicting height using age

Let’s start with age first. We know that older kids tend to be taller so we would surmise that age can be used to predict height.

If we take a list of ages and heights for our existing students, we may find:

Age	Average Height
14	5 feet 4 inches
15	5 feet 5 inches
16	5 feet 6 inches
17	5 feet 6 inches
18	5 feet 7 inches

Average Height by Age

We can learn from this data that the following rules can be used to predict the height for any high school student:

From age 14 to 16, the height increases by 1 inch per year.
From age 16 to 17, there is no change in height.
From age 17 to 18, the height increases by 1 inch.

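We can then turn these rules into formulas. The formulas below only approximate the table: the second one spreads the growth after age 16 evenly, at about half an inch per year, instead of modeling a flat year followed by a 1-inch gain.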

If the student is 16 years old or younger:

Predicted Height (in inches) = 63.98 + 1.18 * (Age - 14)

If the student is older than 16:

Predicted Height (in inches) = 63.98 + (1.18 * 2) + 0.49 * (Age - 16)

We have essentially created a simple AI model already!

You can consider this as a deep learning model with one layer. This one layer takes an input of age and generates an output of predicted height.

Age -> Age Layer -> Height Prediction
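
The one-layer model above can be sketched in a few lines of Python. This is only an illustration: the coefficients are the ones from the two formulas, while the function names and the rounding to whole inches are assumptions made for this example.

```python
def predict_height_from_age(age: float) -> float:
    """Predicted height in inches, using the two age formulas from the text."""
    base = 63.98        # predicted height at age 14
    early_rate = 1.18   # inches gained per year up to age 16
    late_rate = 0.49    # inches gained per year after age 16
    if age <= 16:
        return base + early_rate * (age - 14)
    return base + early_rate * 2 + late_rate * (age - 16)


def to_feet_and_inches(inches: float) -> str:
    """Round to the nearest inch and format like the tables in the article."""
    feet, rest = divmod(round(inches), 12)
    return f"{feet} feet {rest} inches"


# The "Age Layer": age goes in, a height prediction comes out.
for age in range(14, 19):
    height = predict_height_from_age(age)
    print(age, round(height, 2), to_feet_and_inches(height))
```

Because the formulas approximate the table rather than copy it, some rounded predictions can differ from the table by an inch.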

Predicting height using age AND gender

Of course, we know that girls and boys tend to have different heights. So let’s add another layer to handle this difference.

If we take the age, gender and height for our existing students, we will learn how age impacts the height and how gender impacts the height.

For girls:

Age	Average Height
14	5 feet 3 inches
15	5 feet 4 inches
16	5 feet 5 inches
17	5 feet 5 inches
18	5 feet 5 inches

Average Height by Age for Girls

For boys:

Age	Average Height
14	5 feet 5 inches
15	5 feet 7 inches
16	5 feet 8 inches
17	5 feet 9 inches
18	5 feet 10 inches

Average Height By Age For Boys

So again we can learn a formula that allows us to predict the height of a student by using both age and gender. This will be more accurate than predicting height based on just age since it will account for the height differences between girls and boys.

Congratulations! We’ve created a two-layer deep learning model. The first layer is gender and the second layer is age.

Age, Gender -> Gender layer -> Age layer -> Height Prediction

We pass in age and gender. The first layer uses the gender to pick either the girls table or the boys table and passes it to the next layer. The next layer receives that table and uses the age to look up the predicted height.
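
Here is a minimal sketch of that two-layer idea in Python. The heights come from the girls and boys tables above, converted to inches. The function names are made up for this example, and real deep learning layers work with numbers rather than lookup tables, so treat this purely as an illustration of one layer handing its output to the next.

```python
from typing import Dict

# Average heights in inches, taken from the two tables in the text.
GIRLS_TABLE: Dict[int, int] = {14: 63, 15: 64, 16: 65, 17: 65, 18: 65}
BOYS_TABLE: Dict[int, int] = {14: 65, 15: 67, 16: 68, 17: 69, 18: 70}


def gender_layer(gender: str) -> Dict[int, int]:
    """First layer: use gender to choose which table to pass on."""
    return BOYS_TABLE if gender.lower() == "male" else GIRLS_TABLE


def age_layer(table: Dict[int, int], age: int) -> int:
    """Second layer: use age to find the height in the table it was given."""
    return table[age]


def predict_height(age: int, gender: str) -> int:
    """Age, Gender -> Gender layer -> Age layer -> Height Prediction."""
    return age_layer(gender_layer(gender), age)


print(predict_height(16, "Female"))  # 65 inches, i.e. 5 feet 5 inches
print(predict_height(15, "Male"))    # 67 inches, i.e. 5 feet 7 inches
```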

Do we really need prior knowledge?

In the example above, we specifically chose Age and Gender as the characteristics (called “features” in AI) to build layers. This is based on our heuristic knowledge that these characteristics make a big difference in height.

Could we just figure out what characteristics make a difference in the height of someone without this heuristic knowledge? Yes!

Let’s now add in a characteristic that doesn’t make a difference in someone’s height: their favorite color.

Let’s say we end up with:

Favorite Color	Average Height
Red	5 feet 6 inches
Blue	5 feet 6 inches
Green	5 feet 6 inches
Yellow	5 feet 6 inches
Purple	5 feet 6 inches

Average Height By Favorite Color

This data tells us that favorite color has no impact on the height.

Without any prior knowledge, just looking at whether different values for a characteristic result in different values for height, we can determine which characteristics are useful in the model. Then we can build layers for only useful characteristics.

Choosing Layers Automatically

Can we use this information to figure out which characteristics we need to create layers for and which ones we can ignore? Yes!

We can just take a list of all possible characteristics and see whether there is a material difference in height between students who have different values for those characteristics. We need to create layers for the characteristics that show a material difference in height, and we can ignore the ones that do not.

Students of different ages have different heights. Students of different genders have different heights. Students with different favorite colors do not.

So we should create layers for age and gender but not for favorite color.

Notice in this exercise, we did not use any pre-existing knowledge about whether age, gender or favorite color impact height. We just used the data to tell us that.

You can replace age, gender and favorite color with any characteristics (that we may not even understand) and this process can figure out whether that characteristic matters or not in determining height.
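
The selection process described in this section can be sketched as follows. The age and favorite color averages come from the tables above. The gender averages are an assumption: they are the simple mean of each gender table, which presumes an equal number of students at every age. The 1-inch threshold for a “material difference” and the function names are also assumptions chosen for this example.

```python
# Average height in inches for each value of each characteristic.
AVERAGE_HEIGHT_BY_FEATURE = {
    "age": {14: 64, 15: 65, 16: 66, 17: 66, 18: 67},
    # Assumption: mean of the girls table and of the boys table.
    "gender": {"Female": 64.4, "Male": 67.8},
    "favorite_color": {"Red": 66, "Blue": 66, "Green": 66, "Yellow": 66, "Purple": 66},
}


def height_spread(averages):
    """Gap between the tallest and shortest group average."""
    values = list(averages.values())
    return max(values) - min(values)


def select_useful_features(tables, min_spread_inches=1.0):
    """Keep characteristics whose values lead to materially different heights."""
    return [
        name
        for name, averages in tables.items()
        if height_spread(averages) >= min_spread_inches
    ]


print(select_useful_features(AVERAGE_HEIGHT_BY_FEATURE))  # ['age', 'gender']
```

Nothing in the code knows what “age” or “favorite color” means. It only looks at whether the heights differ, which is the point of this section.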

How to train Deep Learning Models

Now that we have an intuitive understanding of Deep Learning and layers, let’s use that to understand how a deep learning model is trained.

First we feed it a list of existing students with values for all the characteristics we care about.

This may look like:

Student	Gender	Age	Favorite Color	Height
Jim	Male	15	Red	5 feet 7 inches
Kyle	Male	16	Red	5 feet 8 inches
Jane	Female	17	Blue	5 feet 5 inches
Jill	Female	18	Blue	5 feet 5 inches
Joane	Female	14	Blue	5 feet 3 inches

The training process then creates a layer for each characteristic: Gender, Age and Favorite Color.

Age, Gender, Favorite Color -> Gender Layer -> Age Layer -> Favorite Color Layer -> Height Prediction

The inputs of Age, Gender and Favorite Color are known, and the height of that student is also known.

What we don’t know is what the Gender layer should feed into the Age Layer, what the Age Layer should feed into the Favorite Color layer and what the Favorite Color layer should pass into the Predicted Height.

This is what training is about: learning what these transformations need to be.

Learning Backwards (Back Propagation)

The way that the training process learns what each layer should feed into the next one is backwards (technically called back propagation). Starting from the known heights, the training process works backwards to calculate the transformation needed in each layer.

The training process takes the height of the student and adjusts the values passed from the Favorite Color Layer so that the predicted height matches the actual height of the student. Then it adjusts the values of the Age Layer so that what it passes into the Favorite Color Layer results in the matching height. Then it adjusts what the Gender Layer passes into the Age Layer. We stop here because there are no more layers left: the actual age, gender and favorite color are passed directly into the Gender Layer.

So the training process goes backwards:

Age, Gender, Favorite Color <- Gender Layer <- Age Layer <- Favorite Color Layer <- Predicted Height

The training process repeats these steps for each student in the training data, adjusting the values in each layer as it goes. The values are adjusted student by student until they give the best match between actual height and predicted height.

This is why a lot of data is needed to train deep learning models. In general, the more data that is provided in training, the more accurate the deep learning model will become. (Technically, once the values in each layer stop changing with new training data, we can infer that the model has been trained with sufficient data and that more data will not make it more accurate.)
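
To make the idea of working backwards more concrete, here is a minimal Python sketch of back propagation on the five students from the training table. It deliberately uses a generic two-layer numeric model (the first layer mixes age and gender into one hidden number, the second turns that number into a height) rather than the per-characteristic layers described above, and it leaves out favorite color. The starting values, the learning rate, the number of passes and the centering of ages and heights are all assumptions chosen so this toy example behaves well.

```python
# (age, is_male, height in inches) for the five students in the training table
STUDENTS = [
    (15, 1, 67),  # Jim
    (16, 1, 68),  # Kyle
    (17, 0, 65),  # Jane
    (18, 0, 65),  # Jill
    (14, 0, 63),  # Joane
]

AGE_CENTER = 16.0     # assumption: center the inputs so the numbers stay small
HEIGHT_CENTER = 66.0  # assumption: predict the offset from 5 feet 6 inches


def initial_params():
    """Small arbitrary starting values for both layers."""
    return {"w_age": 0.1, "w_gender": 0.1, "b1": 0.0, "w_out": 0.1, "b2": 0.0}


def forward(params, age, is_male):
    """Forward pass: inputs -> layer 1 -> layer 2 -> predicted height."""
    hidden = params["w_age"] * (age - AGE_CENTER) + params["w_gender"] * is_male + params["b1"]
    predicted = params["w_out"] * hidden + params["b2"] + HEIGHT_CENTER
    return hidden, predicted


def mean_squared_error(params):
    errors = [(forward(params, age, is_male)[1] - height) ** 2
              for age, is_male, height in STUDENTS]
    return sum(errors) / len(errors)


def train(epochs=2000, learning_rate=0.01):
    params = initial_params()
    for _ in range(epochs):
        for age, is_male, height in STUDENTS:  # student by student
            hidden, predicted = forward(params, age, is_male)
            # Start from the error at the output...
            d_predicted = 2.0 * (predicted - height)
            # ...adjust the last layer first...
            grads = {"w_out": d_predicted * hidden, "b2": d_predicted}
            # ...then pass the error back to the layer before it (chain rule).
            d_hidden = d_predicted * params["w_out"]
            grads["w_age"] = d_hidden * (age - AGE_CENTER)
            grads["w_gender"] = d_hidden * is_male
            grads["b1"] = d_hidden
            for name in params:
                params[name] -= learning_rate * grads[name]
    return params


trained = train()
print("error before training:", mean_squared_error(initial_params()))
print("error after training:", mean_squared_error(trained))
```

The error printed after training should be much smaller than the error before training, which is the sign that the adjustments made in the backward pass are pulling the predictions towards the actual heights.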

How Layers Are Determined

In the example above, I created a layer for each characteristic of the student: Gender, Age and Favorite Color. Of course, this is just one way to choose the layers.

Removing Layers

In the example above, the model will learn that there is no difference between the input to the Favorite Color layer and its output. That is, the Favorite Color layer is not adding anything to the model’s accuracy. So the training process can just eliminate this layer, leaving us with this model:

Age, Gender, Favorite Color -> Gender Layer -> Age Layer -> Height Prediction

Combining Layers

The training process can also determine that instead of having a separate Age layer and a separate Gender layer, it is better to have a single layer called Gender-Age. It can then replace the Gender layer and the Age layer with a Gender-Age layer.

Age, Gender, Favorite Color -> Gender-Age Layer -> Height Prediction

The training process can continue to eliminate and combine layers until the accuracy of the model stops improving.

Accuracy of a Deep Learning Model

Above, we’ve talked a few times about the accuracy of a deep learning model. How is accuracy calculated?

Let’s go back to the table of existing students above. Let’s say after we train the model, we use the model to predict the height of existing students. Since we know the actual height of existing students this lets us compare the predicted height to the actual height to see how accurate our model predictions are.

Student	Gender	Age	Favorite Color	Actual Height	Predicted height
Jim	Male	15	Red	5 feet 7 inches	5 feet 6 inches
Kyle	Male	16	Red	5 feet 8 inches	5 feet 8 inches
Jane	Female	17	Blue	5 feet 5 inches	5 feet 6 inches
Jill	Female	18	Blue	5 feet 5 inches	5 feet 5 inches
Joane	Female	14	Blue	5 feet 3 inches	5 feet 4 inches

Comparison of Actual Height to Predicted Height

By calculating the difference between the actual height and the predicted height for each student, we can measure the accuracy of a deep learning model.
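
One common way to turn those differences into a single number is the average absolute error. The sketch below applies it to the comparison table above, with heights converted to inches. The choice of this particular measure and the names used are assumptions for illustration; the article does not say which measure a real training process uses.

```python
# (student, actual height, predicted height) in inches, from the comparison table
COMPARISON = [
    ("Jim", 67, 66),
    ("Kyle", 68, 68),
    ("Jane", 65, 66),
    ("Jill", 65, 65),
    ("Joane", 63, 64),
]


def mean_absolute_error(rows):
    """Average size of the gap between actual and predicted height."""
    return sum(abs(actual - predicted) for _, actual, predicted in rows) / len(rows)


print(mean_absolute_error(COMPARISON))  # 0.6
```

Here the model is off by 1 inch for three students and exactly right for two, so on average it misses by 0.6 inches. The smaller this number, the more accurate the model.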

The training process is constantly trying to aim for the most accurate model. It changes the values in each layer to do so.

Summary

In this article, we built a conceptual understanding of deep learning.

We started with a simple example of predicting the height of high school students when given their age, gender and favorite color.

We showed how we can use data about existing students to infer formulas that we can use to predict height for a new student.

We then showed how we can do this using layers that take an input and create an output for the next layer. By using a gender layer, an age layer and a favorite color layer, we can create a deep learning model that can predict heights.

The training process can then work backwards to figure out what each layer should do so that we get the most accurate prediction of height for a student.

In the next part of the series, we will apply this understanding of deep learning to discuss how ChatGPT and other LLMs are trained.

Healthcare Consumer