Firstly we need to define what we mean by normalizing otherwise things will quickly become muddled. In this tutorial, although the title uses normalization we will actually be discussing a form of standardization. The two are often colloquially confused as Normalization typically refers to working with two data sets and setting these on the same scale. The type of standardization we will discuss is also known as z-score normalization and is used to rescale the features of the data set such that they are aligned with a Normal Distribution.
The resultant confusion is therefore because people often believe that Normalization is related to creating a Normal Distribution. The unique feature of a Normal Distribution is that it is centered around the mean (numerical average), such that:
μ=0 and σ=1
Where μ is the mean and σ is the standard deviation of the sample. The standard scores also known as the z-scores of the sample are how each data set relates to these entities in accordance with our set standard. For example, if z=0 then the data point is exactly equal to the mean, while if z=2 then the data point is two standard deviations removed from the mean on the positive side.
The equation for calculation the z-score is as follows:
z= (x−μ) / σ
Where x is the value of the data point, μ is the numerical average and σ is the standard deviation of the set.
In this tutorial, we’ll show you how to create a Z-score normalization of a data set in excel step by step.
Table of Contents
Step 1: Open Excel and Locate Your Data
As you can see above we have some data relating to height, to fully set the example this is the height of eight-year-old girls in a class. We’ve organized our data in ascending order, but you don’t need to do this if working with your own set. The range of our data is A2:A16.
Step 2: Calculate the Mean
When we calculate the mean in excel, we use the built-in AVERAGE function, the function has one input – the range of the dataset you want to calculate the mean for. In our example above the range is A2:A16, and is basically the cell containing the beginning of the data set to the cell where the dataset ends.
We like being neat though, so we’ll show you how we did it below:
Firstly we choose a random row and column (we picked C2) and type the work Mean in it (blue box), this is to help us remember the calculation we’ll be doing in the cell next to it. You might think you’ll remember but honestly being organized will save you grief with spreadsheets.
Then we select the cell D2, here we want to do the actual calculation so we type the following in the equation box:
You need to substitute in your own range, once you’ve typed the range it should be highlighted in blue as ours is above, if the entire range is selected close the bracket and hit enter!.
In the red box, where we typed the equation we now have a number. This is our numerical average.
Step 3: Calculate the Standard Deviation
In this step, we use another Excel function to calculate the standard deviation of the sample. The process is very similar to what we showed in the previous step, check out the image below to help guide you.
We start by typing Standard Deviation or Std. DEV for short in the cell below where we typed mean in the previous step (red box). Then in the D3 cell next to it, we type:
So again in our case, the range is A2:A16, your range will be different depending on your sample. As we define the range it will be highlighted in the spreadsheet to help guide us. If the correct range is defined, hit enter!
You should have a number, in our case, it’s 5.3…. as shown in the red box.
Once you have both numbers you’re ready to calculate the Z scores of your data!
Step 4: Use the Standardize Function
Excel has a built-in function for working out the Z score of a data point in a set if the mean and standard deviation is defined hence the two previous steps. It is basically the same as the equation we defined in the intro without us actually have to type out the equation. One note is that when we enter the location of the mean and standard deviation we’ll need to use dollar signs $ to fix the points in the sheet as we’ll be dragging the equation along to all data sets.
For example, our mean is in cell D2, however, when we type it into the function we type $D$2 and for the standard deviation in cell D3, we type $D$3 otherwise things will get seriously messed up!
The function is
=STANDARDIZE(x, mean, standard deviation), where x will be the start of our range. so in our case, it is:
=STANDARDIZE(A2, $D$2, $D$3)
Check out the picture below to guide you further:
As you can see we have typed the above formula into cell B2, cells A2, D2, and D3 are highlighted in different colors as they are all called by the standardized function. Now once we close the bracket and hit enter, the Z-score for A2 will appear in the B2 cell.
Once you’ve done that hover your mouse over the right corner of the cell where you typed the equation, the cursor will turn into a plus. When it does click the corned and drag the pointer down along the column. The numbers should start populating as below.
As you can see the Z value closest to zero is in cell B9, which corresponds to 127 cm as can be seen in cell A9, this is very close to the mean of the sample which is 127.4 – as you can see the other children are either shorter or taller than the mean with the tallest child being nearly two standard deviations (1.99) away from the mean.
The negative Z score indicates a number of standard deviations below the mean, while a positive V score indicates that the data point is greater than the mean for the sample.