- Z-scores are numerical representations of how far data values are from the mean, given in terms of standard deviations.
- Z-scores can be calculated using the formula z = (x – μ) / σ, where z is the Z-score, x is the data point, μ is the mean, and σ is the standard deviation.
- Z-score tables can be used to determine the cumulative probability of obtaining a value less than or equal to a specific score.
Statisticians will know the importance of standardization when it comes to analysis, but this concept can evade the rest of us. When we want to analyze and compare experimental values, it’s important that they share the same units, or lack a unit altogether. This is where Z-scores come into play, as they provide us with an easy way to compare the distribution and probability of our values, as well as identify any anomalous values. In this article, we’re going to cover what Z-scores are and how they work, as well as give you a simple chart for calculating your own Z-scores.
What Is a Z-Score?
In simple terms, a Z-score is a numerical representation of how far your data value is from the mean (average) data value, given in terms of standard deviations. Z-scores can be positive or negative, depending on whether they represent data points above or below the mean, respectively. The size of a Z-score also gives us important information, namely how much the data deviates from the mean. The further a score is from zero, in either direction, the higher the degree of deviation.
Luckily for us, we can refer to Z-score tables or Z tables, which list pre-calculated cumulative probabilities for a range of Z-scores, in order to figure out how much of our data will fall above or below a specific score, as well as calculate the rank of a score in terms of percentiles. It’s worth noting that Z-score tables are based on the standard normal distribution curve, so the probabilities they give only apply when our data is approximately normally distributed. This is a bell-shaped curve that’s symmetrical around the mean value.
How Do We Calculate Z-Scores?
First, you must have the mean and standard deviation of your data. Then, you can calculate the Z-score with the following equation:
z = (x – μ) / σ
Where z is the Z-score, x is the data point in question, μ is the mean, and σ is the standard deviation.
For example, let’s say we have data that represents students’ scores on a test, and one of these values is 75. The mean score is 60, and the standard deviation is 15. Plugging these numbers into our formula gives:
z = (75 – 60) / 15 = 1
Therefore, our Z-score for this data point is equal to 1. This tells us that the score is one standard deviation above the mean of the data, since the value is positive. Because we’re using a standard normal distribution curve, we also know that roughly 68% of our data will have a Z-score (i.e. deviation from the mean) of between -1 and 1. About 95% of our data will lie between -2 and 2, and about 99.7% will lie between -3 and 3.
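The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `z_score` is our own, and the coverage figures are computed with `NormalDist` from Python’s standard `statistics` module (Python 3.8 or later) on the assumption that the data follows a normal distribution.

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Test-score example from the text: x = 75, mean = 60, standard deviation = 15.
z = z_score(75, 60, 15)
print(z)  # 1.0

# Share of a standard normal distribution within k standard deviations of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"Z between -{k} and {k}: {share:.4f}")
```

The three printed shares come out at roughly 0.68, 0.95 and 0.997, matching the rule of thumb described above.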
How Do We Use Z-Score Tables?
Now that we have our Z-score, it’s time to consult a Z-score table. The rows correspond to the whole number and first decimal place of our score (e.g. 1.0), while the columns correspond to the second decimal place (e.g. 0.00). Many Z-score tables give positive values, but you can use negative Z-score tables as well. By locating the row and column associated with our score, we can obtain a value that represents the cumulative probability, i.e. the chances of obtaining a value less than or equal to our score.
For example, take a positive Z-score table. Since our Z-score is 1, we locate the row for 1.0, and the column for 0.00. This gives us a value of 0.8413. Therefore, there’s an 84.13% chance of obtaining a value less than or equal to the value in question. In other words, this represents the proportion of our data that’s below or equal to this value.
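If you don’t have a table to hand, the same lookup can be reproduced in code. The sketch below assumes Python’s standard `statistics.NormalDist`; the helper names `cumulative_probability` and `table_cell` are hypothetical and only mirror the row and column lookup described above for non-negative scores.

```python
from statistics import NormalDist


def cumulative_probability(z):
    """Area under the standard normal curve to the left of z."""
    return NormalDist().cdf(z)


def table_cell(z):
    """Return the (row, column) labels for a non-negative z in a typical Z table."""
    if z < 0:
        raise ValueError("this sketch only handles positive-table lookups")
    hundredths = round(z * 100)      # e.g. 1.96 -> 196
    row = (hundredths // 10) / 10    # whole number and first decimal: 1.9
    column = (hundredths % 10) / 100  # second decimal: 0.06
    return row, column


print(table_cell(1.0))                        # (1.0, 0.0)
print(f"{cumulative_probability(1.0):.4f}")   # 0.8413
```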
What Else Can Z-Score Tables Tell Us?
Z-score tables have a few more tricks up their sleeves. For example, we can also subtract our value from 1 to obtain the area to the right of our score. This works because the total area under the curve is equal to 1. By calculating this, we get 0.1587, or 15.87%, meaning 15.87% of our data will fall above this score. If our score is negative and we only have a positive table, we can rely on the symmetry of the normal distribution, and the situation is essentially reversed. Looking up the score without the negative sign gives the area to the right of our score. Subtracting that value from 1 gives us the area to the left of our score. We can also look up the cumulative probabilities for two Z-scores and subtract the smaller from the larger. This gives us the probability of obtaining a value between the two scores.
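These three manipulations are simple arithmetic on cumulative probabilities. Here is a rough sketch, again assuming `statistics.NormalDist` as a stand-in for the table; the function names are illustrative only.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # cumulative probability to the left of a Z-score


def area_right(z):
    """Probability of a value above z: the total area (1) minus the area to the left."""
    return 1 - phi(z)


def area_left_from_positive_table(z):
    """Area to the left of a negative z using only the positive-table value for |z|."""
    table_value = phi(abs(z))  # what a positive table lists for |z|
    return 1 - table_value     # symmetry: area left of -a equals area right of a


def area_between(z_low, z_high):
    """Probability of a value between two Z-scores."""
    return phi(z_high) - phi(z_low)


print(f"{area_right(1.0):.4f}")                      # 0.1587
print(f"{area_left_from_positive_table(-1.0):.4f}")  # 0.1587
print(f"{area_between(-1.0, 1.0):.4f}")              # 0.6827
```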
Another way Z-scores can be used is when you want to find the critical z-values. This can help you determine whether your test results are statistically significant or not. Let’s say you have a confidence level of 95% with a two-tailed test. This is a test where we want to detect any significant difference from the expected value, in either direction. Because we haven’t specified a direction, we must split the critical region, i.e. 1 – (confidence level) = 0.05, into two. This gives us an area of 0.025 in each tail. Looking this up in the negative Z-score table gives a value of -1.96, so our critical values are -1.96 and 1.96.
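Reading a table backwards like this amounts to applying the inverse of the cumulative probability. As a minimal sketch, assuming `statistics.NormalDist.inv_cdf` in place of the table and a hypothetical helper name, the lower critical value comes out negative and its mirror image gives the upper one:

```python
from statistics import NormalDist


def two_tailed_critical_z(confidence_level):
    """Return the (lower, upper) critical Z-values for a two-tailed test."""
    alpha = 1 - confidence_level   # total area of the critical region
    tail_area = alpha / 2          # split evenly between the two tails
    lower = NormalDist().inv_cdf(tail_area)
    return lower, -lower           # the curve is symmetrical around zero


lower, upper = two_tailed_critical_z(0.95)
print(f"{lower:.2f}, {upper:.2f}")  # -1.96, 1.96
```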
Now we can calculate Z-scores for any other value we have, and see if its absolute value exceeds our critical value. If it does, the score falls within the critical region, and we have enough evidence to reject our null hypothesis (default assumption) in favor of the alternative hypothesis. If it doesn’t, we fail to reject our null hypothesis. In this way, Z-scores are extremely useful in hypothesis testing. But it’s important to understand that failing to reject our initial hypothesis doesn’t necessarily mean that it’s true. It could just mean that we don’t have enough evidence.
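The decision rule can be written down compactly. The sketch below assumes a two-tailed test with the 1.96 critical value from the 95% example; the function names and the sample Z-scores of 2.4 and 1.0 are made up purely for illustration.

```python
from statistics import NormalDist


def two_tailed_decision(z, critical_z=1.96):
    """Compare the size of a Z-score against a two-tailed critical value."""
    if abs(z) > critical_z:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


def two_tailed_p_value(z):
    """Probability of a Z-score at least this extreme in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


for z in (2.4, 1.0):  # hypothetical test statistics
    print(z, two_tailed_decision(z), f"p = {two_tailed_p_value(z):.3f}")
```

Note that the second outcome is worded as “fail to reject” rather than “accept”, in line with the caveat above.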
Sometimes, we have anomalous data values that can significantly skew our data. The process for identifying these is similar to testing hypotheses. We can see if the Z-score we calculate exceeds the critical value or not. If it does, the data is potentially an outlier, but could also be evidence against the null hypothesis. Other statistical methods are usually employed to determine which is the case, along with techniques such as repeating the experiment. If we consistently obtain data values beyond the critical value, this makes it more likely they’re not outliers. They may provide evidence against our null hypothesis. It’s also important to consider the likelihood of the data value in the context of the overall experiment. Some data values will be more plausible than others.
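As a rough illustration of this screening step, the sketch below flags values whose Z-score exceeds a threshold. Several things here are assumptions rather than part of the method described above: the data set is invented, the 1.96 threshold is borrowed from the 95% example, and the sample standard deviation (`statistics.stdev`) is used.

```python
from statistics import mean, stdev


def flag_potential_outliers(values, threshold=1.96):
    """Return (value, z) pairs whose absolute Z-score exceeds the threshold."""
    centre = mean(values)
    spread = stdev(values)  # assumption: sample standard deviation
    flagged = []
    for x in values:
        z = (x - centre) / spread
        if abs(z) > threshold:
            flagged.append((x, z))
    return flagged


# Hypothetical measurements with one suspiciously large value.
measurements = [58, 61, 59, 63, 60, 62, 57, 95]
flagged = flag_potential_outliers(measurements)
for value, z in flagged:
    print(f"{value} has a Z-score of {z:.2f} and may be an outlier")
```

A flagged value is only a candidate. As noted above, it still needs to be judged in the context of the experiment before it’s discarded.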
Z-Score tables may seem confusing at first. But once you know how to use them, they’re very useful for statistical analysis. By calculating your Z-score using your data value, mean value, and standard deviation, you can find the corresponding row and column to obtain the cumulative probability. If your Z-score is negative, it means your data is below the mean, and vice versa for a positive score. Z-score tables provide us with the cumulative probability to the left of our Z-score. Since the total area under the curve is 1, you can obtain the probability above your score by subtracting the cumulative probability from 1. If your Z-score is negative and you only have a positive table, the symmetry of the curve lets you look up the absolute (positive) value instead: the table value is then the probability above your score, and subtracting it from 1 gives the probability below it. These scores are helpful in many areas, but especially in standardization, hypothesis testing, and identifying outliers.
|Z-Score||A numerical representation of how far your data value is from the mean (average) data value, given in terms of standard deviations.|
|Calculating Z-Scores||First, you must have the mean and standard deviation of your data. Then, you can calculate the Z-score with the following equation: z = (x – μ) / σ|
|Using Z-Score Tables||By locating the row and column associated with our score in a Z-score table, we can obtain a value that represents the cumulative probability, i.e. the chances of obtaining a value less than or equal to our score.|
|Hypothesis Testing||Z-scores can be used to find the critical z-values to determine whether your test results are statistically significant or not.|
|Identifying Outliers||We can check whether the Z-score we calculate exceeds the critical value or not. If it does, the data is potentially an outlier, but could also be evidence against the null hypothesis.|
The image featured at the top of this post is ©Alfonso de Tomas/Shutterstock.com.