
How to estimate average bias & use Bland-Altman comparison – tips & examples

Last year we discussed the overall idea of bias in our blog post what you measure when you measure bias. Lately we’ve realized that there is a need for a more thorough explanation of the concept of bias and of practices related to quantitative method comparisons. A few weeks ago we took a closer look at correlation and what it really means (and what it doesn’t mean) in method comparisons. Now we will start our journey towards a better understanding of bias.

Our first step is to discuss the different numbers describing the average bias over the whole data set within a comparison study. We take a look at when one number can describe bias, and which number to use for that purpose. Regression models and bias as a function of concentration are not explained this time; we will get back to them in a later blog post.

 

What is bias and when is it measured?

Bias is the systematic error related to a measurement, i.e. how much the results differ on average from a reference. Practically all measurement setups contain both systematic and random error. The purpose of estimating bias is to get knowledge of systematic error.

In some cases systematic error can be corrected by calibration, adjustment of reference ranges, or otherwise fine-tuning the interpretation of results. (Random error is more difficult to control, as it represents the unexpected, and it should be estimated in a separate measurement setup.)

Bias is evaluated in clinical laboratories for example

  • when introducing new methods or instruments, to make sure that their results are good enough to be used in diagnostics

  • to gain knowledge of consistency between parallel instruments

  • when the reagent lot changes, to see whether the new one performs as well as the previous one

  • when an instrument is moved to a new location

  • to monitor changes in performance over time, e.g. each month

As there are multiple things that can be compared, we will mostly use the general term measurement procedure to represent a measurement setup that is being compared to another. A measurement procedure consists of the instrument and method used, among other things that may affect the results. Basically any of these individual parameters can be examined to learn whether varying conditions in the measurement setup cause bias.

How is bias estimated?

There are two ways of estimating bias.

In cases where bias is constant, we can find one value to describe the bias throughout the measuring range. This approach is suitable e.g. when performing parallel instrument comparisons. As we can expect the method to behave consistently in all instruments, it is reasonable to assume that the amount of bias does not vary as a function of concentration.

In cases where bias varies over the measuring range, a single value cannot be used to describe bias at all concentration levels. Instead we need to define bias as a function of concentration. This can be done by calculating a regression equation describing the bias at different concentrations. We’ll get back to this in a later blog post, but generally this is the case when you compare different methods to each other.

When using Validation Manager, you get the estimate for constant bias and regression analysis automatically for all your comparison data. So you really don’t have to make a choice between these two approaches. You can look at the results to decide which result to use as your bias estimate. Especially when performing parallel instrument comparisons with small amounts of data, it may be that regression analysis does not really give any better information about bias than the average bias.

 

Different ways of looking at the difference

To get an estimate for the average bias throughout your measuring range, the first thing you need to do is get an idea of what you are actually comparing. You also need to consider what information you are looking for. Based on these choices, you will decide whether to use the Bland-Altman approach or direct comparison to create your difference plots and to calculate the average bias.

A difference plot is a visualization of the differences between the results for each sample as a function of concentration. That means that on the y-axis, the position of a data point is determined by how much the results of the candidate and the comparative measurement procedures differ from each other. On the x-axis, the position is determined by the concentration to which we are comparing. The y-axis can show the difference in reporting units (absolute scale) or as a percentage difference (proportional scale, difference per concentration).
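As a minimal sketch of how the two y-axis scales relate, the snippet below computes the coordinates for a direct-comparison difference plot from a few hypothetical paired results (the values and variable names are illustrative, not from Validation Manager):

```python
# Hypothetical paired results for the same samples, measured with the
# candidate and the comparative measurement procedure.
cand = [5.2, 10.1, 20.8, 41.0]
comp = [5.0, 10.0, 20.0, 40.0]

# Absolute scale: y = candidate result - comparative result,
# x = the comparative result (the concentration we compare to).
absolute = [c - r for c, r in zip(cand, comp)]

# Proportional scale: the same difference expressed as a percentage
# of the concentration on the x-axis.
proportional = [100 * (c - r) / r for c, r in zip(cand, comp)]

print(absolute)       # differences in reporting units
print(proportional)   # differences in percent
```

The same data set can look quite different on the two scales, which is why both plots are worth a look.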

Two graphs: Proportional Bland-Altman plot and constant Bland-Altman plot of a data set containing data from 20 samples.
Image 1: Bland-Altman plots using proportional (on the left) and absolute (on the right) scale. Both graphs visualize the behavior of the same fictional data set.

 

What’s the thing with Bland-Altman all about?

Usually when evaluating bias, we do not have access to a reference method that would give true values. Instead we have a comparative measurement procedure (e.g. an instrument that will be replaced by a new one) that we assume to give about as accurate results as the candidate measurement procedure. Sometimes we even expect the candidate measurement procedure to be better than the comparative measurement procedure. That’s why we cannot calculate the actual trueness (i.e. bias compared to true values) of the new measurement procedure by just comparing the results of our candidate measurement procedure to the comparative measurement procedure.

So what does the bias mean if we don’t have access to a reference method? How can we estimate trueness?

In this case, use of Bland-Altman difference is recommended. It compares the candidate measurement procedure to the mean of the candidate and the comparative measurement procedures.

The point of Bland-Altman difference is pretty simple. The results of both candidate and comparative measurement procedures contain error. Therefore the difference between their results does not describe the trueness of the candidate measurement procedure. Instead, it only gives the difference between the two measurement procedures. That’s why, if we want to estimate trueness, we need a way to make a better estimation about the true concentrations of the samples than what the comparative measurement procedure alone would give.

Typically we only have two values to describe the concentration of a sample: the one produced by the comparative measurement procedure, and the one produced by the candidate measurement procedure. As mentioned above, the candidate measurement procedure often gives at least as good an estimate as the comparative measurement procedure. That’s why the average of these results gives us a practical estimate of the true values (i.e. the results of a reference method). It reduces the effect of random error, and even the weaknesses of an individual measurement procedure in estimating the true value. This makes Bland-Altman difference a good choice when determining the average bias.
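The practical effect of this choice is just which reference value the difference is expressed against. A small sketch with one hypothetical sample pair (the numbers are made up for illustration):

```python
# One hypothetical sample measured with both procedures.
cand, comp = 102.0, 98.0

direct_ref = comp              # direct comparison: comparative result
ba_ref = (cand + comp) / 2     # Bland-Altman: mean of both results

diff = cand - comp             # absolute difference in reporting units

# The same absolute difference, as a percentage of each reference.
direct_pct = 100 * diff / direct_ref
ba_pct = 100 * diff / ba_ref
```

Because the Bland-Altman reference sits between the two results, the proportional differences (and the x-axis positions) come out slightly different than in direct comparison, which is visible in the plots below.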

 

When not to use Bland-Altman difference?

If you are comparing the results of the new measurement procedure to a reference method (i.e. true values) you should compare the measurement procedures directly. This means that the results given by the candidate measurement procedure are simply compared to those given by the comparative measurement procedure.

Also, if you are not really interested in the trueness of the new measurement procedure, but rather want to know how the new measurement procedure behaves compared to the one you used before, direct comparison is the right choice. This is typically the case in parallel instrument comparisons, lot-to-lot comparisons and instrument relocations. If you want to, you can use replicate measurements to minimize the effect of random error in your results.

When a laboratory replaces an old method with a new one, they may choose either of the approaches depending on their needs. E.g. when you adjust the reference ranges, the difference between the methods tells you how to change the values. That’s why direct comparison is often more practical than the Bland-Altman approach.

To use direct comparisons and/or average results of sample replicates in Validation Manager, you can simply change the default analysis rule settings on your study plan.

Your choices define what the reported numbers really mean. That’s why it’s important to consider whether Bland-Altman comparison or direct comparison better suits your purposes. Otherwise you may end up drawing false conclusions from your results.

Four graphs representing a data set of 20 samples: constant comparative difference plot, constant Bland-Altman difference plot, proportional comparative difference plot and proportional Bland-Altman difference plot.
Image 2: Difference plots of the same data set using direct comparison (on the left) and the Bland-Altman approach (on the right). In this case, the scale on the x-axis reaches higher values on the Bland-Altman plots than on the direct comparison plots. This is because the candidate measurement procedure gives higher results than the comparative measurement procedure. With some other data set, it could be the other way round. The scale on the y-axis also differs between the graphs that show proportional difference, basically for the same reasons as the different scales on the x-axis. The shape of the distribution also differs between graphs. In this case this is most prominent at high concentrations, where direct comparison gives a more even scatter of the results around the mean value. That’s why the conclusions that you can draw from a graph may differ depending on whether you are using direct comparison or the Bland-Altman approach.

 

Visual evaluation of the difference plot

After making these decisions and adding your data to Validation Manager, you can examine the difference plot visually. Make sure that the desired measuring interval is adequately covered. In an optimal situation, the measured concentrations would be rather evenly distributed over the whole concentration range. Often this is not the case. Then you need to make sure that all clinically relevant concentration areas have enough data to give an understanding of the behavior of the measurement procedure under verification.

Constant Bland-Altman plots of two different data sets, both containing data from 20 samples. On the right, less than half of the measuring range is well covered.
Image 3: Constant Bland-Altman plots of two different fictional data sets. On the left, data is distributed evenly throughout the measuring range. On the right, there is plenty of data at low concentrations but only a couple of data points at high concentrations. If high concentrations are medically relevant, it is advisable to add more data at high concentrations.

 

When would a single value describe the bias of the whole data set?

For one bias estimate to describe the method throughout the whole measuring range, bias should appear constant as a function of concentration on either the absolute or the proportional scale. Evaluating this visually is easier if the variability of the differences also behaves consistently across the measuring interval. Variability can be constant on an absolute scale (constant standard deviation, SD) or on a proportional scale (constant coefficient of variation, CV). If bias is negligible, you can choose whether to examine it on the absolute or the proportional scale based on which one shows a more constant spread of the results. If there is visible bias, you will need to estimate visually whether it seems more constant on the absolute or on the proportional scale.

Four graphs representing two different data sets: Proportional Bland-Altman plot and constant Bland-Altman plot of each data set, both containing data from 20 samples.
Image 4: On the left, proportional and constant Bland-Altman plots of a fictional data set with sample concentrations distributed quite evenly throughout the measuring range. Looking at the graphs, on proportional scale the data set seems like it may be described by average bias, as the blue horizontal line that represents the mean difference is quite nicely fitted into the data set. On high concentrations though, all the dots are above the mean difference, which may be due to variance in the results but may also indicate growing bias. To be sure, one might measure more samples with high concentrations. On the right, proportional and constant Bland-Altman plots of another fictional data set with sample concentrations distributed quite evenly throughout the measuring range. Looking at the graphs, on proportional scale one might think that bias varies throughout the measuring range and there seems to be one potential outlier, but on constant scale the bias seems constant and there are no outliers.

When looking at the difference plot, you should also screen for possible outliers in the data. Validation Manager helps you in this by doing a statistical analysis on the data. On the difference plot, Validation Manager shows a red circle around values that seem to be outliers. If there are outliers, you should investigate the cause and assess whether the result really is just a statistical outlier. If it is, you can easily remove the data point from the analysis by marking it as an outlier.
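The post does not specify which statistical test Validation Manager applies, but one common screening rule is to flag differences that fall outside the mean ± 1.96 SD limits (the classic Bland-Altman limits of agreement). A minimal sketch of that rule, with hypothetical differences:

```python
import statistics

# Hypothetical differences for a set of samples; 2.5 is suspicious.
diffs = [0.1, -0.2, 0.0, 0.3, -0.1, 2.5, 0.2, -0.3]

mean = statistics.mean(diffs)
sd = statistics.stdev(diffs)

# Flag points outside the mean +/- 1.96 SD limits of agreement.
# This is only a screen: a flagged point still needs investigation
# before it is marked as an outlier.
outliers = [d for d in diffs if abs(d - mean) > 1.96 * sd]

print(outliers)
```

Note that with small data sets a single extreme value also inflates the SD, so a flagged point should always be judged together with the plot, not removed mechanically.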

Two graphs: Proportional Bland-Altman plot and constant Bland-Altman plot of the same data set containing data from 20 samples. On proportional scale, one sample with low concentration is recognized as a potential outlier based on statistical analysis. On constant scale, that sample does not seem like an outlier; instead another sample at higher concentrations is recognized as a potential outlier.
Image 5: Bland-Altman plots using proportional (on the left) and absolute (on the right) scale. Both graphs visualize the behavior of the same fictional data set. Statistical analysis on the data reveals one possible outlier on both of the graphs and Validation Manager highlights them with a red circle. Before marking an outlier, we need to consider whether the behavior of the data is better described on proportional or on absolute scale. Only after that we can know which one of the circled data points would be a statistical outlier. Other considerations may also be needed before marking an outlier.

Finally you should evaluate how the results behave as a function of concentration. Could you draw a horizontal line through the difference plot that would describe the behavior of bias on all concentration levels? Validation Manager helps you in this by drawing a blue line to represent the mean difference. If a horizontal line would describe the data set, you can consider the bias to be constant.

 

What value to use as average bias estimate?

When bias is constant throughout the measuring range, we can use one value to describe bias. But what value should we use?

Validation Manager shows the mean difference on your overview report. It is calculated as an average over the individual differences of the samples, according to the equations shown below.

Mean difference is calculated by summing over all differences and dividing the result by the number of data points. To get the absolute mean difference, the individual differences are formed simply by subtracting each result of the comparative measurement procedure from the result of the candidate measurement procedure for the same sample. To get the proportional mean difference using direct comparison, each of these absolute differences is divided by the result of the comparative method for the same sample. To get the proportional Bland-Altman difference, the individual absolute differences are divided by the mean value of the candidate and comparative measurement procedure results.
Image 6: Equations for calculating mean difference, where x_n,cand is a value measured from sample n using candidate measurement procedure, x_n,comp is a value measured from sample n using comparative measurement procedure, and N is the number of samples. The equation to use depends on your decisions and observations described earlier in this blog post.
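Written out with the symbols used in the caption, the three variants described above are:

```latex
% Absolute mean difference
\bar{d}_{\mathrm{abs}} = \frac{1}{N} \sum_{n=1}^{N}
    \left( x_{n,\mathrm{cand}} - x_{n,\mathrm{comp}} \right)

% Proportional mean difference, direct comparison
\bar{d}_{\mathrm{prop}} = \frac{1}{N} \sum_{n=1}^{N}
    \frac{x_{n,\mathrm{cand}} - x_{n,\mathrm{comp}}}{x_{n,\mathrm{comp}}}

% Proportional mean difference, Bland-Altman approach
\bar{d}_{\mathrm{BA}} = \frac{1}{N} \sum_{n=1}^{N}
    \frac{x_{n,\mathrm{cand}} - x_{n,\mathrm{comp}}}
         {\left( x_{n,\mathrm{cand}} + x_{n,\mathrm{comp}} \right) / 2}
```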

This is the value that you should usually use if you want to use the average bias as your bias estimate. To evaluate the reliability of this value, there are two more things you should look at.

The overall report in Validation Manager gives you an overall impression of the results. A compact table gathers the most important calculated values of all verified methods and instruments. In this example, the overview report table contains data for six analytes.
Image 7: Overview report table with some example data. For every comparison pair, the table shows measuring range, mean difference and bias calculated using the selected regression model. Results are highlighted with green or orange color depending on whether or not the calculated values are within set goals. Below the calculated mean difference, the 95% confidence interval is shown.

First, the confidence interval. As your data set does not represent your measurement procedures perfectly, there is always an element of uncertainty in the calculated values. The confidence interval describes the amount of doubt related to the calculated mean value. If the CI range is wide, it is advisable to measure more samples to gain more accurate knowledge of the behavior of the measurement procedure.

Second, you should assess whether the variability of the data is even enough. For skewed data sets, the mean difference does not give a good bias estimate. An easy way to assess this is to check whether the mean and median differences are close to each other. If they match, the mean difference gives a good estimate. If the mean and median values differ significantly, the median difference gives a better estimate. The drawback of using the median is that estimating its confidence interval is trickier than it is for the mean difference. It requires more data points to give reliable results, so again you may need to measure more samples to have confidence in your results.
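The mean-versus-median check is simple to reason about with a small hypothetical skewed data set: a single large difference pulls the mean away from the bulk of the data, while the median stays put.

```python
import statistics

# Hypothetical differences; the last value skews the distribution.
diffs = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 2.0]

mean = statistics.mean(diffs)       # pulled up by the extreme value
median = statistics.median(diffs)   # robust to the extreme value

# If mean and median are close, the mean difference is a sound bias
# estimate; if they clearly diverge, prefer the median difference.
print(mean, median)
```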

Two graphs: Proportional Bland-Altman plot and constant Bland-Altman plot of the same data set containing data from 20 samples. On proportional scale, one sample with low concentration is recognized as a potential outlier based on statistical analysis. On constant scale, that sample does not seem like an outlier; instead another sample at higher concentrations is recognized as a potential outlier.
Image 8: Bland-Altman plots using proportional (on the left) and absolute (on the right) scale. Both graphs visualize the behavior of the same fictional data set. The blue horizontal line in each graph represents the mean value (on proportional or absolute scale), so that you can visually evaluate how well it describes the data set. Above the graph you can find calculated values for mean and median differences (both proportional and absolute differences). This data set is clearly skewed, so that mean difference gives somewhat higher estimate for bias than the median difference. Therefore median value would give a better estimate of bias than mean value. If you want to compare these graphs to graphs showing a data set that is not skewed, take a look at e.g. Image 4.

Some of our users are interested in sample-specific differences. While these values may be relevant in some cases, they do not represent bias very well. If you do not average over multiple replicate measurements, sample-specific results show random error in addition to the systematic error. That’s why they do not tell much about bias.

Table showing a list of 20 samples, sample names on the first column. The middle columns show results for each sample measured with the candidate measurement procedure and with the comparative measurement procedure. The last column shows the calculated difference for each sample and the average of all differences.
Image 9: Test runs table showing measured results and calculated difference for all samples related to the compared measurement procedures. The user can select whether to view these differences as absolute or proportional differences.

How to make it easier if you have loads of data

If you have multiple identical instruments measuring the same analytes with the same tests, going through all of this for all the instruments takes a lot of time. Fortunately, if you measure the same samples with multiple parallel instruments, you do not have to go through all the results in such detail. Often we can assume consistent behavior between parallel instruments. Then it’s enough to verify one instrument thoroughly, and do a lighter examination for others. You need to make sure though that the measuring interval is adequately covered for all of the instruments.

You can verify all the parallel instruments within the same study in Validation Manager. Set performance goals for all of the analytes and import your data. The overview report shows you if there are instruments or analytes that require more thorough investigation. Then you can easily dig into these results.

The overview report in Validation Manager shows you if there are instruments or analytes that require more thorough investigation. In this example, the overview report table contains data for six analytes.
Image 10: The overview report. Green color shows where the goals have been met and orange color shows where there seems to be more bias than what is acceptable. Issues such as potential outliers may be indicated with a warning triangle. Clicking a row on the table opens detailed results including the difference plots.

 

When is regression analysis needed?

When you look at your difference plot, in some cases you will see that the bias changes over the measuring range. In those cases it is not enough to estimate bias with mean or median differences. Instead you should use regression analysis to estimate how bias changes over the measuring interval. It may also be necessary to measure more samples to cover the measuring range better and to get a more reliable estimate for bias.
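One simple way to express bias as a function of concentration is an ordinary least-squares fit of the differences against concentration; this is only an illustrative sketch with hypothetical data, as the later blog post (and dedicated method-comparison techniques such as Deming or Passing-Bablok regression) cover the topic properly:

```python
# Hypothetical concentrations and differences where bias grows with
# concentration: negative at low levels, positive at high levels.
conc  = [5.0, 10.0, 20.0, 40.0, 80.0]
diffs = [-0.5, -0.2, 0.3, 1.2, 3.0]

n = len(conc)
mx = sum(conc) / n
my = sum(diffs) / n

# Ordinary least-squares slope and intercept of difference vs. concentration.
slope = (sum((x - mx) * (y - my) for x, y in zip(conc, diffs))
         / sum((x - mx) ** 2 for x in conc))
intercept = my - slope * mx

# Estimated bias at a given concentration: bias(x) = intercept + slope * x
```

A clearly nonzero slope is the numerical counterpart of what the difference plot shows visually: a single average value cannot describe this bias.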

Two graphs: Proportional Bland-Altman plot and constant Bland-Altman plot of the same data set containing data from 20 samples, with bias varying as a function of concentration. On proportional scale, the lowest concentration seems like a potential outlier, but on constant scale the scatter of the data points seems even.
Image 11: Bland-Altman plots using proportional (on the left) and absolute (on the right) scale. Both graphs visualize the behavior of the same fictional data set. The blue horizontal line in each graph represents the mean value (on proportional or absolute scale), so that you can visually evaluate how well it describes the data set. In this data set, bias seems to be negative on low concentrations and positive on high concentrations. Neither of the graphs show constant bias, although on proportional scale, most of the data points are within 95% confidence interval (i.e. between the blue dotted lines). Yet, as on the absolute scale bias seems pretty linear and the scatter of the data points is quite even, there probably is no reason to treat the lowest concentration as an outlier. Therefore it is advisable to use regression analysis to estimate bias as a function of concentration.

Real data is not always scattered evenly enough for it to be easy to see how the bias behaves. Since Validation Manager gives you both the mean difference and regression analysis automatically, you don’t have to wonder which one to calculate. Instead you can look at both of them to decide whether your bias is small enough or not, and which one of them describes your data set better.

If you want to learn more about regression analysis, stay tuned for our next blog post on determining bias. While waiting for that, you can check out our blog post that introduced the quantitative comparison studies. It gives you an idea of the possibilities that Validation Manager offers for bias calculations, and instructions on how to conduct the study.

 

If you want to dig deeper into difference plots and related calculations, here are some references for you to go through: