Last year we discussed the overall idea of bias in our blog post What you measure when you measure bias. Lately we’ve realized that there is a need for a more thorough explanation of the concept of bias and the practices related to quantitative method comparisons. A few weeks ago we took a closer look at correlation and what it really means (and what it doesn’t mean) in method comparisons. Now we will start our journey towards a better understanding of bias.
Our first step is to discuss the different numbers describing the average bias over the whole data set within a comparison study. We take a look at when one number can describe bias, and which number to use for that purpose. Regression models and bias as a function of concentration are not explained this time. We will get back to them in a later blog post.
What is bias and when is it measured?
Bias is the systematic error related to a measurement, i.e. how much the results differ on average from a reference. Practically all measurement setups contain both systematic and random error. The purpose of estimating bias is to get knowledge of systematic error.
In some cases systematic error can be corrected by calibration, adjustment of reference ranges, or otherwise fine-tuning the interpretation of results. (Random error is more difficult to control, as it represents the unexpected, and it should be estimated in a separate measurement setup.)
Bias is evaluated in clinical laboratories for example
- when introducing new methods or instruments, to make sure that their results are good enough to be used in diagnostics
- to gain knowledge of consistency between parallel instruments
- when the reagent lot changes, to see whether the new one performs as well as the previous one
- when an instrument is moved to a new location
- to monitor changes in performance over time, e.g. each month
As there are multiple things that can be compared, we will mostly use the general term measurement procedure to represent a measurement setup that’s being compared to another. A measurement procedure consists of the instrument and method used, among other things that may affect the results. Basically any of these individual parameters can be examined to find out whether varying conditions in the measurement setup cause bias.
How is bias estimated?
There are two ways of estimating bias.
In cases where bias is constant, we can find one value to describe the bias throughout the measuring range. This approach is suitable e.g. when performing parallel instrument comparisons. As we can expect the method to behave consistently in all instruments, it is reasonable to assume that the amount of bias does not vary as a function of concentration.
In cases where bias varies over the measuring range, a single value cannot be used to describe bias at all concentration levels. Instead we need to define bias as a function of concentration. This can be done by calculating a regression equation describing the bias at different concentrations. We’ll get back to this in a later blog post, but generally this is the case when you compare different methods to each other.
When using Validation Manager, you get the estimate for constant bias and regression analysis automatically for all your comparison data. So you really don’t have to make a choice between these two approaches. You can look at the results to decide which result to use as your bias estimate. Especially when performing parallel instrument comparisons with small amounts of data, it may be that regression analysis does not really give any better information about bias than the average bias.
Different ways of looking at the difference
To get an estimate for the average bias throughout your measuring range, the first thing you need to do is get an idea of what you are actually comparing. You also need to consider what information you are looking for. Based on these choices you will decide whether to use the Bland-Altman approach or direct comparison to create your difference plots and to calculate the average bias.
A difference plot is a visualization of the differences of results for each sample as a function of concentration. That means that on the y-axis, the position of a data point is determined by how much the results of the candidate and the comparative measurement procedures differ from each other. On the x-axis, the position is determined by the concentration to which we are comparing. The y-axis can show the difference in reporting units (absolute scale) or as a percentage difference (proportional scale, difference per concentration).
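As a minimal sketch of the two scales, the made-up values below show how each sample’s difference could be computed for a direct comparison:

```python
# Sketch of the two difference scales for a direct comparison.
# The sample results below are made up for illustration.
candidate = [5.2, 10.1, 19.8, 40.5]    # candidate measurement procedure
comparative = [5.0, 10.0, 20.0, 40.0]  # comparative measurement procedure

# Absolute scale: difference in reporting units
abs_diff = [c - r for c, r in zip(candidate, comparative)]

# Proportional scale: percentage difference per comparison concentration
pct_diff = [100 * (c - r) / r for c, r in zip(candidate, comparative)]

print(abs_diff)   # y-values, plotted against the comparative concentrations on the x-axis
print(pct_diff)
```

Each pair (comparative concentration, difference) would then become one point on the difference plot.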
What’s the thing with Bland-Altman all about?
Usually when evaluating bias, we do not have access to a reference method that would give true values. Instead we have a comparative measurement procedure (e.g. an instrument that will be replaced by a new one) that we assume to give about as accurate results as the candidate measurement procedure. Sometimes we even expect the candidate measurement procedure to be better than the comparative measurement procedure. That’s why we cannot calculate the actual trueness (i.e. bias compared to true values) of the new measurement procedure by just comparing the results of our candidate measurement procedure to the comparative measurement procedure.
So what does the bias mean if we don’t have access to a reference method? How can we estimate trueness?
In this case, use of the Bland-Altman difference is recommended. It compares the candidate measurement procedure to the mean of the candidate and the comparative measurement procedures.
The point of the Bland-Altman difference is pretty simple. The results of both the candidate and the comparative measurement procedures contain error. Therefore the difference between their results does not describe the trueness of the candidate measurement procedure. Instead, it only gives the difference between the two measurement procedures. That’s why, if we want to estimate trueness, we need a way to make a better estimate of the true concentrations of the samples than what the comparative measurement procedure alone would give.
Typically we only have two values to describe the concentration of a sample: the one produced by the comparative measurement procedure, and the one produced by the candidate measurement procedure. As mentioned above, the candidate measurement procedure often gives at least as good an estimate as the comparative measurement procedure. That’s why the average of these results gives us a practical estimate of the true values (i.e. the results of a reference method). It reduces the effect of random error, and even the weaknesses of an individual measurement procedure in estimating the true value. This makes the Bland-Altman difference a good choice when determining the average bias.
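Following the description above, a sketch of the Bland-Altman difference could look like this (the sample values are made up; the candidate result is compared to the pairwise mean, which serves as the best available estimate of the true concentration):

```python
# Bland-Altman difference sketch: each candidate result is compared to the
# mean of the candidate and comparative results for that sample.
# Sample values below are made-up illustration data.
candidate = [5.2, 10.1, 19.8, 40.5]
comparative = [5.0, 10.0, 20.0, 40.0]

best_estimate = [(c + r) / 2 for c, r in zip(candidate, comparative)]
ba_diff = [c - m for c, m in zip(candidate, best_estimate)]      # absolute scale
ba_pct = [100 * d / m for d, m in zip(ba_diff, best_estimate)]   # proportional scale

print(best_estimate)  # x-axis of the Bland-Altman difference plot
print(ba_diff)        # y-axis values
```

Note how the x-axis is no longer the comparative result alone but the mean of the two procedures, which is the point of the Bland-Altman approach.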
When not to use the Bland-Altman difference?
If you are comparing the results of the new measurement procedure to a reference method (i.e. true values) you should compare the measurement procedures directly. This means that the results given by the candidate measurement procedure are simply compared to those given by the comparative measurement procedure.
Also, if you are not really interested in the trueness of the new measurement procedure, but rather want to know how the new measurement procedure behaves compared to the one you used before, direct comparison is the right choice. This is typically the case in parallel instrument comparisons, lot-to-lot comparisons and instrument relocations. If you want to, you can use replicate measurements to minimize the effect of random error in your results.
When a laboratory replaces an old method with a new one, it may choose either approach depending on its needs. E.g. when you adjust the reference ranges, the difference between the methods tells you how to change the values. That’s why direct comparison is often more practical than the Bland-Altman approach.
To use direct comparisons and/or average results of sample replicates in Validation Manager, you can simply change the default analysis rule settings on your study plan.
Your choices define what the reported numbers really mean. That’s why it’s important to consider whether Bland-Altman comparison or direct comparison better suits your purposes. Otherwise you may end up drawing false conclusions from your results.
Visual evaluation of the difference plot
After making these decisions and adding your data to Validation Manager, you can examine the difference plot visually. Make sure that the desired measuring interval is adequately covered. In an optimal situation, the measured concentrations would be rather evenly distributed over the whole concentration range. Often this is not the case. Then you need to make sure that all clinically relevant concentration areas have enough data to give understanding about the behavior of the measurement procedure under verification.
When would a single value describe the bias of the whole data set?
For one bias estimate to describe the method throughout the whole measuring range, bias should seem constant as a function of concentration on either the absolute or the proportional scale. Evaluating this visually is easier if the variability of the differences also behaves consistently across the measuring interval. Variability can be constant on an absolute scale (constant standard deviation, SD) or on a proportional scale (constant coefficient of variation, CV). If bias is negligible, you can choose whether to examine it on the absolute or the proportional scale based on which one shows a more constant spread of the results. If there is visible bias, you will need to estimate visually whether it seems more constant on the absolute or the proportional scale.
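To illustrate the two kinds of spread, here is a small sketch with made-up replicate results whose scatter grows in proportion to the concentration:

```python
from statistics import mean, stdev

# Made-up replicate results at a low and a high concentration level,
# where the spread grows in proportion to the concentration.
low = [5.0, 5.1, 4.9, 5.2]
high = [50.0, 51.0, 49.0, 52.0]

sd_low, sd_high = stdev(low), stdev(high)
cv_low = 100 * sd_low / mean(low)
cv_high = 100 * sd_high / mean(high)

# SD grows tenfold with concentration while CV stays the same: the spread
# is constant on the proportional scale, not on the absolute scale.
print(f"SD: {sd_low:.3f} vs {sd_high:.3f}, CV: {cv_low:.2f}% vs {cv_high:.2f}%")
```

In a case like this, the proportional (percentage) difference plot would be the easier one to evaluate visually.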
When looking at the difference plot, you should also screen for possible outliers in the data. Validation Manager helps you in this by doing a statistical analysis on the data. On the difference plot, Validation Manager shows a red circle around values that seem to be outliers. If there are outliers, you should investigate the cause and assess whether the result really is just a statistical outlier. If it is, you can easily remove the data point from the analysis by marking it as an outlier.
Finally you should evaluate how the results behave as a function of concentration. Could you draw a horizontal line through the difference plot that would describe the behavior of bias on all concentration levels? Validation Manager helps you in this by drawing a blue line to represent the mean difference. If a horizontal line would describe the data set, you can consider the bias to be constant.
What value to use as average bias estimate?
When bias is constant throughout the measuring range, we can use one value to describe bias. But what value should we use?
Validation Manager shows the mean difference on your overview report. It is calculated as the average of the individual per-sample differences.
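As a sketch of that calculation (with made-up example values, not Validation Manager’s actual implementation), the mean difference is simply the average of the per-sample differences on the chosen scale:

```python
# Mean difference = average of the per-sample differences.
# Example values are made up for illustration.
def mean_difference(candidate, comparative):
    """Average bias on the absolute scale (reporting units)."""
    diffs = [c - r for c, r in zip(candidate, comparative)]
    return sum(diffs) / len(diffs)

def mean_pct_difference(candidate, comparative):
    """Average bias on the proportional scale (percent)."""
    pct = [100 * (c - r) / r for c, r in zip(candidate, comparative)]
    return sum(pct) / len(pct)

bias = mean_difference([5.2, 10.1, 19.8, 40.5], [5.0, 10.0, 20.0, 40.0])
print(bias)  # average bias on the absolute scale
```

For a Bland-Altman comparison the same averaging applies, only with each difference taken against the pairwise mean instead of the comparative result.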
This is the value that you usually should use if you want to use the average bias as your bias estimate. To evaluate the reliability of this value, there are two more things that you should be looking at.
First, the confidence interval. As your data set does not represent your measurement procedures perfectly, there is always an element of uncertainty in the calculated values. The confidence interval describes the amount of doubt related to the calculated mean value. If the CI range is wide, it is advisable to measure more samples to gain more accurate knowledge of the behavior of the measurement procedure.
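As a rough sketch of what such an interval looks like, the snippet below computes a normal-approximation confidence interval for the mean difference (for small sample counts a t-based interval would be wider; the differences are made-up example values):

```python
from statistics import NormalDist, mean, stdev
from math import sqrt

def mean_diff_ci(diffs, confidence=0.95):
    """Normal-approximation CI for the mean difference (sketch only;
    a t-based interval is more appropriate for small n)."""
    n = len(diffs)
    m = mean(diffs)
    se = stdev(diffs) / sqrt(n)                    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. ~1.96 for 95 %
    return m - z * se, m + z * se

lo, hi = mean_diff_ci([0.2, 0.1, -0.2, 0.5])
print(f"mean difference CI: ({lo:.3f}, {hi:.3f})")
```

A wide interval like the one this toy data set produces is exactly the signal that more samples are needed.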
Second, you should assess whether the variability of the data is even enough. For skewed data sets, the mean difference does not give a good bias estimate. An easy way to estimate this is to check whether the mean and median differences are close to each other. If they match, the mean difference gives a good estimate. If the mean and median values differ significantly, the median difference gives a better estimate. The drawback of using the median is that estimating the confidence interval is trickier than it is for the mean difference. It requires more data points to give reliable results, so again you may need to measure more samples to have confidence in your results.
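The mean-versus-median check can be sketched in a couple of lines; in this made-up example one discrepant sample pulls the mean up while the median stays robust:

```python
from statistics import mean, median

# Made-up per-sample differences: one skewed result distorts the mean.
diffs = [0.1, 0.1, 0.2, 0.2, 1.5]
m, md = mean(diffs), median(diffs)
print(m, md)  # if these are far apart, prefer the median as the bias estimate
```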
Some of our users are interested in sample specific differences. While these values may be relevant in some cases, they do not represent bias very well. If you do not average over multiple replicate measurements, sample specific results show random error in addition to the systematic error. That’s why they do not tell much about bias.
How to make it easier if you have loads of data
If you have multiple identical instruments measuring the same analytes with the same tests, going through all of this for all the instruments takes a lot of time. Fortunately, if you measure the same samples with multiple parallel instruments, you do not have to go through all the results in such detail. Often we can assume consistent behavior between parallel instruments. Then it’s enough to verify one instrument thoroughly, and do a lighter examination for others. You need to make sure though that the measuring interval is adequately covered for all of the instruments.
You can verify all the parallel instruments within the same study in Validation Manager. Set performance goals for all of the analytes and import your data. The overview report shows you if there are instruments or analytes that require more thorough investigation. Then you can easily dig into these results.
When is regression analysis needed?
When you look at your difference plot, in some cases you will see that the bias changes over the measuring range. In those cases it is not enough to estimate bias with mean or median differences. Instead you should use regression analysis to estimate how bias changes over the measuring interval. It may also be necessary to measure more samples to cover the measuring range better and to get a more reliable estimate for bias.
Real data is not always scattered evenly enough for it to be easy to see how the bias behaves. Since Validation Manager gives you both the mean difference and regression analysis automatically, you don’t have to wonder which one to calculate. Instead you can look at both of them to decide whether your bias is small enough, and which of them describes your data set better.
If you want to learn more about regression analysis, stay tuned for our next blog post around this topic of determining bias. While waiting for that, you can check out our blog post that introduced the quantitative comparison studies. It gives you an idea of the possibilities that Validation Manager offers for bias calculations, and instructions on how to conduct the study.
If you want to dig deeper into difference plots and related calculations, here are some references for you to go through:
- CLSI Approved Guideline EP09-A3 – Measurement Procedure Comparison and Bias Estimation Using Patient Samples
- J.M. Bland & D.G. Altman (1995), “Comparing methods of measurement: why plotting difference against standard method is misleading”, The Lancet, Vol 346, Issue 8982, p. 1085–1087
- J.S. Krouwer (2008), “Why Bland-Altman plots should use X, not (Y+X)/2 when X is a reference method”, Statistics in Medicine, Vol 27, Issue 5, p. 778–780
- The Wikipedia article about the Bland-Altman plot also has useful references