American Journal of Orthodontics and Dentofacial Orthopedics, 2022-05-01, Volume 161, Issue 5, Pages 748-751, Copyright © 2021
Diagnosis of a dental condition is the process of determining whether a person has the condition of interest (the target condition), and it is usually achieved with relevant diagnostic tests. This short paper introduces the measures used to describe the performance of diagnostic tests.
Measures of test accuracy
Diagnostic accuracy refers to the ability of a test to correctly detect the presence or absence of the target condition. Diagnostic tests do not discriminate perfectly between patients with and without the target condition, so the diagnostic accuracy of each test should be evaluated.
The evaluation is done by comparing the diagnostic test under examination, called the index test, with a reference standard. The reference standard is a test or procedure considered a reliable guide to the presence or absence of the target condition. An index test can make 2 types of error: false-positive and false-negative results. A result is a false positive when the test is positive but the person does not have the target condition, and a false negative when the test is negative but the person has the condition (Table I).
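To make the four outcomes of Table I concrete, the sketch below tallies them from paired results. It is a minimal Python illustration, not the authors' procedure; the function name `tally_outcomes`, the True/False encoding (True meaning a positive index test, or presence of the target condition on the reference standard), and the five participants are all hypothetical.

```python
def tally_outcomes(index_positive, has_condition):
    """Count TP, FP, FN, TN from paired index-test results and
    reference-standard statuses (True = positive / has the condition)."""
    if len(index_positive) != len(has_condition):
        raise ValueError("each participant needs both an index and a reference result")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for test_pos, diseased in zip(index_positive, has_condition):
        if test_pos and diseased:
            counts["TP"] += 1  # positive test, condition present
        elif test_pos:
            counts["FP"] += 1  # positive test, condition absent
        elif diseased:
            counts["FN"] += 1  # negative test, condition present
        else:
            counts["TN"] += 1  # negative test, condition absent
    return counts

# Hypothetical example with 5 participants
index = [True, True, False, False, True]
reference = [True, False, True, False, True]
print(tally_outcomes(index, reference))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
```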
Table I. Index test result vs target condition status (reference standard)

Index test result | With the target condition (diseased) | Without the target condition (healthy)
Positive | True positive (TP): the participant has the condition, and the test result is positive | False positive (FP): the participant does not have the condition, and the test result is positive
Negative | False negative (FN): the participant has the condition, and the test result is negative | True negative (TN): the participant does not have the condition, and the test result is negative
Index tests can be based on a binary marker that directly provides a positive or negative result, like x-rays, which can directly reveal a root fracture. There are also continuous index tests, like blood tests; these usually measure the level of a substance or biomarker and require a cutoff (threshold) value to dichotomize the results into positive and negative. A test is considered positive if the measured value exceeds the predefined threshold. For example, when testing for periodontitis by measuring C-reactive protein levels in the blood, either a threshold of 5 mg/dL or 7 mg/dL can be used. Hence, when a participant has a C-reactive protein value greater than 5 (or 7) mg/dL, they are considered to have periodontitis. Often, a set of different thresholds can be used for a single index test; however, as can be seen in Figure 1 and Table II, lower thresholds produce more true and false positives.
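A minimal sketch of dichotomizing a continuous marker, assuming, as the text states, that a value strictly greater than the cutoff counts as positive. The function name `dichotomize` and the C-reactive protein values are hypothetical; they only illustrate that lowering the cutoff can add positives but never remove them.

```python
def dichotomize(values, threshold):
    """Return True (test-positive) for each value that exceeds the threshold."""
    return [value > threshold for value in values]

# Hypothetical C-reactive protein measurements in mg/dL
crp = [3.1, 5.0, 6.2, 8.4]
print(dichotomize(crp, 5.0))  # [False, False, True, True]
print(dichotomize(crp, 7.0))  # [False, False, False, True]
```

Whether a value exactly equal to the cutoff (5.0 here) counts as positive is a convention; a study should state it explicitly, because it changes the 2 × 2 counts.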
Table II

Threshold | Data | Sensitivity | Specificity | DOR | LR+ | PPV | NPV
Threshold 1 (5 mg/dL) | TP = 238, FN = 17, FP = 104, TN = 255 | $\frac{238}{238 + 17} = 93\%$ | $\frac{255}{255 + 104} = 71\%$ | 32.5 | 3.2 | 70% | 94%
Threshold 2 (7 mg/dL) | TP = 116, FN = 71, FP = 51, TN = 376 | $\frac{116}{116 + 71} = 62\%$ | $\frac{376}{51 + 376} = 88\%$ | 12.0 | 5.2 | 69% | 84%
The diagnostic performance of index tests is commonly described using 2 basic concepts: sensitivity and specificity. Sensitivity, or the true-positive rate, is the probability that an individual has a positive index test result when the target condition is present; it describes the ability of a test to correctly identify diseased patients. Specificity, or the true-negative rate, is the probability that an individual has a negative index test result when the target condition is absent; it describes the ability of a test to correctly identify healthy participants. Both are proportions.
An ideal test would have both sensitivity and specificity close to 100%, meaning that false negatives and false positives are close to zero. A test with high sensitivity and high specificity is very useful, especially if it is easier to conduct than the gold standard; for example, a diagnosis from clinical examination (index test) vs magnetic resonance imaging (gold standard) for temporomandibular joint disc displacement. Unfortunately, 100% sensitivity and specificity are very uncommon in practice, and the choice between optimal sensitivity and optimal specificity depends on the question at hand. High sensitivity is important when the cost of a false negative is high, because a negative result from a highly sensitive test helps rule out the target condition, whereas high specificity is important when the cost of a false positive is high, because a positive result from a highly specific test helps confirm (rule in) the target condition.
When the test threshold varies, sensitivity and specificity are inversely related: changing the threshold so that sensitivity increases leads to a decrease in specificity, and vice versa (threshold effect). Figure 1 displays 2 different threshold values for testing for periodontitis: a lower one (5 mg/dL) and a higher one (7 mg/dL). When the threshold increased from 5 to 7 mg/dL, the number of true-positive cases decreased, whereas the number of true-negative cases increased. Consequently, the test's sensitivity (ie, the ratio of true positives to patients with the target condition) decreased, whereas its specificity (ie, the ratio of true negatives to patients without the target condition) increased.
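The threshold effect can be reproduced with a small sweep. The sketch below is illustrative only: the marker values and reference-standard statuses are hypothetical (they are not the data behind Figure 1), the function name is an assumption, and a value above the cutoff is treated as positive. Raising the cutoff from 5.0 to 7.0 lowers sensitivity and raises specificity.

```python
def sensitivity_specificity(values, has_condition, cutoff):
    """Sensitivity and specificity when a value above the cutoff is positive."""
    tp = fn = fp = tn = 0
    for value, diseased in zip(values, has_condition):
        positive = value > cutoff
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical marker values (mg/dL) and reference-standard status (True = diseased)
values = [2.0, 4.5, 5.5, 6.0, 6.5, 7.5, 8.0, 9.0]
status = [False, False, True, False, True, False, True, True]

for cutoff in (5.0, 7.0):
    sens, spec = sensitivity_specificity(values, status, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cutoff 5.0: sensitivity 1.00, specificity 0.50
# cutoff 7.0: sensitivity 0.50, specificity 0.75
```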
In Table II, at the 5 mg/dL threshold, sensitivity is 93% and specificity is 71%. This means that the test correctly gives a positive result for 93% of participants with periodontitis (7% of participants with the target condition are falsely classified as negative) and a negative result for 71% of participants without periodontitis (29% of participants without the target condition are falsely classified as positive). When the threshold increases to 7 mg/dL, sensitivity decreases to 62%, and specificity increases to 88%. In brief, threshold selection plays a crucial role in diagnostic test accuracy studies, because changing the threshold changes how patients are classified and, consequently, the diagnostic test accuracy measures.
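The sensitivity and specificity percentages in Table II can be checked directly from the reported counts. A minimal Python sketch; the helper names `sensitivity` and `specificity` are hypothetical, and the formulas are those of Table III.

```python
def sensitivity(tp, fn):
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# 2 x 2 counts as reported in Table II
table_ii = {
    "5 mg/dL": {"TP": 238, "FN": 17, "FP": 104, "TN": 255},
    "7 mg/dL": {"TP": 116, "FN": 71, "FP": 51, "TN": 376},
}

for label, c in table_ii.items():
    sens = sensitivity(c["TP"], c["FN"])
    spec = specificity(c["TN"], c["FP"])
    print(f"{label}: sensitivity {sens:.0%}, specificity {spec:.0%}")
# 5 mg/dL: sensitivity 93%, specificity 71%
# 7 mg/dL: sensitivity 62%, specificity 88%
```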
A likelihood ratio (LR) of a diagnostic test describes how much a given test result changes the odds of having the target condition. It is defined as the probability of a given test result in participants with the target condition divided by the probability of the same test result in participants without the target condition. Because test results are either positive or negative, there are 2 ratios, the positive LR (LR+) and the negative LR (LR−), which describe how many times more likely positive (or, for LR−, negative) test results are in participants with the target condition than in participants without it. LRs range from zero to infinity and can be derived from sensitivity and specificity (Table III). The further LR+ lies above 1, the better the test is for confirming the target condition, and the closer LR− is to zero, the better the test is for ruling out the target condition. For example, in the data provided in Table II, the LR+ at the 5 mg/dL threshold is 3.2. This means that a positive periodontitis test result is 3.2 times more likely in participants with periodontitis than in participants without periodontitis.
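A sketch of the LR formulas from Table III, applied to the rounded sensitivity and specificity in Table II. It also shows one standard use of an LR, under the assumption that the pre-test probability equals the sample prevalence at the 5 mg/dL threshold, (TP + FN) / total = 255/614: pre-test odds multiplied by LR+ give the post-test odds. With the unrounded counts, this reproduces the PPV of Table II (238/342, about 70%). The function names are hypothetical.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sens / (1 - spec)  # P(positive | condition) / P(positive | no condition)
    lr_neg = (1 - sens) / spec  # P(negative | condition) / P(negative | no condition)
    return lr_pos, lr_neg

def post_test_probability(pre_test_prob, lr):
    """Convert probability to odds, multiply by the LR, convert back to probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Rounded sensitivity and specificity from Table II
for label, sens, spec in [("5 mg/dL", 0.93, 0.71), ("7 mg/dL", 0.62, 0.88)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
# 5 mg/dL: LR+ = 3.2, LR- = 0.10
# 7 mg/dL: LR+ = 5.2, LR- = 0.43

# Assumed pre-test probability: sample prevalence at 5 mg/dL; unrounded sens/spec
exact_lr_pos, _ = likelihood_ratios(238 / 255, 255 / 359)
post = post_test_probability(255 / 614, exact_lr_pos)
print(f"probability of periodontitis after a positive result: {post:.0%}")
```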
Table III

Measure | Definition | Formula
Sensitivity | Probability that the test detects diseased patients | $\frac{TP}{TP + FN}$
Specificity | Probability that the test identifies healthy patients | $\frac{TN}{TN + FP}$
LR | Positive: how many times more likely positive test results are in participants with the target condition vs participants without the target condition. Negative: how many times more likely negative test results are in participants with the target condition vs participants without the target condition | LR+: $\frac{\text{Sensitivity}}{1 - \text{Specificity}}$; LR−: $\frac{1 - \text{Sensitivity}}{\text{Specificity}}$
Diagnostic odds ratio | How many times higher the odds of a positive test result are in participants with vs participants without the target condition | $\frac{\text{Sensitivity} \times \text{Specificity}}{(1 - \text{Sensitivity}) \times (1 - \text{Specificity})}$
Predictive values | Positive: probability of having the condition given a positive test result. Negative: probability of not having the condition given a negative test result | PPV: $\frac{TP}{TP + FP}$; NPV: $\frac{TN}{TN + FN}$
Prevalence | The proportion of participants with the target condition | $\frac{TP + FN}{TP + FN + TN + FP}$
ROC curve | A plot of sensitivity against 1 − specificity, constructed to illustrate the diagnostic performance of a test; the closer the curve to the upper left corner of the ROC space, the better the test |
The sensitivity and specificity of a test are typically reported as a pair. The diagnostic odds ratio (DOR) is a common way to combine the 2 quantities into a single measure; it is defined as the ratio of the odds of test positivity in diseased patients to the odds of test positivity in healthy patients and can also be derived from the estimated sensitivity and specificity. The DOR is easy to calculate but often difficult to interpret. It ranges from zero to infinity: a DOR of 1 indicates that the test does not discriminate between participants with and without the target condition, a DOR greater than 1 indicates discriminating ability, and the higher the DOR, the better the test. In the data provided in Table II, the DOR is 32.5 at the 5 mg/dL threshold and 12.0 at the 7 mg/dL threshold.
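The 2 DOR formulas are algebraically equivalent, but rounding matters. In the minimal sketch below (function names are hypothetical), applying the sensitivity/specificity formula to the rounded Table II values (93% and 71%) reproduces the reported 32.5, whereas computing the odds ratio from the raw counts at 5 mg/dL gives about 34.3. At 7 mg/dL both routes round to 12.0.

```python
def dor_from_counts(tp, fn, fp, tn):
    """Odds of a positive test in diseased (TP/FN) over odds in healthy (FP/TN)."""
    return (tp / fn) / (fp / tn)

def dor_from_sens_spec(sens, spec):
    """Equivalent form: (sens x spec) / ((1 - sens) x (1 - spec))."""
    return (sens * spec) / ((1 - sens) * (1 - spec))

print(f"{dor_from_sens_spec(0.93, 0.71):.1f}")      # 32.5 (rounded inputs, as in Table II)
print(f"{dor_from_counts(238, 17, 104, 255):.1f}")  # 34.3 (unrounded counts)
```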
Sensitivity and specificity describe the performance of a test given the true status of the target condition: knowing who has the condition, we see whether the test performs well or poorly. In practice, however, the question of interest is often the reverse: given the result of the test, what is the condition of the person? This is answered by the positive predictive value (PPV) and the negative predictive value (NPV).
PPV is the probability that a participant truly has the target condition given a positive index test result. NPV is the probability that a participant does not have the target condition given a negative index test result. In the data provided in Table II, at the 5 mg/dL threshold, PPV is 70% and NPV is 94%. Hence, a patient with a positive test result has a 70% probability of having periodontitis, whereas a patient with a negative test result has a 94% probability of not having periodontitis.
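The predictive values in Table II follow directly from the 2 × 2 counts. A minimal sketch with a hypothetical function name, using the Table III formulas:

```python
def predictive_values(tp, fn, fp, tn):
    """PPV = TP / (TP + FP); NPV = TN / (TN + FN)."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Counts at the 5 mg/dL threshold, as reported in Table II
ppv, npv = predictive_values(tp=238, fn=17, fp=104, tn=255)
print(f"PPV {ppv:.0%}, NPV {npv:.0%}")  # PPV 70%, NPV 94%
```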
Prevalence measures how common the target condition is in a defined population and is expressed as a proportion. Sensitivity and specificity are characteristics of the test and remain unaffected by changes in prevalence. Consequently, because the DOR and likelihood ratios are estimated from sensitivity and specificity, they are also robust to the prevalence of the target condition. However, changes in prevalence do influence the predictive values. More specifically, as prevalence increases, PPV increases because, at a fixed sensitivity and specificity, true positives become more numerous relative to false positives, whereas NPV decreases because false negatives become more numerous relative to true negatives. Conversely, a decrease in prevalence decreases PPV and increases NPV.
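The dependence of predictive values on prevalence follows from Bayes' theorem. In the sketch below (hypothetical function name), sensitivity and specificity are held fixed at the 5 mg/dL values from Table II, and the prevalences of 10%, 30%, and 50% are hypothetical; PPV rises and NPV falls as prevalence increases. Plugging in the unrounded Table II values and the sample prevalence (255/614) recovers the count-based PPV and NPV exactly.

```python
def predictive_values_at_prevalence(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence (Bayes' theorem)."""
    true_pos = sens * prevalence               # diseased and test-positive
    false_pos = (1 - spec) * (1 - prevalence)  # healthy and test-positive
    false_neg = (1 - sens) * prevalence        # diseased and test-negative
    true_neg = spec * (1 - prevalence)         # healthy and test-negative
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Sensitivity and specificity fixed at the 5 mg/dL values; prevalences are hypothetical
for prevalence in (0.10, 0.30, 0.50):
    ppv, npv = predictive_values_at_prevalence(0.93, 0.71, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```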
Receiver operating characteristic curve
The receiver operating characteristic (ROC) curve is a graphical way to represent the performance of diagnostic tests. An ROC curve is created by plotting sensitivity (y-axis) against 1 − specificity (x-axis); it illustrates the trade-off between sensitivity and specificity across all the thresholds included. The closer the curve to the upper left corner, the better the test; such a test would have sensitivity and specificity close to 100%. In ROC space, the diagonal line represents tests with no accuracy. A test with an ROC curve close to the diagonal line tends to be less accurate, whereas an ROC curve beneath the diagonal implies a misclassification problem (healthy participants are classified as diseased and vice versa). Figure 2 displays the ROC curves of blue fluorescence (BF), violet fluorescence (VF), and orange fluorescence (OF) for diagnosing dental caries. The BF ROC curve is the closest to the top left corner, above the VF and OF ROC curves, which shows that BF is the best of the 3 tests. However, the OF ROC curve lies under the no-accuracy line, which means that patients with dental caries may be wrongly classified as caries-free when OF is used.
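A minimal sketch of how ROC points are obtained: sweep the cutoff over every observed value and record (1 − specificity, sensitivity) at each one. The scores and statuses are hypothetical, not the fluorescence data of Figure 2, the function name is an assumption, and a score above the cutoff is treated as positive. Reversing the direction of the marker (negating the scores) pushes every point onto or below the diagonal, which mirrors the misclassification problem described for OF.

```python
def roc_points(scores, has_condition):
    """(1 - specificity, sensitivity) at every cutoff; score > cutoff is positive."""
    n_diseased = sum(has_condition)
    n_healthy = len(has_condition) - n_diseased
    points = [(1.0, 1.0)]  # a cutoff below every score makes everyone positive
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, has_condition) if s > cutoff and d)
        fp = sum(1 for s, d in zip(scores, has_condition) if s > cutoff and not d)
        points.append((fp / n_healthy, tp / n_diseased))
    return points

# Hypothetical scores and reference-standard status (True = diseased)
scores = [2.0, 4.5, 5.5, 6.0, 6.5, 7.5, 8.0, 9.0]
status = [False, False, True, False, True, False, True, True]

curve = roc_points(scores, status)
print(all(sens >= fpr for fpr, sens in curve))    # True: on or above the diagonal

# Reversing the marker direction mimics a test that misclassifies
flipped = roc_points([-s for s in scores], status)
print(all(sens <= fpr for fpr, sens in flipped))  # True: on or below the diagonal
```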