Annals of Clinical & Laboratory Science 36:194-200 (2006)
© 2006 Association of Clinical Scientists
Computation of Decision Levels from Differentiated Logistic Regression Probability Curves
Vincent A. DeBari
Department of Internal Medicine, School of Graduate Medical Education, Seton Hall University, South Orange, New Jersey
Address correspondence to Vincent A. DeBari, Ph.D., School of Graduate Medical Education, Seton Hall University, 400 South Orange Avenue, South Orange, NJ 07079, USA; email debarivi{at}shu.edu.
Abstract
The determination of clinical decision levels (DL) or "cut-offs" for laboratory parameters involves the analysis of sensitivity and specificity at varying levels of the predictor variable (PV). Commonly, receiver-operator characteristic (ROC) curves are used for this purpose. However, the association between a binary outcome choice and a continuous PV is often tested for statistical significance by logistic regression (LoRe), which also provides estimates of outcome probability (P) at various levels of the PV. Using a graphical procedure based on the 1st [f′(P)] and 2nd [f″(P)] derivatives of the probability curve, DL were computed for simulated data sets (sims) and for actual data from a case-control study and compared with those obtained from ROC curves. Sims were constructed for 5 sets of two outcomes (n = 50, each outcome) of normally distributed data with progressive overlap and for 2 sets of fewer data (n = 15 and 9 per outcome, respectively). Additionally, data from a study of the relationship between serum Mg+2 concentration and outcomes in chronic obstructive pulmonary disease (COPD) were analyzed. DL from LoRe was taken to be the point where f″(P) = 0. For sims, the DL from LoRe correlated well with the optimum DL from ROC analysis (n = 7; r² = 0.93; p = 0.0004). The DL for Mg+2 in the COPD data was 0.83 mmol/L from LoRe, compared with a mean of 0.82 mmol/L by ROC. These data suggest that, when the strength of association between outcomes and PV is analyzed by LoRe, DL can be determined from the probability curves. Moreover, LoRe may provide a useful method to determine DL with less ambiguity than those obtained from ROC curves, as well as provide measures of dispersion for the DL.
Keywords: Logistic regression, medical decision making, receiver-operator characteristic curves, epidemiology, biostatistics
Introduction
When laboratory data are used to distinguish between the two states of a dichotomous outcome (presence or absence of disease, for example), a commonly used approach is the generation of a receiver-operator characteristic (ROC) curve, a technique originally developed shortly after World War II (hence the name) to help ship radar-receiver operators discriminate between targets and sea clutter [1]. The ROC curve allows one to determine a decision level (DL) from an analysis of the sensitivity and specificity of a continuous predictor variable (PV) at varying levels of the PV. The ROC curve is constructed by plotting sensitivity on the Y-axis and the quantity (1 - specificity) on the X-axis; thus, the optimum DL is the point closest to the upper left-hand corner of the graph [2], ie, where both sensitivity and specificity approach 1.
Useful though they may be, ROC curves suffer from several drawbacks. Simple visual inspection often fails to reveal an obvious DL, requiring reliance on likelihood ratios [sensitivity/(1 - specificity)]. Moreover, because the denominator of the likelihood ratio is zero at any point where specificity = 100%, the likelihood ratio is undefined at such points. Finally, as Krouwer [3] has pointed out, ROC curves do not display the PV, which adds to the difficulty in interpreting them.
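To make the problem with the denominator concrete, a hypothetical helper for the positive likelihood ratio, sensitivity/(1 - specificity), is sketched below; it returns NaN, rather than raising an error, when specificity is 100%.

```python
import math

def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity), both given as proportions."""
    denominator = 1.0 - specificity
    if denominator == 0.0:
        # specificity = 100%: the ratio cannot be computed (division by zero)
        return math.nan
    return sensitivity / denominator

print(positive_likelihood_ratio(0.90, 0.80))   # about 4.5
print(positive_likelihood_ratio(0.70, 1.00))   # nan
```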
The problem of statistically relating a dichotomous or binary outcome to a continuous PV is one that frequently faces epidemiologists in observational studies relating risk factors to outcomes [4]. Logistic regression analysis (LoRe; NB: this abbreviation is used herein to avoid confusion with LR, often used for likelihood ratio) allows one to estimate the probability, P, of an outcome at varying levels of the PV. The purpose of this study is to provide evidence for the postulate that the point of inflection of the LoRe probability curve occurs at the DL. If so, the first derivative of the curve, f′(P), will display a maximum or a minimum at the DL (depending on how the outcome is expressed as a function of the PV) and, as a corollary, the second derivative, f″(P), will be zero at the DL. Moreover, these mathematical maneuvers can be carried out with a simple graphical procedure, using readily available software.
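For the standard two-parameter logistic model, P = 1/(1 + exp[-(b0 + b1·PV)]), the derivatives have closed forms: f′(P) = b1·P(1 - P) and f″(P) = b1²·P(1 - P)(1 - 2P). Hence f″(P) = 0 exactly where P = 0.5, ie, at PV = -b0/b1, where f′(P) reaches its extremum, b1/4. The sketch below simply encodes these identities; the coefficients b0 and b1 and the function names are this sketch's own notation and hypothetical values, not taken from the paper.

```python
import numpy as np

def logistic_p(pv, b0, b1):
    """Probability P of the outcome coded 1 at a given PV."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * pv)))

def first_derivative(pv, b0, b1):
    """f'(P) = dP/dPV = b1 * P * (1 - P)."""
    p = logistic_p(pv, b0, b1)
    return b1 * p * (1.0 - p)

def second_derivative(pv, b0, b1):
    """f''(P) = b1**2 * P * (1 - P) * (1 - 2P); zero where P = 0.5."""
    p = logistic_p(pv, b0, b1)
    return b1**2 * p * (1.0 - p) * (1.0 - 2.0 * p)

def inflection_point(b0, b1):
    """PV at which P = 0.5, ie, the postulated DL."""
    return -b0 / b1

# Hypothetical coefficients: inflection (DL) at PV = 40
b0, b1 = -10.0, 0.25
print(inflection_point(b0, b1), first_derivative(40.0, b0, b1), second_derivative(40.0, b0, b1))
```

Because f″(P) changes sign at the inflection point (positive below it and negative above it when b1 > 0), its zero can be read directly from a plot, which is the basis of the graphical procedure.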
Materials and Methods
Software.
All statistical and curve-fitting calculations, other than LoRe analysis, were performed with Prism™ software (GraphPad Software, San Diego, CA). LoRe was performed using a web-based software routine [5], which also assesses goodness of fit by the method of Hosmer and Lemeshow [6].
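The internals of the web routine [5] are not described here. As a hedged illustration only, the Hosmer-Lemeshow statistic in its usual deciles-of-risk form can be computed as below; the function name and the grouping by `np.array_split` into near-equal-sized risk groups are assumptions of this sketch, not necessarily what [5] does.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, groups=10):
    """Hosmer-Lemeshow statistic and p value (chi-square with groups - 2 df).

    y      : observed outcomes coded 0/1
    p_hat  : fitted probabilities, strictly between 0 and 1
    groups : number of risk groups (deciles of risk by default)
    """
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    order = np.argsort(p_hat, kind="stable")       # sort subjects by fitted risk
    stat = 0.0
    for idx in np.array_split(order, groups):      # near-equal-sized risk groups
        n = idx.size
        observed = y[idx].sum()                    # observed count of outcome 1
        expected = p_hat[idx].sum()                # expected count of outcome 1
        p_bar = expected / n
        stat += (observed - expected) ** 2 / (n * p_bar * (1.0 - p_bar))
    return stat, chi2.sf(stat, groups - 2)

# Hypothetical check: observed counts equal expected counts in every group
stat, p_value = hosmer_lemeshow([0, 1] * 10, [0.5] * 20)
print(stat, p_value)
```

A large p value indicates no evidence of lack of fit; it does not by itself show that the model is correct.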
Study design.
The postulated relationship between the probability curve generated from LoRe and the optimum DL from ROC curve analysis was tested with 2 series of simulated data (sims) and with actual data from a previously published study [7].
Sim series I was constructed with each of the two outcomes having a PV (n = 50, each outcome) ranging from 5 to 90 arbitrary units. From IA through IF, the PV distribution for one outcome (the one with the higher set of PV values) was moved progressively closer to the other, such that the distributions were completely separated in IA and completely overlapped in IF. These sims were tested for normality by the D'Agostino-Pearson omnibus normality test [8]. All were strongly Gaussian (p > 0.9 for departure from normality) and are shown in Fig. 1.
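The paper does not state how its sim distributions were generated, so the following is only a hypothetical reconstruction of the design: a lower group and an upper group of n = 50 each, with the upper group shifted toward the lower one from IA (separated) to IF (identical centres), each tested with `scipy.stats.normaltest`, which implements D'Agostino and Pearson's omnibus test. The means, SD, shifts, and seed are assumptions; random draws like these will not, in general, reproduce the p > 0.9 values reported above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2006)   # fixed seed so the sketch is reproducible

# Hypothetical design: lower group centred at 20 units; upper group shifted toward it
SHIFTS = {"IA": 60.0, "IB": 40.0, "IC": 30.0, "ID": 20.0, "IE": 10.0, "IF": 0.0}
SD, N = 5.0, 50

sims, normality_p = {}, {}
for label, shift in SHIFTS.items():
    low = rng.normal(20.0, SD, N)
    high = rng.normal(20.0 + shift, SD, N)
    sims[label] = (low, high)
    # D'Agostino-Pearson omnibus test (skewness and kurtosis combined)
    normality_p[label] = (stats.normaltest(low).pvalue, stats.normaltest(high).pvalue)

for label, (p_low, p_high) in normality_p.items():
    print(f"{label}: p = {p_low:.2f}, {p_high:.2f}")
```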

Fig. 1. Distributions of PV for the two groups (open bars and filled bars) in sim series I. Progressing from Fig. 1A to Fig. 1F, the PV distributions overlap to a successively greater extent: in Fig. 1A the PV shows no overlap and in Fig. 1F, complete overlap.
Series II consisted of 2 sims: for sim IIA, n = 15 and for IIB, n = 9 (per outcome). Although these distributions were not intentionally constructed to be normal, they were tested by the D'Agostino-Pearson test and found to fit normal distributions (for IIA, p = 0.52 and 0.37 for the two distributions; for IIB, p = 0.44 and 0.80). This finding was confirmed by the Shapiro-Wilk normality test [9]. These sims are shown in Fig. 5.
Actual data from a recent study [7], in which serum Mg+2 concentration was examined in patients with stable chronic obstructive pulmonary disease (COPD) and in patients exhibiting exacerbations of COPD, were also subjected to the analysis described herein.
Procedure.
The 2 outcomes were coded as dummy variables (0 and 1), and the data were formatted for export to the LoRe software, where the analysis was performed. The resulting tables of P vs PV were then imported into Prism™ and plotted with P as the dependent variable and PV as the independent variable. To generate derivative curves, Prism™ requires points more closely spaced than those input to these plots. Thus, a table of intermediate values of P, generated by nonlinear regression modeling of the curves, was plotted. These curves (not shown) have points that are adjacent to one another and that, in the smallest symbol size available in Prism™, appear as continuous lines. The derivatives of the curves can then be generated sequentially; ie, the first derivative, f′(P), is generated from the curve of P vs PV, and the second derivative, f″(P), from f′(P).
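The Prism™ steps above can be mimicked numerically. The sketch below is an assumption-laden stand-in, not the author's procedure: it fits the logistic model by Newton-Raphson maximum likelihood (instead of the web routine [5]), evaluates P on a dense grid (playing the role of the table of intermediate values), differentiates twice with `np.gradient`, and reports the PV at which f″(P) crosses zero. Function names and data are hypothetical. For completely separated outcomes (as in sim IA) the maximum-likelihood slope diverges, so the sketch is meaningful only when the two outcomes overlap.

```python
import numpy as np

def fit_logistic(x, y, iters=50, tol=1e-10):
    """Maximum-likelihood fit of P = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)                          # binomial variance weights
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * w[:, None])              # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta                                    # array([b0, b1])

def dl_from_second_derivative(x, y, n_grid=2001):
    """Return (PV where numerical f''(P) crosses zero, analytic -b0/b1)."""
    b0, b1 = fit_logistic(x, y)
    grid = np.linspace(np.min(x), np.max(x), n_grid)   # dense table of intermediate values
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * grid)))
    d1 = np.gradient(p, grid)                          # f'(P), from P vs PV
    d2 = np.gradient(d1, grid)                         # f''(P), from f'(P)
    k = np.argmax(np.abs(d1))                          # extremum of f'(P)
    crossings = np.where(np.sign(d2[:-1]) != np.sign(d2[1:]))[0]
    i = crossings[np.argmin(np.abs(crossings - k))]    # sign change nearest that extremum
    # linear interpolation of the zero between grid[i] and grid[i + 1]
    dl = grid[i] - d2[i] * (grid[i + 1] - grid[i]) / (d2[i + 1] - d2[i])
    return dl, -b0 / b1

# Hypothetical overlapping data: PV = 1..20, outcome 1 mostly at high PV
pv = np.arange(1, 21)
outcome = np.array([0] * 8 + [1, 0, 1, 0] + [1] * 8)
dl_numeric, dl_analytic = dl_from_second_derivative(pv, outcome)
print(f"DL where f''(P) = 0: {dl_numeric:.3f}; -b0/b1: {dl_analytic:.3f}")
```

Reporting both values is a simple check that the numerical zero crossing agrees with the analytic inflection point of the fitted curve.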
Other analytical methods.
Pearson's parametric method was used to correlate the DL values obtained by ROC curve analysis with those obtained by the method described in this paper.
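A minimal sketch of this comparison with SciPy follows. The DL pairs are hypothetical placeholders, not values from this study; `scipy.stats.pearsonr` gives Pearson's r and its p value, and `scipy.stats.linregress` gives the slope and intercept of the regression line.

```python
from scipy import stats

# Hypothetical DL pairs (arbitrary units); NOT the values from this study
dl_roc = [25.0, 35.0, 45.0, 55.0, 65.0]
dl_lore = [27.0, 36.5, 46.0, 55.0, 65.5]

fit = stats.linregress(dl_roc, dl_lore)   # slope, intercept, rvalue, pvalue, stderr
r, p = stats.pearsonr(dl_roc, dl_lore)    # Pearson's r and its two-sided p value
print(f"r = {r:.4f} (p = {p:.2g}); intercept = {fit.intercept:.3f}, slope = {fit.slope:.3f}")
```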
Results
Series I.
The distribution pairs for sims IA through IF (Fig. 1A-F) were subjected to ROC curve analysis as well as LoRe, and the family of curves from these analyses is presented in Fig. 2. By LoRe, the association between the continuous PV and the binary outcome was statistically significant for IA through IE (IA-ID, p < 10⁻⁵ for all four; IE, p = 0.0039) but not for IF (p = 1.00). Because, for sim IF, the area under the ROC curve was 0.5 and the plot of P vs PV was a straight line (at any value of PV, P is 0.5, demonstrating the rather intuitive observation that, when there is no difference between the PV for two outcomes, any value of PV has a 50% chance of being associated with either outcome), this sim was excluded from further analysis.
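The ROC area of 0.5 for completely overlapping data can be checked directly: the area under an empirical ROC curve equals the probability that a randomly chosen PV from one outcome exceeds one from the other, with ties counted as one half (the Mann-Whitney form of the area). A minimal sketch, with a hypothetical function name and toy data:

```python
import numpy as np

def auc_mann_whitney(pv_outcome1, pv_outcome0):
    """Empirical ROC area = P(PV1 > PV0) + 0.5 * P(PV1 == PV0) over all pairs."""
    a = np.asarray(pv_outcome1, dtype=float)[:, None]
    b = np.asarray(pv_outcome0, dtype=float)[None, :]
    return float(np.mean((a > b) + 0.5 * (a == b)))

print(auc_mann_whitney([1, 2, 3], [1, 2, 3]))   # identical distributions -> 0.5
print(auc_mann_whitney([4, 5, 6], [1, 2, 3]))   # complete separation -> 1.0
```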
The differentiated curves for Series I are shown in Fig. 3. Note that for IE only f″(P) is shown and that, as noted in the legend, scaling had to be taken into account for the graphical presentation of these data. The DL, taken to be the point where f″(P) = 0, is easily read from the second-derivative plots.
The DL from f″(P) was compared with that from the ROC curves (taken to be the value of the PV at which the ROC curve comes closest to the upper left-hand corner); the linear regression of these variables is shown in Fig. 4. There is a strong correlation (r = 0.987; p = 0.0018), with an intercept of 3.25 and a slope of 0.95.

Fig. 4. Regression line for DL from LoRe versus that from ROC for sim series I. The dotted lines denote 95% CIs for the regression.
Series II.
The distributions for this series of sims are provided in Fig. 5. The resultant ROC curves and the relevant portions of the second-derivative curves of P as a function of PV are shown in Fig. 6. As expected, these sims, with relatively few data points and a less normal character than those in series I, yield ROC curves that are considerably less smooth than those of series I. Nevertheless, the values of DL obtained from the ROC curves and those obtained by the present method are in good agreement (45.5 vs 46.8 for IIA and 38.5 vs 41.6 for IIB).

Fig. 6. ROC curves and, as insets, f″(P) for series IIA and IIB (Figs. 6A and 6B, respectively). DL estimated from the curves are indicated as "x."
Actual data: Mg+2 in COPD.
The data reported in this study [7] were analyzed, and the results are presented in Fig. 7. It should be noted that, because P decreased as Mg+2 increased, the first derivative of the probability curve for these data (Fig. 7A) is presented as -f′(P); in the interest of consistency, it therefore displays a maximum rather than a minimum, recognizing that, in either case, f″(P) = 0 at the DL. The second-derivative curve is given in Fig. 7B.
The curve of -f′(P) is overlaid with a fitted Gaussian distribution and shows excellent agreement with the Gaussian model. This suggests a method for determining the equivalent of a confidence interval (CI) for the probability, in a manner analogous to the way a CI for a population is developed. Because the points of inflection of a normal (Gaussian) distribution occur at ±1 standard deviation (SD) from the mean, f″(P) will exhibit a minimum and a maximum at the points of inflection of f′(P). These are shown by dotted lines in Fig. 7B. Using these relationships for the upper and lower limits of the CI of probabilities, CIP, the 95% CIP can be calculated from:

$$\mathrm{SD} = \frac{\left|\mathrm{PV}_{\max} - \mathrm{PV}_{\min}\right|}{2}, \qquad 95\%\ \mathrm{CI}_P = \mathrm{DL} \pm 1.96 \times \mathrm{SD}$$

where PV_max and PV_min are the values of the PV at which f″(P) reaches its maximum and its minimum, respectively.
Thus, for these data, for which a DL of 0.83 mmol/L was obtained, the 95% CIP is 0.81-0.85 mmol/L. A comparison of the DL obtained by the method described herein with 2 adjacent levels of Mg+2 from reference [7] is provided in Table 1, where it is observed that the value obtained by this method compares favorably with the 2 best estimates from ROC analysis.
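The interval calculation can be sketched as follows, assuming f″(P) is available as arrays on a dense PV grid: take SD as half the distance between the PVs at which f″(P) reaches its maximum and minimum, then form DL ± 1.96·SD. This reads the SD directly from the f″ extrema rather than from a fitted Gaussian, and the coefficients below are hypothetical. For an exact logistic curve the extrema of f″(P) lie at DL ± ln(2 + √3)/|b1|, so this SD is a Gaussian surrogate rather than an exact property of the logistic model.

```python
import numpy as np

def ci_from_second_derivative(grid, d2, dl, z=1.96):
    """Approximate CI for the DL: DL +/- z*SD, with SD taken as half the
    distance between the PVs at which f''(P) reaches its maximum and minimum."""
    pv_at_max = grid[np.argmax(d2)]
    pv_at_min = grid[np.argmin(d2)]
    sd = abs(pv_at_max - pv_at_min) / 2.0
    return dl - z * sd, dl + z * sd, sd

# Hypothetical logistic curve with b0 = -10, b1 = 2, so DL = -b0/b1 = 5
b0, b1 = -10.0, 2.0
grid = np.linspace(0.0, 10.0, 100001)
p = 1.0 / (1.0 + np.exp(-(b0 + b1 * grid)))
d2 = b1**2 * p * (1.0 - p) * (1.0 - 2.0 * p)      # analytic f''(P) for the logistic curve
lower, upper, sd = ci_from_second_derivative(grid, d2, dl=-b0 / b1)
print(f"SD = {sd:.4f}; 95% CI = {lower:.3f} to {upper:.3f}")
```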
Discussion
The prediction of an outcome based on the value of a laboratory parameter is a common problem in epidemiology. A number of statistical approaches have been developed to determine the value of a PV that best predicts a clinical outcome. These include artificial neural networks [10] and discriminant analysis [11], as well as ROC curves and LoRe. Although all of these methods have been applied to multivariate systems [12,13], LoRe and ROC curves are mainstays of univariate decision analysis [2,14,15].
For the analysis of a single PV, the equivalence between ROC analysis and binary LoRe analysis has been established [2]. A recent paper has proposed a Bayesian approach (pre- vs post-test odds) to the application of logistic regression to DL for diagnostic tests [16]. In this paper, it is postulated that the optimum DL for a PV is the value of the parameter at the point of inflection of the LoRe probability curve, ie, at P = 0.5. Corollaries to this postulate are: (i) the DL is given by the maximum or minimum of the first derivative of the probability curve, and (ii) the DL is the point where the second derivative equals 0. These postulates were tested using readily available software that provides a facile graphical presentation of the DL. It should be noted that an elegant method for the generation of ROC curves from LoRe has been proposed using a semiparametric approach [17].
Clearly, ROC curves remain useful in the interpretation of diagnostic tests. When either sensitivity or specificity is a priority, the optimum DL may not reflect the value of the test that is most useful in a clinical setting [18]. In a sense, the analysis presented herein reflects the computation of a DL for which the Youden index is maximized [19].
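For comparison with the Youden criterion mentioned above, the index J = sensitivity + specificity - 1 can be maximized over candidate cut-offs as in the following sketch. The function name is hypothetical, and the same "outcome 1 if PV ≥ cut-off" rule is assumed; ties in J are resolved by taking the first (lowest) cut-off.

```python
import numpy as np

def youden_optimal_cutoff(pv, outcome):
    """Return (cut-off, J) maximizing J = sensitivity + specificity - 1."""
    pv = np.asarray(pv, dtype=float)
    outcome = np.asarray(outcome, dtype=int)
    pos, neg = pv[outcome == 1], pv[outcome == 0]
    cutoffs = np.unique(pv)                        # candidate cut-offs
    j = np.array([(pos >= c).mean() + (neg < c).mean() - 1.0 for c in cutoffs])
    best = int(np.argmax(j))                       # first cut-off attaining the maximum
    return cutoffs[best], j[best]

# Hypothetical toy data with no overlap
cutoff, j_max = youden_optimal_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(cutoff, j_max)
```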
The use of f″(P) has the advantage of providing an estimate of the dispersion about the DL. The steepness of the probability curve is related to the precision of the point estimate of DL: the steeper the slope, m, of the curve at P = 0.5, the smaller the variance, s², will be. At the boundary conditions, where the curve is vertical or horizontal at P = 0.5, respectively:

$$\lim_{m \to \infty} s^2 = 0$$

and

$$\lim_{m \to 0} s^2 = \infty$$

Thus, the model provides a measure of dispersion and, assuming that the central limit theorem can be extended to this model, permits the calculation of CIs, as described herein. Furthermore, a DL obtained for the relationship between a continuous variable and one outcome pair can be compared for statistical significance with that from another outcome pair using parametric methods, in a manner analogous to that for probabilities, by computing the z statistic [20] and then calculating a p value from z.
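Such a comparison of two DLs by the z statistic could be sketched as follows. The sketch assumes that each DL comes with an estimated standard error (SE) and that the two estimates are independent and approximately normal; how the SE should be obtained from the dispersion estimate is not specified in the text, and the function name and input values are hypothetical.

```python
import math

def compare_decision_levels(dl1, se1, dl2, se2):
    """Two-sided z test of H0: DL1 == DL2 for independent, approximately normal estimates."""
    z = (dl1 - dl2) / math.sqrt(se1**2 + se2**2)
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical DLs (mmol/L) with hypothetical standard errors
z, p = compare_decision_levels(0.83, 0.010, 0.78, 0.015)
print(f"z = {z:.2f}, p = {p:.3g}")
```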
It would also be of interest to attempt to extend this model to include covariates. Such a multivariate approach has been applied to ROC curve analysis [21] and might be extended to a graphical multivariate logistic regression technique, using an approach like the one described here for a single PV.
Table 1. Comparison of decision levels (DL) derived from receiver-operator characteristic (ROC) curves with decision levels computed by this method.
Acknowledgements
Portions of this work were presented at the 2005 Annual Meeting of the Association of Clinical Scientists (11-15 May, Troy, MI). I thank members of the Association for their comments, especially the suggestion of eventually extending this work to multivariate systems. I am indebted to Dr. M. Anees Khan for inviting me to collaborate with his Pulmonary Medicine group at St. Joseph's Regional Medical Center on the study cited in reference [7]. I am grateful to Dr. David Felten, former dean of Seton Hall University's School of Graduate Medical Education, for his support and encouragement.
References
- McFall RM, Treat TA. Quantifying the information value of clinical assessments with signal detection theory. Annu Rev Psychol 1999;50:215-241.
- Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561-577.
- Krouwer JS. Cumulative distribution analysis graphs: an alternative to ROC curves. Clin Chem 1987;33:2305-2306.
- Janes H, Pepe M, Kooperberg C, Newcomb P. Identifying target populations for screening or not screening using logistic regression. Statist Med 2005;24:1321-1338.
- Pezzullo JC. Logistic Regression. http://members.aol.com/john71/logistic.html (rev. 2001).
- Hosmer DW Jr, Lemeshow S. Applied Logistic Regression. Wiley, New York, 1989; pp 135-145.
- Aziz HS, Blamoun AI, Shubair MK, Ismail MMF, DeBari VA, Khan MA. Serum magnesium levels and acute exacerbation of chronic obstructive pulmonary disease: A retrospective study. Ann Clin Lab Sci 2005;35:423-427.
- Pearson ES, D'Agostino RB, Bowman LR. Tests for departure from normality: Comparison of powers. Biometrika 1977;64:231-246.
- Shapiro S, Wilk MB. An analysis of variance test for normality. Biometrika 1965;52:591-611.
- Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 1996;49:1225-1231.
- Hastie T, Tibshirani R, Buja A. Flexible discriminant analysis by optimal scoring. J Am Stat Assoc 1994;89:1255-1271.
- Ture M, Kurt I, Yavuz E, Kurum T. [Comparison of multiple prediction models for the prediction of hypertension: logistic regression and flexible discriminant analyses]. Anadolu Kardiyoloji Dergisi 2005;5:24-28 [in Turkish].
- Rasouli M, Okhovatian A, Enderami A. Serum protein profile as an indicator of malignancy: multivariate logistic regression and ROC analysis. Clin Chem Lab Med 2005;43:913-918.
- Lumbreras-Lacarra B, Ramos-Rincon J, Hernandez-Aguado I. Methodology in diagnostic laboratory test research in Clinical Chemistry and Clinical Chemistry and Laboratory Medicine. Clin Chem 2004;50:530-536.
- Obuchowski NA, Lieber ML, Wians FH Jr. ROC curves in Clinical Chemistry: Uses, misuses and possible solutions. Clin Chem 2004;50:1118-1125.
- Janssens ACJW, Deng Y, Borsboom GJJM, Eijkemans MJC, Habbema JDF, Steyerberg EW. A new logistic regression approach for the evaluation of diagnostic test results. Med Decis Making 2005;25:168-177.
- Qin J, Zhang B. Using logistic regression procedures for estimating receiver operating characteristic curves. Biometrika 2003;90:585-596.
- Needleman I, Moles DR. A guide to decision making in evidence-based diagnostics. Periodontology 2005;39:164-177.
- Lux CJ, Conradt C, Steilzig A, Komposch G. Evaluation of the predictive impact of cephalometric variables: logistic regression and ROC curves. J Orofac Orthop 1999;95-107.
- Obuchowski NA. An ROC-type measure of diagnostic accuracy when the gold standard is continuous-scale. Statist Med 2005; in press (available online, 15 Nov 2005).
- Schultz EK. Multivariate receiver-operating characteristic curve analysis: Prostate screening as an example. Clin Chem 1995;41:1248-1255.