Annals of Clinical & Laboratory Science 39:313-317 (2009)
© 2009 Association of Clinical Scientists
Surrogate Gaussian First Derivative Curves for Determination of Decision Levels and Confidence Intervals by Binary Logistic Regression
Vincent A. DeBari
Department of Internal Medicine, School of Health and Medical Sciences, Seton Hall University, South Orange, New Jersey
Address correspondence to Vincent A. DeBari, Ph.D., School of Health and Medical Sciences, Seton Hall University, 400 South Orange Avenue, South Orange, NJ 07079, USA; tel 973 877 2813; fax 973 877 5767; email: debarivi{at}shu.edu.
Abstract
It has been demonstrated that decision levels (DL) and their confidence intervals (CI) can be estimated from the second derivative, f″(P), of the logistic regression probability curve (LRPC). Although this method generally provides smooth curves from which DL and CI can be obtained, some datasets generate "noisy" curves that make these measurements difficult. The purpose of this study was to develop a procedure that obviates this noise, allowing easier estimation of DL and CI. Data from two clinical studies were examined. Logistic regression analysis was performed, and the first derivatives, f′(P), were fitted to Gaussian models. These surrogate f′(P) curves were differentiated to provide f″(P), and the results were compared with data from receiver operating characteristic (ROC) curves. For both datasets, the surrogate curves demonstrated strong fits to the natural f′(P), with r² = 0.986 for the first study and 0.832 for the second. The f″(P) generated from the surrogate curves showed a single maximum (M) and minimum (m), whereas the f″(P) generated from the natural f′(P) showed multiple M and m. Easily discernible DL and CI were obtained for both datasets, differing from the ROC-estimated DL by 1.7% for the first study and 4.8% for the second. A surrogate Gaussian simulation of f′(P) may be a useful alternative to the natural f′(P) when the f″(P) of the LRPC is used to determine DL and CI.
Keywords: logistic regression, curve-fitting, medical decision making, receiver operating characteristic curve, epidemiology, biostatistics
Introduction
A common decision problem in medicine is to determine the point at which a continuous predictor variable best discriminates between the two levels of a dichotomous outcome, e.g., disease presence/absence. This is most commonly done with receiver operating characteristic (ROC) curves [1], which provide both a measure of the strength of association between the predictor variable and the outcome, through the area under the ROC curve, and a cut-off or decision level (DL) at which sensitivity and specificity are optimized.
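To make the conventional ROC approach concrete, the following is a minimal Python sketch that computes the empirical AUC (as the Mann-Whitney probability) and selects a cut-off. It assumes that higher predictor values indicate the positive outcome and uses Youden's J (sensitivity + specificity − 1) as the optimality criterion; that criterion, the function names, and the toy data are assumptions for illustration only, not the procedure used in this paper.

```python
def roc_points(values, labels):
    """(cutoff, sensitivity, specificity) for every observed value used as a cut-off.

    A subject is called test-positive when value >= cutoff; labels are 1/0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= c)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < c)
        points.append((c, tp / n_pos, tn / n_neg))
    return points


def auc(values, labels):
    """Empirical AUC: probability that a random positive outranks a random negative
    (ties count one half), i.e. the Mann-Whitney statistic scaled to 0..1."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(values, labels):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1 (first one on ties)."""
    return max(roc_points(values, labels), key=lambda t: t[1] + t[2] - 1.0)


values = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical predictor values
labels = [0, 0, 1, 0, 1, 1, 1, 1]   # hypothetical outcomes (1 = disease present)
best_cutoff, best_sens, best_spec = youden_cutoff(values, labels)
area = auc(values, labels)
```

With these toy data the sketch selects a cut-off of 5 (sensitivity 0.8, specificity 1.0), and the AUC is 14/15.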
Binary logistic regression (BLR) can also be used to test the association of a predictor variable with a dichotomous outcome, providing an odds ratio (OR) derived from the logistic regression coefficient, B, as OR = e^B. Additionally, the strength of association can be tested by one of several inferential maneuvers [2]. Several years ago, a method was proposed that uses the second derivative, f″(P), of the BLR probability curve (LRPC) to provide a DL as well as 95% confidence intervals (CI) for the predictor variable [3].
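For an idealized two-parameter logistic model (an assumption made here for illustration; the method itself works on a numerically fitted curve rather than this closed form), both derivatives can be written in terms of P itself. This shows why f″(P) crosses zero at the midpoint of the curve and has exactly one maximum and one minimum:

```latex
\begin{aligned}
P(x) &= \frac{1}{1 + e^{-(B_0 + Bx)}}, \qquad \mathrm{OR} = e^{B} \\
f'(P) &= \frac{dP}{dx} = B\,P(1-P) \\
f''(P) &= \frac{d^2P}{dx^2} = B^2\,P(1-P)(1-2P) \\
f''(P) = 0 &\;\Rightarrow\; P = \tfrac{1}{2}, \qquad x_{\mathrm{DL}} = -\frac{B_0}{B} \\
\frac{d^3P}{dx^3} = B^3\,P(1-P)(1 - 6P + 6P^2) = 0 &\;\Rightarrow\; P = \tfrac{1}{2} \mp \tfrac{\sqrt{3}}{6}, \qquad x_{M,m} = \frac{-B_0 \mp \ln\left(2+\sqrt{3}\right)}{B}
\end{aligned}
```

With B > 0, the maximum M lies below the DL and the minimum m above it, each offset by ln(2 + √3)/B ≈ 1.317/B.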
In the initial description of this method, the procedure was illustrated with simulated datasets and with actual data from a previously published case-control study [4]. In all of these examples, the curve-fitting routine used to generate the continuous LRPC provided smooth, regular derivatives, both for the first derivative, f′(P), and for f″(P).
However, in several subsequent studies [5,6], the derivatives yielded poorly developed, noisy curves because of irregularities in the LRPC, ultimately introducing some ambiguity into the f″(P) from which the DL and 95% CI are calculated. Herein, a simple maneuver is described that obviates the noise encountered when the LRPC is not perfectly fitted.
Methods
The procedures used to perform BLR, generate the LRPC, and determine the DL and 95% CI were given in the initial description of this method [3]. Briefly, a web-based software routine (http://statpages.org/logistic.html) [7] was used to perform BLR. This routine provides a table of probabilities (P) at varying levels of the predictor variable, as well as the OR and inferential statistics (χ² and p-value). The table is entered as primary data into a statistical software package (Prism®, GraphPad Corp., San Diego, CA), which is used to fit a continuous curve and to generate the table of interpolated values it requires in order to differentiate the curve.
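The statpages.org routine is used here as a black box. For readers who want to reproduce a comparable probability table offline, the following is a minimal Python sketch assuming a single predictor and a maximum-likelihood fit by Newton-Raphson; it is not the statpages code, and the function names and toy data are hypothetical.

```python
import math


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit(P) = b0 + b1*x by Newton-Raphson.

    x: predictor values; y: outcomes coded 0/1 (must not be perfectly separated).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p               # score for b0
            g1 += (yi - p) * xi        # score for b1
            h00 += w                   # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step = information^-1 @ score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


def probability_table(b0, b1, levels):
    """(level, P) pairs: the kind of table that is passed on for curve fitting."""
    return [(v, 1.0 / (1.0 + math.exp(-(b0 + b1 * v)))) for v in levels]


x = [1, 2, 3, 4, 5, 6, 7, 8]          # hypothetical predictor values
y = [0, 0, 1, 0, 1, 0, 1, 1]          # hypothetical outcomes
b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)             # OR = e^B per unit of the predictor
table = probability_table(b0, b1, [i * 0.5 for i in range(2, 17)])
```

At convergence the score equations are satisfied, i.e., the fitted probabilities sum to the number of events.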
In this paper, an additional step is employed: after f′(P) is generated, the "raw" f′(P) curve is fitted to a best-fit Gaussian model using the non-linear curve-fitting routine built into Prism®. The Gaussian f′(P) curve is then differentiated to yield f″(P), from which the DL [where f″(P) = 0] and the 95% CI [from the f″(P) maximum (M) and minimum (m)] are calculated, as previously described [3].
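Steps 3 and 4 can be sketched in Python as follows. Prism® performs a non-linear least-squares Gaussian fit; as a simpler stand-in, this sketch uses the log-parabola (Caruana) method, fitting ln f′(P) = c0 + c1·x + c2·x² by ordinary least squares. That method is exact for noise-free Gaussian data but weights the tails differently from non-linear regression, so it is an assumption, not the paper's fitting routine. Once the surrogate g(x) = A·exp[−(x − μ)²/(2σ²)] is available, its derivative follows analytically: g′(x) is zero at x = μ (the DL), maximal at x = μ − σ (M), and minimal at x = μ + σ (m). Function names and data are hypothetical.

```python
import math


def _solve3(a, b):
    """Solve the 3x3 system a @ z = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    z = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        z[r] = (m[r][3] - sum(m[r][k] * z[k] for k in range(r + 1, 3))) / m[r][r]
    return z


def fit_gaussian_surrogate(xs, ys):
    """Return (A, mu, sigma) of a Gaussian fitted to the positive points of f'(P)."""
    pts = [(x, math.log(y)) for x, y in zip(xs, ys) if y > 0]
    xc = sum(x for x, _ in pts) / len(pts)        # center x for numerical stability
    u = [(x - xc, ly) for x, ly in pts]
    s = [sum(ui ** k for ui, _ in u) for k in range(5)]
    t = [sum((ui ** k) * ly for ui, ly in u) for k in range(3)]
    c0, c1, c2 = _solve3([[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], t)
    if c2 >= 0:
        raise ValueError("f'(P) values are not peak-shaped; no Gaussian surrogate")
    sigma = math.sqrt(-1.0 / (2.0 * c2))
    mu = xc - c1 / (2.0 * c2)
    amp = math.exp(c0 - c1 * c1 / (4.0 * c2))
    return amp, mu, sigma


def surrogate_second_derivative(x, amp, mu, sigma):
    """Analytic derivative of the Gaussian surrogate, i.e. the smoothed f''(P)."""
    return -amp * (x - mu) / sigma ** 2 * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def decision_points(mu, sigma):
    """DL where f''(P) = 0, and the locations of its maximum (M) and minimum (m)."""
    return {"DL": mu, "M": mu - sigma, "m": mu + sigma}


# Synthetic, noise-free f'(P) values (hypothetical A = 0.2, mu = 5.0, sigma = 1.5)
xs = [0.05 * i for i in range(201)]
ys = [0.2 * math.exp(-((x - 5.0) ** 2) / (2 * 1.5 ** 2)) for x in xs]
amp, mu, sigma = fit_gaussian_surrogate(xs, ys)
points = decision_points(mu, sigma)
```

Differentiating the fitted formula rather than the tabulated points is what removes the noise: the surrogate's derivative can only have one maximum and one minimum.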
The data used in this report derive from two clinical studies. The first has been published previously [6] and examined the association of D-dimer levels (as fibrinogen equivalent units in µg/ml), the predictor variable, with the severity of pulmonary embolism (mild-to-moderate vs severe-to-very severe), the binary outcome. The second is a study in progress for which an abstract has been published [8]. This latter study provided preliminary data on the relationship between brain (or B-type) natriuretic peptide (BNP), the predictor variable, and the size of a patent ductus arteriosus (PDA), the outcome.
Results
Fig. 1 provides a stepwise demonstration of the application of the method. The raw pairs of probability (P) of severe pulmonary embolism versus the predictor variable (plasma D-dimer concentration) are plotted in Fig. 1A. In Step 1, a series of interpolated values is generated and plotted in Fig. 1B. Step 2 differentiates this curve, providing the f′(P) shown in Fig. 1C. This curve is then smoothed (Step 3) by fitting it to a Gaussian model (Fig. 1D). The fit was extremely good, with a non-linear regression coefficient of determination, r², of 0.986. If the "native" f′(P) curve is differentiated, an unusable, noisy f″(P) is obtained (Fig. 1E). However, differentiation of the Gaussian surrogate (Step 4) yields a smooth f″(P) (Fig. 1F) from which the DL and CI were calculated (DL = 12.20 µg/ml; 95% CI: 10.3 to 14.9 µg/ml, calculated from M = 10.2 and m = 14.1). This DL compares favorably with that obtained from the native f″(P) (1.2% difference), from which the DL could best be estimated as 12.35 µg/ml (and from which CI were impossible to determine), and differs by 1.7% from the DL of 12.41 µg/ml obtained from the ROC curve. The sensitivity and specificity from these three procedures are summarized in Table 1 and suggest a modest improvement in analytical parameters obtained by implementing the procedure described herein.
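The percentage differences quoted above can be checked as follows, assuming each is expressed relative to the comparator DL (the denominator is not stated; using 12.20 µg/ml instead gives the same rounded values):

```latex
\begin{aligned}
\frac{|12.20 - 12.35|}{12.35} \times 100\% &= \frac{0.15}{12.35} \times 100\% \approx 1.2\% \quad \text{(native } f''(P)\text{)} \\
\frac{|12.20 - 12.41|}{12.41} \times 100\% &= \frac{0.21}{12.41} \times 100\% \approx 1.7\% \quad \text{(ROC curve)}
\end{aligned}
```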

Fig. 1. Stepwise demonstration of the application of Gaussian simulation to the determination of DL and 95% CI from the second derivative of the logistic regression (LoRe) probability curve. These are the data from ref. [6]. A. Raw output from LoRe (64 pairs) with the best-fit curve. B. Curve derived from 256 pairs of interpolated points derived from A. C. First derivative, f′(P), of the curve in B. D. Gaussian simulation (best fit) of the curve in C. E. Second derivative, f″(P), of curve B (first derivative of the curve in C). F. Second derivative of the probability curve obtained by taking the derivative of the Gaussian simulation in D. Each step is numbered; the blocked step (from C to E) indicates that the curve in E, with its multiple maxima and minima, should be avoided.
Table 1. Summary of sensitivity, specificity, and likelihood ratio [sensitivity/(1 − specificity)] for three estimates of the DL for the study described in ref. [6].
An analysis of the DL for BNP with the dichotomous outcome of PDA severity is provided in Fig. 2. The f″(P) is given as the main figure, with the corresponding ROC curve in the inset. At 119 pg/ml, the DL determined from the point where f″(P) = 0, the sensitivity and specificity are 85.5% and 96.7%, respectively, whereas ROC curve analysis yields a sensitivity and specificity of 87.5% and 96.7%.
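The sensitivity, specificity, and likelihood ratio used to compare DL estimates (as in Table 1) follow directly from a 2 × 2 classification at each cut-off. A minimal Python sketch, assuming values at or above the DL are called positive; the function name and toy data are hypothetical:

```python
import math


def diagnostic_summary(values, labels, dl):
    """Sensitivity, specificity and likelihood ratio sens/(1 - spec) at cut-off dl.

    values >= dl are called test-positive; labels are 1 (outcome present) or 0.
    """
    tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= dl)
    fn = sum(1 for v, y in zip(values, labels) if y == 1 and v < dl)
    tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < dl)
    fp = sum(1 for v, y in zip(values, labels) if y == 0 and v >= dl)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # with perfect specificity the ratio is unbounded
    lr = sens / (1.0 - spec) if spec < 1.0 else math.inf
    return sens, spec, lr


values = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical predictor values
labels = [0, 0, 1, 0, 1, 1, 1, 1]   # hypothetical outcomes
for dl in (4, 5):                   # e.g., two competing DL estimates
    print(dl, diagnostic_summary(values, labels, dl))
```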

Fig. 2. Application of this method to the data in ref. [8]. Curve of f″(P) showing the DL at f″(P) = 0 and the M and m from which the 95% CI were calculated. These values, with the corresponding sensitivity and specificity, are shown. The inset is the ROC curve, accompanied by the AUC, the DL value at the maximum likelihood ratio, and the corresponding sensitivity and specificity.
Discussion
The use of the second derivative of the probability curve determined by logistic regression analysis was described several years ago [3]. In the original description, simulations were used to develop the procedure, which was then applied to data from an earlier study [4] of serum Mg²⁺ concentrations as a predictor of disease exacerbation in patients with chronic obstructive pulmonary disease. The f″(P) provides a simple graphical procedure for obtaining a DL from logistic regression analysis and offers several advantages over ROC curve analysis: it presents the predictor variable on an axis (x), unlike ROC curves, whose axes relate sensitivity and specificity; it provides a clearly defined DL [at f″(P) = 0], whereas ROC curves can be ambiguous in the region of interest; and, finally, the f″(P) curve allows upper and lower CI to be calculated in a manner analogous to the computation of a CI for a point estimate (the mean).
In recent years, it has become apparent that the method would require improvement to achieve its full utility. The problem is noise that arises from deviations in the original LRPC and is amplified as the first and second derivatives of this curve are generated. This noise can be fairly benign and still allow reasonably good estimates of DL and 95% CI, as observed in one study [5], or, with some datasets [6], it can make the CI impossible to ascertain and allow only a reasonable estimate of the true DL. Described herein is a simple method that appears to overcome the ambiguity caused by noisy f″(P) curves.
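Why small deviations in the LRPC become large deviations in f″(P) can be seen with a synthetic example: a ripple ε·sin(ωx) added to P contributes a term of amplitude about ε·ω² to its second derivative. The Python sketch below is hypothetical (the logistic parameters, ripple amplitude, and grid are illustrative assumptions); it differentiates a smooth and a slightly rippled logistic curve twice by central differences and counts the sign changes of f″(P). The smooth curve has one sign change (at the DL); the rippled one has many.

```python
import math


def logistic(x, b0=-12.0, b1=1.0):
    """Hypothetical probability curve with its midpoint (DL) at x = 12."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def second_differences(ys, h):
    """Central second difference (y[i+1] - 2*y[i] + y[i-1]) / h^2 at interior points."""
    return [(ys[i + 1] - 2.0 * ys[i] + ys[i - 1]) / (h * h) for i in range(1, len(ys) - 1)]


def count_sign_changes(values):
    """Number of sign changes, ignoring exact zeros."""
    changes, last = 0, 0
    for v in values:
        s = (v > 0) - (v < 0)
        if s != 0:
            if last != 0 and s != last:
                changes += 1
            last = s
    return changes


h = 0.05
xs = [i * h for i in range(401)]                       # grid 0..20
smooth = [logistic(x) for x in xs]
eps, omega = 0.002, 2.0 * math.pi / 0.5                # ripple: 0.2% of P, period 0.5
rippled = [p + eps * math.sin(omega * x) for p, x in zip(smooth, xs)]

smooth_changes = count_sign_changes(second_differences(smooth, h))
rippled_changes = count_sign_changes(second_differences(rippled, h))
```

Here the ripple changes P by at most 0.002, yet its contribution to f″(P) (about 0.3) exceeds the largest value of the true f″(P) (about 0.096), which is the kind of behavior seen in Fig. 1E.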
The data comparison in Fig. 2 shows that the method described in this paper may not always provide the best DL: the DL at the ROC curve's maximum likelihood ratio yielded a slightly higher sensitivity than that obtained from f″(P), with identical specificity. This limitation derives from slight shifts in the maximum of the fitted Gaussian curves. Some confirmation of this comes from the observation that, although the coefficient of determination for this fitted curve was strong (r² = 0.832), it was considerably lower than that obtained for the Gaussian curve fitted to the data on D-dimer and extent of PE (Fig. 1). Nevertheless, reasonable estimates of CI can still be obtained, which ROC curves cannot provide.
Is it possible to determine precisely the level of r² at which the surrogate could be judged to provide a suitable fit to the f′(P) of the fitted probability curve? This is likely best judged by inspection. The commercial software used in this application allows the curves to be superimposed, and that maneuver might well serve as a reasonable, though subjective, check of validity. It is proposed that, until more datasets can be evaluated, r² should be expected to be approximately 0.8 or higher. An alternative approach was suggested by one of the reviewers of this paper, i.e., to test the fit using a Kolmogorov-Smirnov (K-S) test. Such a comparison could be made using the Lilliefors K-S test as modified by Dallal and Wilkinson [9]. This might be of some value but, again, there remains the problem of selecting a level of α against which the obtained p-value could be compared and considered adequate. As more datasets are developed, the value of the K-S test may well be demonstrated.
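The r² criterion discussed above can be computed directly. As a hypothetical illustration (not data from either study), the Python sketch below compares the exact f′(P) of an ideal logistic curve, which is a sech²-shaped logistic density rather than a true Gaussian, with a Gaussian of the same height and curvature at the peak (σ = √2/B), and then applies a tentative threshold of about 0.8. It shows that even a perfectly fitted LRPC yields a surrogate r² slightly below 1.

```python
import math


def r_squared(observed, fitted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical ideal logistic curve P = 1 / (1 + exp(-B (x - x0)))
B, x0 = 1.0, 12.0
xs = [4.0 + 0.05 * i for i in range(321)]                       # grid 4..20
native = [B * math.exp(-B * (x - x0)) / (1.0 + math.exp(-B * (x - x0))) ** 2 for x in xs]
sigma = math.sqrt(2.0) / B                                      # matches curvature at the peak
surrogate = [(B / 4.0) * math.exp(-((x - x0) ** 2) / (2.0 * sigma ** 2)) for x in xs]

r2 = r_squared(native, surrogate)
acceptable = r2 >= 0.8   # tentative threshold; not a validated cut-off
```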
In conclusion, the simple fitting of a Gaussian-model surrogate for the first derivative of the LRPC may provide an advantage in the application of f″(P) to the determination of DL and CI when a continuous predictor variable is associated with a dichotomous outcome.
Acknowledgements
I thank Drs. V.K. Kalra and F.M. Kiblawi for providing the preliminary data from their ongoing study [8]. The clinical investigations providing data used in this paper were presented at the 2009 Annual Meeting of the Association of Clinical Scientists in Tampa, FL, on 13 to 18 May 2009.
References
- Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561–577.
- Hosmer DW Jr, Lemeshow S. Applied Logistic Regression. New York: Wiley; 1989. pp. 135–145.
- DeBari VA. Computation of decision levels from differentiated logistic regression probability curves. Ann Clin Lab Sci 2006;36:194–200.
- Aziz HS, Blamoun AI, Shubair MK, Ismael MMF, DeBari VA, Khan MA. Serum magnesium levels and acute exacerbation of chronic obstructive pulmonary disease: a retrospective study. Ann Clin Lab Sci 2005;35:423–427.
- Moammar MQ, Azam HM, Blamoun AI, Rashid AO, Ismail M, Khan MA, DeBari VA. Alveolar-arterial oxygen gradient, pneumonia severity index and outcomes in patients hospitalized with community acquired pneumonia. Clin Exp Pharmacol Physiol 2008;35:1032–1037.
- Blamoun J, Alfakir M, Sedfawy AI, Moammar MQ, Maroules M, Khan MA, DeBari VA. The association of D-dimer levels with clinical outcomes in patients presenting with acute pulmonary embolism. Lab Hematol 2009;15:4–9.
- Pezzullo JC, Sullivan KM. Logistic Regression. http://statpages.org/logistic.html ver. 05.07.20 (last accessed 19 March 2009).
- Kalra VK, Kiblawi FM, Zauk A, DeBari VA. Plasma brain natriuretic peptide as a biochemical marker for patent ductus arteriosus in pre-term neonates. Ann Clin Lab Sci 2009;39:212 [Abstract].
- Dallal GE, Wilkinson L. An analytical approximation to the distribution of Lilliefors test statistic for normality. Am Stat 1986;40:294–296.