Purpose: Computer-aided detection/diagnosis (CAD) of prostate cancer (PCa) on multiparametric MRI (mpMRI) is an active area of research. In the literature, the performance of predictive models trained to detect PCa on mpMRI has typically been reported in terms of voxel-wise measures such as sensitivity and specificity and/or the area under the receiver operating characteristic curve (AUC). However, it is unclear whether models that score higher on these measures are actually superior. Here, we propose a novel method for lesion identification as well as novel measures that assess the quality of the detected lesions.

Methods: A total of 46 axial MRI slices of interest from 34 patients, together with the associated histopathologic ground truths, were used to develop and characterize the proposed measures. The proposed lesion-wise score, sℓ, is based on the Jaccard similarity index, with modifications that emphasize the overlap and colocalization of predicted lesions with ground-truth lesions. Thresholding sℓ allowed the sensitivity and specificity of lesion detection to be assessed, while the proposed lesion-summary score, sσ, is a weighted average of sℓ values that provides a single summary statistic of lesion detection performance. The proposed measures were used to compare the lesion detection performance of a predictive model with that of a radiologist on the same data set. The measures were also used to evaluate the degree to which viewing the model-generated cancer prediction maps improved the radiologist's diagnostic accuracy.

Results: The lesion-wise score qualitatively reflected the goodness of predicted lesions over a wide range of values (sℓ = 0.1 to sℓ = 0.8) and spanned a larger range of values than the Dice similarity coefficient (DSC) over the same range of prediction qualities (0–0.9 vs 0–0.75). The lesion-summary score varied linearly with voxel-wise sensitivity and quadratically with voxel-wise specificity, and it correlated well with voxel-wise AUC (ρ = 0.68) and the DSC (ρ = 0.88). Radiologist performance improved significantly after viewing the model-generated cancer prediction maps, as quantified by both sσ (P = 0.01) and the DSC (P = 0.04), with improvements in both lesion detection sensitivity and specificity.

Conclusion: The proposed measures allow lesion detection performance, which is most relevant in a clinical setting, to be assessed; this would not be possible with voxel-wise measures alone.
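The abstract does not give the exact modifications to the Jaccard index that define sℓ, nor the weights used in sσ, so the following is only a minimal sketch of the general idea under stated assumptions. It treats each lesion as a set of (row, col) voxel coordinates in one axial slice. As a stand-in for sℓ it uses the plain Jaccard index between each ground-truth lesion and its best-matching predicted lesion. It thresholds those scores to obtain a lesion-level sensitivity and a false-positive lesion count, and it uses lesion size as a hypothetical weight for a summary score. All function names, the threshold and the weighting are illustrative assumptions, not the authors' method.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int]  # (row, col) index of a voxel in one axial slice
Lesion = Set[Voxel]      # a lesion is modeled as a set of voxel coordinates


def jaccard(a: Lesion, b: Lesion) -> float:
    """Jaccard index |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a: Lesion, b: Lesion) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def lesion_scores(predicted: List[Lesion], truth: List[Lesion]) -> List[float]:
    """Hypothetical stand-in for s_l: for each ground-truth lesion, the best
    plain Jaccard overlap with any predicted lesion (0.0 if none overlaps).
    The paper's overlap/colocalization modifications are NOT reproduced."""
    return [max((jaccard(p, t) for p in predicted), default=0.0) for t in truth]


def lesion_sensitivity(scores: List[float], threshold: float) -> float:
    """Fraction of ground-truth lesions whose score reaches the threshold."""
    if not scores:
        raise ValueError("sensitivity is undefined without ground-truth lesions")
    return sum(s >= threshold for s in scores) / len(scores)


def false_positive_count(predicted: List[Lesion], truth: List[Lesion],
                         threshold: float) -> int:
    """Predicted lesions whose best Jaccard with any true lesion is below threshold."""
    return sum(
        max((jaccard(p, t) for t in truth), default=0.0) < threshold
        for p in predicted
    )


def summary_score(scores: List[float], weights: List[float]) -> float:
    """Weighted average of lesion-wise scores (the weighting is an assumption)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total


if __name__ == "__main__":
    truth = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
    pred = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
    scores = lesion_scores(pred, truth)           # [0.75, 0.0]
    print(scores, dice(pred[0], truth[0]))        # Dice = 6/7 for the matched pair
    print(lesion_sensitivity(scores, 0.5))        # 0.5: one of two lesions detected
    print(false_positive_count(pred, truth, 0.5))  # 1: the lesion at (9, 9)
    print(summary_score(scores, [len(t) for t in truth]))  # size-weighted: 0.5
```

In the usage example, the first predicted lesion covers three of the four voxels of the first true lesion, giving J = 0.75 and DSC = 2J/(1 + J) = 6/7. The second true lesion is missed, and the isolated prediction at (9, 9) is counted as a false positive at the assumed threshold of 0.5.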
Funding Information:
This work was supported in part by the National Institutes of Health (grants R01-CA155268, P41-EB015894, and T32-GM008244), the Department of Defense (grant W81XWH-15-1-0477), and the Minnesota Research Evaluation and Commercialization Hub (MN-REACH).
© 2018 American Association of Physicists in Medicine
Keywords:
- computer-aided detection and diagnosis (CAD)
- observer performance
- performance assessment