Purpose: Computed tomography (CT) is a fast and ubiquitous tool for evaluating intra-abdominal organs and diagnosing appendicitis. However, traditional CT reporting does not necessarily capture the degree of uncertainty, and indeterminate findings remain common. The purpose of this study was to evaluate the reproducibility of a standardized CT reporting system for appendicitis across a large population and the system's impact on radiologists' certainty in diagnosing appendicitis.

Methods: Using a previously described standardized reporting system, eight radiologists retrospectively evaluated CT scans, blinded to all clinical information, in a stratified random sample of 237 patients from a larger cohort of patients imaged for possible appendicitis (2010-2014). Two-thirds of these scans were randomly selected to be independently read by a second reader, using the original CT reports to balance the number of positive, negative, and indeterminate exams across all readers. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) were used to evaluate the diagnostic performance of readers in identifying appendicitis. Inter-reader agreement was also evaluated.

Results: There were 113 patients with appendicitis (mean age 38 years, 67% male). Using the standardized report, radiologists were highly accurate at identifying appendicitis (AUC = 0.968; 95% confidence interval: 0.95, 0.99). Inter-reader agreement was >80% for most objective findings, and certainty in diagnosing appendicitis was high and reproducible (AUC = 0.955 and AUC = 0.936 for the first and second readers, respectively).

Conclusions: Using a standardized reporting system resulted in high reproducibility of objective CT findings for appendicitis and achieved high diagnostic accuracy in an at-risk population. Predictive tools based on this reporting system may further improve communication about certainty in diagnosis and guide patient management, especially when CT findings are indeterminate.
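For readers unfamiliar with the two summary statistics named in the Methods, the following is a minimal illustrative sketch, not the authors' analysis code. It assumes a hypothetical ordinal certainty score per scan (higher means more certain of appendicitis) and a binary reference diagnosis. The function names and all values are invented for illustration and are not study data.

```python
from itertools import product


def auc_from_scores(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def percent_agreement(reader_a, reader_b):
    """Percentage of scans on which two readers recorded the same finding."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must rate the same non-empty set of scans")
    matches = sum(a == b for a, b in zip(reader_a, reader_b))
    return 100.0 * matches / len(reader_a)


# Toy values only (hypothetical, not from the study).
toy_labels = [1, 1, 1, 0, 0, 0]      # 1 = appendicitis confirmed
toy_certainty = [5, 4, 2, 3, 1, 1]   # assumed 1-5 certainty scale
toy_reader_1 = [1, 0, 1, 1, 0]       # a binary finding, first reader
toy_reader_2 = [1, 0, 0, 1, 0]       # same finding, second reader

print(auc_from_scores(toy_labels, toy_certainty))
print(percent_agreement(toy_reader_1, toy_reader_2))
```

Simple percent agreement is used here only because the abstract reports agreement as a percentage; chance-corrected statistics are a common alternative, and the abstract does not state which measure the authors computed.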
Funding Information:
Research reported in this publication was supported by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health, United States, under Award no. T32DK070555. The Comparative Effectiveness Research Translation Network (CERTAIN) is a program of the University of Washington that provided research infrastructure and analytical support for this project. The content is solely the responsibility of the authors and it does not necessarily represent the official views of the National Institutes of Health or the University of Washington.