ABSTRACT
PURPOSE
This study aimed to evaluate whether an artificial intelligence (AI) system can identify basal lung metastatic nodules examined using abdominopelvic computed tomography (CT) that were initially overlooked by radiologists.
METHODS
We retrospectively included abdominopelvic CT images with the following inclusion criteria: a) CT images from patients with solid organ malignancies between March 1 and March 31, 2019, in a single institution; and b) abdominal CT images interpreted as negative for basal lung metastases. Reference standards for diagnosis of lung metastases were confirmed by reviewing medical records and subsequent CT images. An AI system that could automatically detect lung nodules on CT images was applied retrospectively. A radiologist reviewed the AI detection results to classify them as lesions with the possibility of metastasis or clearly benign. The performance of the initial AI results and the radiologist’s review of the AI results were evaluated using patient-level and lesion-level sensitivities, false-positive rates, and the number of false-positive lesions per patient.
RESULTS
A total of 878 patients (580 men; mean age, 63 years) were included, with overlooked basal lung metastases confirmed in 13 patients (1.5%). The AI exhibited an area under the receiver operating characteristic curve value of 0.911 for the identification of overlooked basal lung metastases. Patient- and lesion-level sensitivities of the AI system ranged from 69.2% to 92.3% and 46.2% to 92.3%, respectively. After a radiologist reviewed the AI results, the sensitivity remained unchanged. The false-positive rate and number of false-positive lesions per patient ranged from 5.8% to 27.6% and 0.10 to 0.51, respectively. The radiologist’s review significantly reduced the false-positive rate (2.4%–12.6%; all P values < 0.001) and the number of false-positive lesions detected per patient (0.03–0.20).
CONCLUSION
The AI system could accurately identify basal lung metastases detected in abdominopelvic CT images that were overlooked by radiologists, suggesting its potential as a tool for radiologist interpretation.
CLINICAL SIGNIFICANCE
The AI system can identify missed basal lung lesions in abdominopelvic CT scans in patients with malignancy, providing feedback to radiologists, which can reduce the risk of missing basal lung metastasis.
Main points
• An artificial intelligence (AI) system for pulmonary nodule detection on computed tomography (CT) images can be utilized as a second reader after the radiologist’s interpretation, to identify overlooked pulmonary nodules.
• As a second reader, the AI may analyze images after the radiologist’s interpretation and provide feedback to the radiologist only when the AI suspects that the radiologist has overlooked a pulmonary nodule. In this scenario, the oversight of significant pulmonary nodules can be prevented without the need to review the AI results of all the examinations. In our study, the applied AI system could accurately identify basal lung metastases captured in abdominopelvic CT images that were overlooked by radiologists, suggesting its potential as a second reader after the radiologist’s interpretation.
• We believe that our study contributes significantly to the literature by highlighting the effectiveness of AI in improving the accuracy of interpreting abdominopelvic CT images in patients with malignancies. Additionally, it underscores the importance of AI as a second reader to reduce interpretation errors.
Abdominopelvic computed tomography (CT) is frequently performed in patients with cancer to evaluate various cancers of the abdominopelvic or extra-abdominopelvic organs. Lung metastasis frequently occurs in the advanced stages of various solid organ cancers, and abdominopelvic CT images inevitably capture the base of the lungs. Therefore, evaluating the lung bases for nodules suggestive of metastasis is an important component of interpreting abdominopelvic CT scans in patients with cancer.1-4 However, in a busy clinical environment, a radiologist may pay less attention to the basal lungs than to the abdominal organs, which are the main targets of evaluation.4 As a result, metastatic nodules in the basal lungs can be overlooked by interpreting radiologists, which may adversely affect a patient’s treatment decisions or prognosis and may lead to a medicolegal dispute.
The automatic detection of pulmonary nodules on chest CT images is one of the most widely investigated topics in artificial intelligence (AI)-based medical image analysis. Various studies have reported the radiologist-level performance of AI and the enhanced performance of radiologists using AI for lung nodule detection on CT scans.5, 6 Based on these impressive results, commercial AI-based software medical devices have begun to be utilized in daily clinical practice as computer-aided detection (CAD) tools.7-10
In addition to its use as a CAD tool, AI’s utilization as a second reader–that is, for use in analyzing images after the radiologist’s interpretation and providing feedback to the radiologist in case of suspected interpretation errors–can be another attractive scenario for applying AI in daily practice.11-15 AI, as a second reader, can provide a safety net for radiologists against the risk of interpretation errors or medicolegal disputes without requiring the rigorous effort of scrutinizing the AI’s results following every examination. The detection of pulmonary nodules in the basal lungs, as acquired using abdominopelvic CT, can serve as a compelling scenario for employing an AI second reader.15 This is because it is beyond the primary focus of examination, yet carries a relatively high risk of interpretation errors, which could result in critical outcomes.
In consideration of the above, we aimed to evaluate whether an AI system could detect metastatic pulmonary nodules in the basal lungs that had been overlooked by radiologists on abdominopelvic CT images.
Methods
This single-center, retrospective, diagnostic cohort study was approved by the Seoul National University Hospital Institutional Review Board on January 5, 2022 (approval number: 2112-142-1284). During the approved research period, patient data required for this study were accessed for research purposes. The requirement for informed consent was waived by the institutional review board.
Patients
Patients were consecutively included in a single tertiary referral institution in South Korea with the following criteria: a) patients diagnosed with solid organ cancers (International Statistical Classification of Diseases and Related Health Problems, 10th revision, C00 to C75); b) patients who underwent abdominopelvic CT between March 1 and March 31, 2019; and c) abdominopelvic CT scans interpreted as negative for basal lung metastasis in the formal reports of radiologists, based on a manual review of unstructured radiological reports by a thoracic radiologist. Patients who underwent chest CT on the same day as abdominopelvic CT and those lost to follow-up within 3 years without a clinical diagnosis of lung metastasis were excluded (Figure 1).
For patients who underwent abdominopelvic CT more than once during the inclusion period, only the first CT examination was included. For multiphase CT examinations, images that captured the largest portion of the basal lungs were included in the analyses.
Diagnosis of pulmonary metastasis
To confirm the clinical diagnosis of pulmonary metastasis in patients, one thoracic radiologist (E.J.H., with 5 years of experience as a faculty thoracic radiologist) reviewed the medical records and CT images (including the index abdominopelvic CT and follow-up chest and abdominopelvic CT images). Pulmonary lesions that were pathologically confirmed as metastases, as well as lesions with persistent growth on follow-up CT images and a clinical impression of metastasis, were regarded as pulmonary metastases. Pulmonary lesions that were stable for >3 years were considered benign. All individual pulmonary metastases present on the index abdominopelvic CT images but not documented in the radiologist’s report were recorded as “overlooked metastases.”
Artificial intelligence system
To detect pulmonary metastases in the basal lungs captured by abdominopelvic CT, an AI model based on a commercialized deep-learning-based CAD system (AVIEW Lung Nodule CAD, Coreline Soft, Seoul, Korea) was used. The CAD system was designed to detect pulmonary nodules in chest CT images and was approved for clinical use in Korea as an assistant tool for physicians in interpreting chest CT scans.
Since the original CAD system was optimized for low-dose chest CT images for lung cancer screening, the performance of the AI model may degrade when used for detecting pulmonary metastasis. Therefore, additional training of the AI model was conducted to optimize its performance in detecting small metastatic pulmonary nodules. A total of 3,558 CT scans containing 21,469 clinically diagnosed pulmonary metastases from a single institution (the same institution where the present study was conducted) were used for this purpose. Each pulmonary nodule was annotated by drawing three-dimensional bounding boxes on the CT images, and these annotated CT images were then used for additional training of the existing AI model of the original CAD system. All of the abdominopelvic CT images were analyzed using the additionally trained AI model, which marked each detected lesion with a bounding box along with a probability score (between 0 and 1) for the presence of a lesion (Figures 2-5).
Radiologist’s evaluation of artificial intelligence findings
All abdominopelvic CT images with corresponding AI results were reviewed by a fellowship trainee in thoracic radiology (H.S.C., 1st year of fellowship training) who was blinded to the diagnosis of pulmonary metastasis. The radiologist classified all lesions identified by the AI into three groups: those with the potential for pulmonary metastasis, clearly benign lesions, and pseudo-lesions. Subsequently, the radiologist checked the diagnoses of pulmonary metastasis to determine whether the lesions detected by the AI were overlooked pulmonary metastases and classified the individual AI-detected lesions as either true-positives or false-positives.
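The review step described above can be sketched as a filter over the AI detections. This is a minimal illustration and not the study's implementation: the `Detection` record, its field names, the category labels, and the toy values are hypothetical, and treating a score equal to the threshold as positive is an assumption. Only the three review categories and the use of a probability-score threshold come from the text.

```python
from dataclasses import dataclass

# Review categories from the text; the string labels themselves are hypothetical.
POSSIBLE_METASTASIS = "possible_metastasis"
CLEARLY_BENIGN = "clearly_benign"
PSEUDO_LESION = "pseudo_lesion"


@dataclass(frozen=True)
class Detection:
    patient_id: str
    probability: float  # AI probability score between 0 and 1
    review: str         # category assigned by the reviewing radiologist


def positive_patients(detections, threshold, after_review=False):
    """Return patients with at least one detection at or above the threshold.

    With after_review=True, detections judged clearly benign or pseudo-lesions
    by the radiologist are excluded before patients are counted.
    """
    kept = set()
    for d in detections:
        if d.probability < threshold:  # assumption: score == threshold counts as positive
            continue
        if after_review and d.review != POSSIBLE_METASTASIS:
            continue
        kept.add(d.patient_id)
    return kept


# Toy data for illustration only (not study data).
toy = [
    Detection("A", 0.82, POSSIBLE_METASTASIS),
    Detection("A", 0.45, CLEARLY_BENIGN),
    Detection("B", 0.55, PSEUDO_LESION),
    Detection("C", 0.41, POSSIBLE_METASTASIS),
]
ai_only = positive_patients(toy, threshold=0.5)
reviewed = positive_patients(toy, threshold=0.5, after_review=True)
```

In this toy example, patient B is flagged by the AI alone but drops out after review, which mirrors how the review can lower the positive rate without adding any new detections.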
Performance metrics and statistical analysis
First, the discriminative performance of the AI model in identifying patients with overlooked metastases was evaluated using an area under the receiver operating characteristic curve (AUC-ROC) analysis. Subsequently, the performance and efficacy of the AI model were evaluated using metrics at threshold probability scores of 0.4, 0.5, 0.6, and 0.7.
• Patient-level sensitivity = number of patients with true-positive detection of overlooked metastases/number of patients with overlooked metastases.
• Patient-level false-positive rate = number of patients with false-positive detection of overlooked metastases/number of patients without overlooked metastases.
• Patient-level positive predictive value (PPV) = number of patients with true-positive detection of overlooked metastases/number of patients with positive AI results.
• Lesion-level sensitivity = number of true-positive detections of overlooked metastases/number of all overlooked metastases.
• Number of false-positive lesions per patient = number of false-positive detections of overlooked metastases/number of patients.
• Lesion-level PPV = number of true-positive detections of overlooked metastases/number of all lesions detected by AI.
All metrics were obtained for both the AI results and the radiologist’s review of the AI results (following the exclusion of clearly benign lesions or pseudo-lesions). The performance metrics of the AI results and the radiologist’s review of the AI results were compared using McNemar’s tests, chi-squared tests, and paired t-tests.
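The metric definitions above can be written out as a short sketch. The function and argument names are hypothetical. The example counts are those reported in the Results for the AI alone at the lowest threshold (0.4), and they serve only to illustrate the arithmetic.

```python
def patient_level_metrics(tp_patients, fp_patients, n_with, n_without):
    """Patient-level sensitivity, false-positive rate, and PPV as defined above."""
    return {
        "sensitivity": tp_patients / n_with,
        "false_positive_rate": fp_patients / n_without,
        # Patients with positive AI results = true-positive + false-positive patients.
        "ppv": tp_patients / (tp_patients + fp_patients),
    }


def lesion_level_metrics(tp_lesions, fp_lesions, n_overlooked, n_patients):
    """Lesion-level sensitivity, false-positive lesions per patient, and PPV."""
    return {
        "sensitivity": tp_lesions / n_overlooked,
        "false_positives_per_patient": fp_lesions / n_patients,
        # All lesions detected by AI = true-positive + false-positive detections.
        "ppv": tp_lesions / (tp_lesions + fp_lesions),
    }


# Counts reported for the AI alone at threshold 0.4.
patient = patient_level_metrics(tp_patients=12, fp_patients=239, n_with=13, n_without=865)
lesion = lesion_level_metrics(tp_lesions=24, fp_lesions=451, n_overlooked=26, n_patients=878)
```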
Decision curve analysis was conducted to evaluate the net benefit of using the AI tool as a second reader for detecting overlooked pulmonary metastasis, considering the benefit of true-positive results and the cost of false-positive results.
Statistical analysis
All statistical analyses were performed using MedCalc statistical software (version 22.006; MedCalc Software Ltd, Ostend, Belgium). Statistical significance was set at P < 0.05.
Results
Patient characteristics
A total of 878 abdominopelvic CT images from 878 patients (580 men; mean age ± standard deviation: 62 ± 11 years) were included in the study (Figure 1). The most common primary malignancy was hepatocellular carcinoma (411, 47%), followed by stomach cancer (169, 19%) and colorectal cancer (96, 11%). A total of 707 CT examinations (81%) were obtained after the administration of intravenous contrast media. Table 1 presents the demographic information of the patients and their CT imaging characteristics.
Sixty-nine (7.8%) patients were diagnosed with lung metastases within 3 years of an abdominopelvic CT, including 5 patients who had already been diagnosed with lung metastases at the time of the CT. In a retrospective evaluation of abdominopelvic CT images, 13 (1.5%) patients had pulmonary metastases that were overlooked during interpretation. Of these 13 patients, 3 had already been diagnosed with lung metastases at the time of the CT. For the other 10 patients, the median time interval between the abdominopelvic CT with overlooked lung metastases and the clinical diagnosis of lung metastasis was 141 days (interquartile range, 78–195 days).
Performance of the artificial intelligence system
For the discrimination of CT examinations with and without overlooked pulmonary metastases, the AI system exhibited an AUC-ROC value of 0.911 [95% confidence interval (CI), 0.890–0.929; Figure 6]. The results of the AI analyses and their performances for different thresholds are listed in Table 2 and Table 3. At the lowest threshold (0.4), the AI system detected 475 lesions (0.54 per examination) in 251 patients (positive rate, 28.7%). In contrast, it detected 100 lesions (0.11 per examination) in 59 (positive rate, 6.7%) patients at the highest threshold (0.7). The sensitivities of the AI system for the identification of patients with overlooked metastases were 92.3% (12/13; 95% CI, 64.0%–99.8%) at the lowest threshold and 69.2% (9/13; 95% CI, 38.6%–90.9%) at the highest threshold. Correspondingly, the patient-level false-positive rates ranged from 5.8% (50/865; 95% CI, 4.3%–7.6%) to 27.6% (239/865; 95% CI, 24.7%–30.7%), and the PPVs ranged from 4.8% (12/251; 95% CI, 2.5%–8.2%) to 15.3% (9/59; 95% CI, 7.2%–27.0%). The accuracy of the AI system ranged from 72.7% (638/878; 95% CI, 69.6%–75.6%) to 93.8% (824/878; 95% CI, 92.1%–95.4%).
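The confidence intervals reported for the small-sample proportions are consistent with exact (Clopper-Pearson) binomial intervals. The text does not state which interval method was used, so this is an assumption. The sketch below computes such an interval by bisection on the binomial distribution. The function names are hypothetical, and the implementation is intended for small samples such as 12/13.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Locate the root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes out of n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p))
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2
    )
    return lower, upper


low_12, high_12 = clopper_pearson(12, 13)  # patient-level sensitivity at threshold 0.4
low_9, high_9 = clopper_pearson(9, 13)     # patient-level sensitivity at threshold 0.7
```

The width of these intervals (for example, roughly 64% to nearly 100% for 12/13) reflects how few patients with overlooked metastases were available.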
Among 26 overlooked pulmonary metastases in 13 patients, the sensitivities of the AI system were 92.3% (24/26; 95% CI, 74.5%–99.1%) at the lowest threshold and 46.2% (12/26; 95% CI, 26.6%–66.6%) at the highest. Correspondingly, the number of false-positive detections per examination ranged from 0.10 (88/878; 95% CI, 0.03–0.17) to 0.51 (451/878; 95% CI, 0.35–0.69), and the PPVs ranged from 5.1% (24/475; 95% CI, 3.3%–7.4%) to 12.0% (12/100; 95% CI, 6.4%–20.0%).
In the decision curve analysis, using the AI system as a second reader for detecting overlooked pulmonary metastases exhibited a higher net benefit than the default scenario without AI when the risk threshold was ≤3.7% (Figure 7). In other words, using the AI would be beneficial if the ratio of the cost from false-positive results to the benefit from true-positive results is ≤3.7:96.3 (1:26).
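The link between the 3.7% risk threshold and the 1:26 ratio follows from converting the threshold to odds. The sketch below shows that conversion together with the standard decision-curve net-benefit formula, net benefit = TP/n − (FP/n) × pt/(1 − pt). The function names are hypothetical, and the sketch does not reproduce the decision curve in Figure 7, whose exact inputs are not given in the text.

```python
def threshold_to_odds(risk_threshold):
    """Odds of the risk threshold, i.e. the implied harm-to-benefit ratio."""
    return risk_threshold / (1 - risk_threshold)


def net_benefit(tp, fp, n, risk_threshold):
    """Standard decision-curve net benefit for a strategy with tp and fp among n patients."""
    return tp / n - (fp / n) * threshold_to_odds(risk_threshold)


odds = threshold_to_odds(0.037)  # 0.037 / 0.963
benefit_per_harm = 1 / odds      # about 26, matching the 1:26 ratio in the text
```

A strategy that flags no one has a net benefit of zero at every threshold, which is the reference against which a second-reader strategy is compared.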
Review of the artificial intelligence results by the radiologist
Following the review of the AI results by the radiologist, 57.9% (275/475) of the lesions detected by the AI were regarded as false-positive detections at the lowest threshold, while 65.0% (65/100) were regarded as false-positive detections at the highest threshold. As a result, the positivity rate after the radiologist’s review was 13.8% (121/878) at the lowest threshold and 3.4% (30/878) at the highest threshold.
The sensitivities in the identification of patients with overlooked metastases were 92.3% (12/13; 95% CI, 64.0%–99.8%) at the lowest threshold and 69.2% (9/13; 95% CI, 38.6%–90.9%) at the highest threshold, consistent with the initial analyses by the AI. Meanwhile, the patient-level false-positive rates ranged from 2.4% (21/865; 95% CI, 1.5%–3.7%) to 12.6% (109/865; 95% CI, 10.5%–15.0%), representing a significant reduction compared with the initial analyses by the AI (all P < 0.001). Additionally, the patient-level PPVs ranged from 9.9% (12/121; 95% CI, 5.2%–16.7%) to 30.0% (9/30; 95% CI, 14.7%–49.4%) (Table 2) and were higher than those of the initial analyses by the AI, although the difference was not statistically significant. The accuracy ranged from 87.5% (768/878; 95% CI, 85.1%–89.6%) to 97.2% (853/878; 95% CI, 95.8%–98.2%), a significant improvement compared with the initial analyses by the AI (all P < 0.001).
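The paired comparison of false-positive rates can be illustrated with an exact McNemar test. The text lists McNemar's test among the methods but does not say which test was applied to which metric, so using the exact form here is an assumption, and the function name is hypothetical. Because the review only removes AI detections, no patient who was negative by AI can become positive after review. Among the 865 patients without overlooked metastases, the discordant pairs are therefore 239 − 109 = 130 versus 0 at the lowest threshold and 50 − 21 = 29 versus 0 at the highest.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) tail probability of observing k or fewer in the smaller cell.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


p_lowest = mcnemar_exact(239 - 109, 0)  # threshold 0.4: 130 patients changed to negative
p_highest = mcnemar_exact(50 - 21, 0)   # threshold 0.7: 29 patients changed to negative
```

Both p values fall far below 0.001, in line with the reported significance of the reduction.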
The lesion-level sensitivities after the radiologist’s review also remained similar to those following the initial analyses by the AI [92.3% (24/26; 95% CI, 74.5%–99.1%) at the lowest threshold; 46.2% (12/26; 95% CI, 26.6%–66.6%) at the highest threshold]. Meanwhile, the number of false-positive detections per examination ranged from 0.03 (23/878; 95% CI, 0.02–0.04) to 0.20 (176/878; 95% CI, 0.17–0.25), representing a significant reduction compared with the initial analyses by the AI (all P < 0.001). In addition, the lesion-level PPVs exhibited a significant increase compared with the initial analyses by the AI [P ≤ 0.001; 12.0% (24/200; 95% CI, 7.8%–17.3%) at the lowest threshold; 34.3% (12/35; 95% CI, 19.1%–52.2%) at the highest threshold].
Table 4 displays the patterns of false-positive detections by the AI system. The most common cause of false-positive detection was pulmonary nodules with the possibility of metastasis, based on the radiologist’s review. Among clearly benign lesions that were regarded as false-positive detections by the radiologist’s review, findings of infection or inflammation were the most common causes of false-positive detections, followed by calcified nodules.
Clinical significance
The AI system may identify missed basal lung lesions in abdominopelvic CT scans in patients with malignancy, providing feedback to radiologists, which can reduce the risk of missing basal lung metastasis.
Discussion
An AI system for pulmonary nodule detection on CT images can be utilized as a second reader after the radiologist’s interpretation to prevent radiologists from overlooking clinically relevant pulmonary nodules. In the present study, we used an AI system to detect metastatic pulmonary nodules in the basal lungs captured by abdominopelvic CT images that were overlooked by radiologists. The results showed that the AI system could identify CT images with overlooked pulmonary metastases, with an AUC-ROC value of 0.911 and maximum patient-level and lesion-level sensitivities of 92.3% each. Although the AI generated a considerable number of false-positive detections (maximum false-positive rate of 27.6%, 0.51 false-positive detections per patient), the radiologist’s review of the AI results effectively reduced the rate and number of false-positive detections (maximum false-positive rate of 12.6%, 0.20 false-positive detections per patient; both P < 0.001).
Multiple studies have reported good performance of AI in the detection of pulmonary nodules on chest CT images.16-19 In those studies, the performance of the AI reached a level similar to that of radiologists. However, considering that AI cannot replace a radiologist’s interpretation, its efficacy needs to be investigated based on its method of utilization. Since the most widely accepted methods of utilization involve CAD tools,20-25 many studies have reported that AI can improve the performance of radiologists in lung nodule detection.5, 6, 15-17 In addition to the use of AI as a CAD tool, several other utilization methods may also be feasible.15 For instance, one promising method is its use as a second reader. In this context, the AI may analyze images after the radiologist’s interpretation and provide feedback to the radiologist only when the AI suspects that the radiologist has overlooked a pulmonary nodule. In this scenario, the oversight of significant pulmonary nodules can be prevented without the need to review the AI results of all the examinations.
We performed decision curve analyses to evaluate the net benefit of applying the AI system, weighing true-positive against false-positive identifications. The scenario with AI as a second reader showed a higher net benefit than the scenario without AI when the ratio of the harm of a false-positive interpretation to the benefit of a true-positive interpretation was ≤1:26. In most clinical situations, overlooking pulmonary metastases could have significant consequences, potentially depriving the patient of timely systemic treatment. Meanwhile, false-positive detections by AI may lead to a review by the radiologist, and the associated costs would be much smaller than the risks of overlooking pulmonary metastases. Therefore, we believe that using the AI as a second reader would be a reasonable scenario.
In our study, an AI system was applied to abdominopelvic CT scans of patients with cancer that had been interpreted as negative for basal lung metastasis. In a retrospective evaluation of available follow-up examinations, overlooked pulmonary metastases were identified in 1.5% of patients, a frequency that should not be ignored. In this context, the AI could accurately discriminate between CT images with and without overlooked pulmonary metastases (AUC-ROC, 0.911). Furthermore, at a sensitive operating threshold, the AI could identify most CT scans with overlooked metastases (sensitivity: 92.3%). Notably, the identification of false-negative interpretations by radiologists using AI has been investigated in the field of chest radiography. Specifically, Nam et al.26 and Jang et al.27 reported that AI can identify lung cancers overlooked by radiologists on chest X-rays. In addition, Hwang et al.28 reported that AI can identify various clinically relevant abnormalities on chest radiographs that were previously interpreted as normal.
Because benign pulmonary nodules and pulmonary metastases are often difficult to differentiate, false-positive detection by AI is inevitable. When the AI is used as a second reader,15 false-positive detection may lead to unnecessary feedback to the radiologist, followed by reinterpretation by the radiologist. In our study, the maximum false-positive rate was 27.6%, indicating that the AI may generate false-positive feedback in 27.6% of CT images without overlooked metastases. Based on the review of the AI results by a radiologist, more than half of the AI detections were regarded as clearly benign nodules (findings of pulmonary infection and calcified nodules). Notably, the radiologist’s review was effective because it significantly reduced the rate of false positives while maintaining a similar sensitivity for metastasis. The results also suggest that further improvements in AI performance may reduce the false-positive rate and the frequency of unnecessary reinterpretation by radiologists.
Pulmonary metastases and benign pulmonary nodules are often indistinguishable, even when evaluated by a radiologist. Therefore, as expected, there were considerable false-positive detections even after the radiologist’s review (maximum false-positive rate: 12.6%). Moreover, the identification of benign nodules may lead to the requirement of chest CT examinations for further evaluation or follow-up of the pulmonary nodules. Considering that all patients were under follow-up for malignancies, we believe that additional chest CT scans may not significantly harm the patients.
Our study has several limitations. First, since our study was conducted at a single tertiary medical institution, the reproducibility of our results remains uncertain. Future studies may be required to confirm the reproducibility of our results in other clinical situations. Second, although we consecutively included 878 abdominopelvic CT scans, the absolute number of patients with overlooked pulmonary metastases was quite small (n = 13), limiting the statistical power. A multicenter study with a larger sample size may be required to confirm the efficacy of AI as a second reader. Third, in this study, AI was retrospectively applied to abdominopelvic CT scans. Therefore, the practical efficacy of AI systems remains unknown. A prospective study following the integration of AI into the workflow may be required to investigate its real-world efficacy. Finally, the effect of AI beyond the detection of overlooked metastases, including its effects on patient outcomes and changes in treatment decision-making, remains unknown.
In conclusion, the applied AI system could accurately identify basal lung metastases captured in abdominopelvic CT images that were overlooked by radiologists, suggesting its potential as a second reader after the radiologist’s interpretation. Further prospective studies are warranted to investigate the real-world efficacy of AI as a second reader as well as the impact of AI beyond the detection of metastases.