Careful reading boosts virtual colonoscopy results

It is observer error, not technical inadequacy, that produces the lion's share of missed lesions at virtual colonoscopy. A careful review of the 2004 Rockey trial results suggests that better training and careful reading could have significantly improved the trial's disappointing results.

Large virtual colonoscopy trials to date have been characterized by extremely variable sensitivity results, Taral Doshi and his colleagues noted in the latest issue of Radiology (July 2007, Vol. 244:1, pp. 165-173). Most studies are not analyzed further, so the causes of false interpretations are generally not well understood.

The Rockey trial, performed at Duke University Medical Center in Durham, NC, and other facilities in 1999 and 2000 and published in 2004, examined 614 subjects at elevated risk of colorectal polyps using virtual colonoscopy (VC or CT colonography [CTC]), conventional colonoscopy, and air-contrast barium enema (ACBE) to assess the adequacy of each method for detecting colorectal polyps and cancer.

VC was performed without stool tagging, using either primary 2D or primary 3D interpretation, but correlation of the results was state of the art. Segmental unblinding was used to reconcile the results from all three methods to ensure that lesions were not missed. VC's sensitivity for polyps 10 mm and larger in diameter was only 59%, for reasons not completely understood.

"We hypothesized that false-negative interpretations at CT colonography are largely due to observer-related error," wrote Doshi and his colleagues Dr. David Rusinak, Dr. Don Rockey, Dr. Abraham Dachman, Dr. Robert Halvorsen, and Kenji Suzuki, Ph.D. The researchers were from the University of Chicago in Illinois; Virginia Commonwealth University in Richmond, VA; and University of Texas Southwestern Medical Center in Dallas.

In their retrospective review, Doshi and his team performed an initial unblinded review of all CTC image data to generate reconciliation reports for all false-negative polyp candidates 6.0 mm in diameter and larger.

"After reports from the original study and reconciliation reports were reviewed, errors were classified as observer (measurement or perceptual) errors, technical errors (e.g., those caused by insufficient distension, fluid), or not reconcilable," they wrote.

Sensitivity values both per polyp and per patient were calculated from adenomas 6.0 mm or larger in the original dataset, and again by assuming elimination of technical and observer errors.
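The two ways of counting can give different answers for the same readings. The sketch below uses made-up toy data (not trial data) and assumes the common convention that a patient counts as detected when at least one lesion at or above the size threshold is found. The article does not spell out the study's exact definition, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical toy data, NOT from the trial:
# (patient_id, polyp_size_mm, detected_at_ctc)
polyps = [
    ("A", 12.0, True),
    ("A", 7.0, False),
    ("B", 6.5, False),
    ("C", 10.0, True),
    ("C", 4.0, False),  # below the 6.0-mm threshold, so it is ignored there
]

def per_polyp_sensitivity(polyps, threshold_mm):
    """Detected lesions divided by all lesions at or above the threshold."""
    eligible = [p for p in polyps if p[1] >= threshold_mm]
    if not eligible:
        raise ValueError("no lesions at or above the threshold")
    return sum(1 for p in eligible if p[2]) / len(eligible)

def per_patient_sensitivity(polyps, threshold_mm):
    """Patients with at least one detected eligible lesion, divided by
    all patients who have at least one eligible lesion (assumed convention)."""
    found = defaultdict(bool)
    for patient_id, size_mm, detected in polyps:
        if size_mm >= threshold_mm:
            found[patient_id] = found[patient_id] or detected
    if not found:
        raise ValueError("no patients with lesions at or above the threshold")
    return sum(found.values()) / len(found)

print(per_polyp_sensitivity(polyps, 6.0))    # 2 of 4 lesions
print(per_patient_sensitivity(polyps, 6.0))  # 2 of 3 patients
```

In this toy set, patient A counts as detected even though one of A's two polyps was missed, which is why the per-patient figure can be higher than the per-polyp figure.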

Two experienced readers performed an initial reading of all false-negative results in the original study, using primary 2D analysis, followed by 3D problem-solving if no polyp match was found. If a polyp was present, the reader was asked to report whether it was prospectively diagnosable and to measure its size.

Discrepant findings "were reconciled by using data from the original study and by re-examination of the specific and neighboring colonic segments and individual polyp candidates," the authors explained. "The experienced readers made the final determination in consensus as to whether a lesion that was identified could reasonably correspond to the lesion missed in the original study, and as to whether the lesion was prospectively diagnosable."

While confidence levels were not assigned, normative interpretation thresholds were used to guide their reading, Doshi and colleagues explained. Specific rules were used to determine the largest diameter of each finding.

Of the 228 available polyps, the results showed that 147 were adenomas. Per-patient sensitivity was 70% and 68% at the 10-mm and 6-mm thresholds, respectively.

"When all histologic types were considered, 114 polyps were false-negative findings," the authors reported. "Of these, 53% (60 of 114) were attributed to observer-related errors, and 26% were attributed to errors classified as technical."

After a detailed reconciliation of the individual polyps in an effort to exclude any potentially correctable observer error, the per-polyp sensitivity of lesions 10 mm and larger increased to 93%, with per-patient sensitivity of 91%.

After accounting for observer and technical errors, eight (5.4%) of 147 adenomas 6.0 mm and larger remained that could not be detected, they explained. If all of the technical and observer errors were considered true-positive findings, the sensitivity for adenomas 6 mm and larger would have been 95% both per polyp and per patient.
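The per-polyp figure can be rechecked from the two counts quoted above. This is a minimal sketch of the arithmetic only. The function and variable names are illustrative, and the only inputs taken from the article are 147 adenomas and eight that remained undetectable.

```python
def sensitivity(detected: int, total: int) -> float:
    """Fraction of reference-standard lesions that were detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= detected <= total:
        raise ValueError("detected must be between 0 and total")
    return detected / total

TOTAL_ADENOMAS = 147  # adenomas 6.0 mm and larger (from the article)
NOT_DETECTABLE = 8    # still undetectable after reconciliation (from the article)

# Share of adenomas that no reading could have found.
residual_miss_rate = NOT_DETECTABLE / TOTAL_ADENOMAS

# Best case: every technical and observer miss is counted as detected.
best_case = sensitivity(TOTAL_ADENOMAS - NOT_DETECTABLE, TOTAL_ADENOMAS)

print(f"residual miss rate: {residual_miss_rate:.1%}")
print(f"best-case per-polyp sensitivity: {best_case:.1%}")
```

Rounded to whole percentages, the best-case value matches the 95% quoted above. The per-patient figure cannot be rechecked the same way because the article does not give patient counts.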

Polyp detection has improved significantly in the years since the Rockey trial was performed, the authors noted. Faster scanners that largely eliminate breathing artifacts, along with thinner collimation and reconstruction intervals, have improved the conspicuity of submillimeter polyps.

Some of these improvements postdate the Rockey trial, they added. But they said its missed polyps could not be explained by technical factors alone, leading to their search for the cause of missed detections.

Although more advanced software was used to reinterpret the CT data, the original data, which could not be improved for the study, were a far more important factor, the authors wrote.

Size measurement errors in the original study did matter, they noted. "In this dataset we found seven polyps that were accurately identified but were undermeasured during initial prospective CT colonographic evaluation," Doshi and his team wrote. "These measurement errors led to false-negative results because of the size discrepancy with the associated colonoscopic findings," thus contributing to lowered sensitivity of VC.

Still, observer perceptual error was a much larger contributor to interpretation errors than polyp measurement mistakes, and was thus the single greatest source of error. Of the 147 adenomas and cancers 6.0 mm or larger, 31 were false-negative findings due to observer perceptual error and were visible in retrospect. The per-patient sensitivity for clinically relevant lesions 6.0 mm or larger increased by almost 20% after observer error was eliminated in VC interpretation.

"This finding suggests that increasing reader diagnostic performance is a necessary step toward making CT colonography a more viable screening method," they wrote. "One potential approach to improving reader performance is the use of computer-aided detection as a kind of 'spell checker' to present the radiologist with polyp candidates for evaluation. Additionally, improved training for readers that stresses the pitfalls of interpretation at CT colonography may help increase reader sensitivity."

"With this information, we hope that efforts in improving the sensitivity of CT colonography can be targeted more effectively, emphasizing the need for careful training and reading methods, and supporting the potential for improved sensitivity with computer-aided detection to direct attention to polyp candidates," the team concluded.

By Eric Barnes, staff writer
July 26, 2007


