VC CAD face-off reveals system differences
By Eric Barnes
AuntMinnie.com staff writer
November 27, 2006
CHICAGO - Pitting three colon computer-aided detection (CAD) systems against each other and waiting for a winner to emerge -- it sounds like fun, at least for the winner. But things aren't so simple when the CAD systems being tested have completely different settings and operating criteria, and find completely different sets of polyps in the same dataset.
At Sunday's virtual colonoscopy CAD session of the 2006 RSNA meeting, Dr. Patrick Rogalla presented the results of a study led by his colleague Dr. Lasse Krug from Berlin's Institut für Radiologie and Charité Hospital.
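The researchers sought to assess sensitivity and average false-positive rates for CAD as a second reader in three prerelease CAD systems: the Medicsight implementation on the Vitrea workstation, the Philips ViewForum, and the Siemens PEV system.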
The task was complicated by very different operating characteristics among the systems, however, Rogalla said in his presentation.
"There was a time point at which all three systems had to be up and running, all the cases had to be loaded at the same time, all the readers had to be there at the same time, and we requested permission from all the companies for their software versions although they are not released" in their final versions, he said. "The systems have different implementations and different operating points."
For example, the Medicsight implementation on the Vitrea workstation requires preselection of an expected sphericity level, which the researchers handled by running five arbitrary sphericity levels in addition to the available default level (0%-100%, 20%-100%, 40%-100%, 60%-100%, and 80%-100%), Rogalla said. The ViewForum analyzes the technical probability of each polyp and produces a list of confidence levels, which the researchers arbitrarily divided into five groups. The PEV system acts as a CAD, marking polyps so they can be viewed, but it offers no user interaction or probability levels.
"We utilized as a reference standard a consensus of experts for the diagnosis based on four independent criteria," Rogalla said. "We selected 35 patients; the selection criteria were full distension of entire colon, fecal tagging, and at least one polyp seen by the radiologists," whose experience in virtual colonoscopy ranged from 2,000 to 3,000 cases each.
Following a fecal tagging regimen (2 x 30 mL iodinated contrast), purgative bowel cleansing with double-dose Phospho-soda prep (Fleet Pharmaceuticals, Lynchburg, VA), and automated CO2 insufflation (ProtoCO2l, E-Z-EM, Lake Success, NY), the subjects were scanned in the prone and supine positions with 1-mm collimation and 0.8-mm slice thickness. The three CT scanners included a Somatom Sensation 16 and 64 (Siemens) and Aquilion 64 (Toshiba America Medical Systems, Tustin, CA).
In all, the researchers found 57 polyps ranging from 3 mm to 40 mm in diameter (average 12 mm). Findings larger than 10 mm were confirmed by optical colonoscopy, Rogalla said, with totals based on consensus reading, optical colonoscopy, or surgical resection.
The Vitrea/Medicsight implementation yielded sensitivities of 59% to 90% over the five sphericity levels; false positives per case increased with sensitivity over the different sphericity levels from six to 27, with the fewest false positives at the default setting. "No additional true polyps were found on this system," he said.
Over the five confidence levels, the Philips system yielded sensitivities from 59% to 91%. False positives increased from zero to five as confidence levels decreased, and this system also failed to yield any additional polyps not found by the radiologists.
The Siemens system, which did not permit any changes in settings, yielded the lowest sensitivity at 75%, but also the lowest average false positives at three per case.
"CAD performance is very heterogeneous, and strongly depends on the algorithm used," Rogalla said. "fROC curves are, in my view, a positive way to compare the systems, although they have different limitations and interactions."
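For readers unfamiliar with free-response ROC (fROC) analysis, the idea is to plot sensitivity against average false positives per case at each operating point of a system. The following is a minimal sketch in Python, not the researchers' method: the `Mark` type, the `froc_points` function, the confidence scale, and all of the data are hypothetical and chosen only to illustrate the arithmetic.

```python
from typing import List, NamedTuple, Optional, Tuple


class Mark(NamedTuple):
    """One CAD mark (hypothetical structure, for illustration only)."""
    confidence: float        # CAD score on an assumed 0-1 scale
    polyp_id: Optional[int]  # reference-standard polyp hit, or None if false positive


def froc_points(
    marks: List[Mark],
    n_polyps: int,
    n_cases: int,
    thresholds: List[float],
) -> List[Tuple[float, float]]:
    """Return (false positives per case, sensitivity) for each threshold."""
    points = []
    for t in thresholds:
        kept = [m for m in marks if m.confidence >= t]
        # A polyp counts once, even if several marks hit it.
        detected = {m.polyp_id for m in kept if m.polyp_id is not None}
        false_pos = sum(1 for m in kept if m.polyp_id is None)
        points.append((false_pos / n_cases, len(detected) / n_polyps))
    return points


# Invented toy data: 2 cases, 4 reference polyps, 6 CAD marks.
marks = [
    Mark(0.9, 1), Mark(0.8, None), Mark(0.7, 2),
    Mark(0.4, 2), Mark(0.3, None), Mark(0.2, 3),
]
points = froc_points(marks, n_polyps=4, n_cases=2, thresholds=[0.75, 0.5, 0.0])
for fp_per_case, sensitivity in points:
    print(f"{fp_per_case:.2f} FP/case -> sensitivity {sensitivity:.2f}")
```

A system with adjustable settings contributes several such points, while a system with a single fixed setting contributes only one, which is part of why comparing them is not straightforward.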
He cautioned that there were no known flat polyps in the cohort, so sensitivity for flat lesions could not be assessed. The results will also surely depend on the particular datasets used. Interestingly, he said, the three configurations "all found completely different polyps."
Given a choice, Rogalla said, "I would run all three systems" on every dataset.
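The appeal of running several systems can be illustrated with simple set arithmetic: when systems find different polyps, the union of their detections covers more of the reference standard than any one system alone. The sketch below assumes hypothetical system names and invented polyp identifiers; it does not reproduce the study's data.

```python
from typing import Dict, Set


def combined_sensitivity(detections: Dict[str, Set[int]], n_polyps: int) -> float:
    """Sensitivity of the union of all systems' true-positive detections."""
    found: Set[int] = set().union(*detections.values())
    return len(found) / n_polyps


# Invented example: 8 reference polyps, three hypothetical systems.
N_POLYPS = 8
detections = {
    "system_a": {1, 2, 3},
    "system_b": {3, 4},
    "system_c": {5},
}

per_system = {name: len(ids) / N_POLYPS for name, ids in detections.items()}
combined = combined_sensitivity(detections, N_POLYPS)

for name, value in per_system.items():
    print(f"{name}: {value:.3f}")
print(f"combined: {combined:.3f}")
```

The same union logic applies to false-positive marks, so combining systems would generally raise the number of marks a reader has to dismiss as well.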