Skin cancers, notably melanoma, pose a significant health risk, with rising incidence and mortality rates. Early detection through screening is crucial, and the Department of Veterans Affairs has prioritized improved melanoma screening. Teledermatology with digital dermatoscopy offers a promising avenue for initial lesion assessment, prompting the exploration of algorithmic screening using deep learning. This paper investigates the efficacy of incorporating the melanin index (MI) and erythema index (EI) as additional features, along with converting images to the HSV color space, for use with deep learning models in melanoma classification. Given the distinct clustering of human skin color in various color spaces, our study explores the potential advantages of alternative color representations. Building on prior work, our experiments are organized into two phases and utilize a diverse set of model architectures trained on images of varying sizes. Phase 1 aligns with the 2020 SIIM-ISIC Melanoma Classification Challenge, while Phase 2 involves an expanded dataset and more robust metrics. Despite optimistic outcomes in the earlier phase, our findings reveal no significant performance improvement from incorporating MI or EI in deep learning models for melanoma detection. This study contributes valuable insights for refining deep learning approaches in dermatoscopy, offering a cautionary note on the efficacy of these hand-engineered features and of color space transformation.
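A minimal sketch of how such inputs might be assembled, assuming the log-reflectance formulations of MI and EI that are commonly cited in the skin-optics literature (the study's exact preprocessing is not reproduced here):

```python
import numpy as np
import cv2  # OpenCV, used here for the HSV conversion

def melanin_erythema_indices(rgb):
    """Per-pixel MI/EI maps from an RGB image with values in [0, 255].

    Assumes the common log-reflectance definitions (the paper's exact
    formulation may differ): MI = 100*log10(1/R),
    EI = 100*(log10(1/G) - log10(1/R)).
    """
    refl = rgb.astype(np.float64) / 255.0 + 1e-6  # avoid log(0)
    r, g = refl[..., 0], refl[..., 1]
    mi = 100.0 * np.log10(1.0 / r)
    ei = 100.0 * (np.log10(1.0 / g) - np.log10(1.0 / r))
    return mi, ei

def hsv_with_indices(rgb_uint8):
    """Stack HSV channels with MI and EI as extra input planes."""
    hsv = cv2.cvtColor(rgb_uint8, cv2.COLOR_RGB2HSV)
    mi, ei = melanin_erythema_indices(rgb_uint8)
    return np.dstack([hsv.astype(np.float64), mi, ei])  # H, S, V, MI, EI
```

A five-channel tensor like this can then be fed to a network whose first convolution accepts five input planes instead of three.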
Many published studies use deep learning models to predict COVID-19 from chest x-ray (CXR) images, often reporting high performance. However, these models do not generalize well in independent external testing. Common limitations include the scarcity of medical imaging data and disease labels, leading to training on small datasets or drawing classes from different institutions. To address these concerns, we designed an external validation study of deep learning classifiers for COVID-19 in CXR images that also includes XCAT phantoms. We hypothesize that a simulated CXR dataset obtained from the XCAT phantom allows for better control of the dataset, including pixel-level ground truth. This setup offers multiple advantages: first, we can validate publicly available models using simulated chest x-rays; second, we can address clinically relevant questions, such as the effect of dose level and of COVID-19 pneumonia size on the performance of deep learning classifiers.
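The external-validation step reduces to scoring a frozen, publicly released classifier on the simulated set. A minimal sketch, assuming a PyTorch model and a hypothetical `sim_loader` that yields batches of XCAT-simulated chest x-rays with COVID-19/normal labels:

```python
import torch
from sklearn.metrics import roc_auc_score

def external_auc(model, sim_loader, device="cpu"):
    """AUC of a frozen classifier on a simulated external test set.

    Assumes the model emits a single COVID-19 logit per image; adapt
    the activation if the released model's head differs.
    """
    model.eval().to(device)
    scores, labels = [], []
    with torch.no_grad():
        for x, y in sim_loader:
            prob = torch.sigmoid(model(x.to(device))).squeeze(1)
            scores.extend(prob.cpu().tolist())
            labels.extend(y.tolist())
    return roc_auc_score(labels, scores)
```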
Research studies of artificial intelligence models in medical imaging have been hampered by poor generalization. This problem has been especially concerning over the last year, with numerous applications of deep learning to COVID-19 diagnosis. Virtual imaging trials (VITs) could provide a solution for objective evaluation of these models. In this work, utilizing VITs, we created the CVIT-COVID dataset, comprising 180 virtually imaged computed tomography (CT) images from simulated COVID-19 and normal phantom models under different COVID-19 morphologies and imaging properties. We evaluated the performance of an open-source deep learning model from the University of Waterloo, trained with multi-institutional data, and of an in-house model trained with the open clinical dataset MosMed. We further validated the models' performance against open clinical data of 305 CT images to compare virtual versus real clinical data performance. The open-source model was published with nearly perfect performance on the original Waterloo dataset but showed a consistent performance drop in external testing on another clinical dataset (AUC=0.77) and on our simulated CVIT-COVID dataset (AUC=0.55). The in-house model achieved an AUC of 0.87 on its internal test set (the MosMed test set). However, performance dropped to AUCs of 0.65 and 0.69 when evaluated on the clinical and simulated CVIT-COVID datasets, respectively. The VIT framework offered control over imaging conditions, allowing us to show that performance did not change as CT exposure was varied from 28.5 to 57 mAs. The VIT framework also provided voxel-level ground truth, revealing that the in-house model performed much better on diffuse COVID-19 infections of <2.65% lung volume (AUC=0.87) than on focal disease of <2.65% volume (AUC=0.52). The virtual imaging framework enabled these uniquely rigorous analyses of model performance, which would be impracticable with real patients.
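Because the VIT framework controls acquisition parameters and provides exact ground truth, stratified analyses are straightforward. A sketch with hypothetical, illustrative records (one row per virtual exam; the values below are placeholders, not the study's data):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical per-exam records: model score, COVID-19 label, and the
# simulation parameters the VIT controls (exposure, lesion morphology).
df = pd.DataFrame({
    "score":      [0.91, 0.12, 0.85, 0.77, 0.40, 0.30],
    "label":      [1, 0, 1, 1, 0, 0],
    "mAs":        [28.5, 28.5, 28.5, 57.0, 57.0, 57.0],
    "morphology": ["diffuse", "diffuse", "diffuse", "focal", "focal", "focal"],
})

# AUC per exposure level; the same groupby works for morphology or size.
for mAs, grp in df.groupby("mAs"):
    print(f"{mAs} mAs: AUC = {roc_auc_score(grp['label'], grp['score']):.2f}")
```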
Purpose: Accurate classification of COVID-19 in chest radiographs is invaluable to hard-hit pandemic hot spots. Transfer learning techniques for images using well-known convolutional neural networks show promise in addressing this problem. These methods can significantly benefit from supplemental training on similar conditions, considering that there currently exists no widely available chest x-ray dataset on COVID-19. We evaluate whether targeted pretraining for similar tasks in radiography labeling improves classification performance in a sample radiograph dataset containing COVID-19 cases. Approach: We train a DenseNet121 to classify chest radiographs through six training schemes. Each training scheme is designed to incorporate cases from established datasets for general findings in chest radiography (CXR) and pneumonia, with a control scheme with no pretraining. The resulting six permutations are then trained and evaluated on a dataset of 1060 radiographs collected from 475 patients after March 2020, containing 801 images of laboratory-confirmed COVID-19 cases. Results: Sequential training phases yielded substantial improvement in classification accuracy compared to a baseline of standard transfer learning with ImageNet parameters. The test set area under the receiver operating characteristic curve for COVID-19 classification improved from 0.757 in the control to 0.857 for the optimal training scheme in the available images. Conclusions: We achieve COVID-19 classification accuracies comparable to previous benchmarks of pneumonia classification. Deliberate sequential training, rather than pooling datasets, is critical in training effective COVID-19 classifiers within the limitations of early datasets. These findings bring clinical-grade classification through CXR within reach for more regions impacted by COVID-19.
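A minimal sketch of one such sequential scheme, assuming PyTorch/torchvision; the loader names, epoch counts, and learning rate are hypothetical stand-ins for the schedules described in the paper:

```python
import torch
import torch.nn as nn
from torchvision import models

def run_phase(model, loader, num_classes, epochs, lr=1e-4, device="cpu"):
    """One phase of sequential training: re-head, then fine-tune all layers."""
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
    return model

# ImageNet initialization, then phases on progressively closer tasks,
# rather than pooling all datasets into one training set.
model = models.densenet121(weights="IMAGENET1K_V1")
model = run_phase(model, cxr_findings_loader, num_classes=14, epochs=10)
model = run_phase(model, pneumonia_loader, num_classes=2, epochs=5)
model = run_phase(model, covid_loader, num_classes=2, epochs=5)
```

The key design point is that each phase keeps the backbone weights from the previous phase and only replaces the classification head.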
Medical images can vary due to differences in imaging equipment and conditions. This variability can negatively impact the consistency and accuracy of diagnostic processes. Hence, it is critical to decrease variability in image acquisition to achieve consistent analysis, both visually and computationally. Three main categories contribute to image variability: equipment, acquisition protocol, and image processing. The purpose of this study was to employ a deep neural network (DNN) to reduce variability in radiography due to these factors. Given radiography images acquired with different settings, the network was set up to return harmonized images targeting a reference standard. This was implemented via a virtual imaging trial platform, utilizing an X-ray simulator (DukeSim) and 77 anthropomorphic computational phantoms (XCAT). The phantoms were imaged at 120 kV at four different dose levels, with DukeSim emulating a typical flat-panel radiography system. The raw radiography images were then post-processed using a commercial algorithm at eight different settings, resulting in a total of 2464 radiographs. For each XCAT, the reference standard was defined as the noiseless and scatterless radiography image with image processing parameters based on a radiologist's preference. The simulated images were then used to train and test the DNN. The test set yielded an average structural similarity index greater than 0.84 and an L1 error less than 0.02, indicating that the harmonized images were visually and analytically more consistent and closer to the desired reference appearance. The proposed method has great potential to enable effective and uniform interpretation of radiographic images.
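A minimal sketch of such an image-to-image harmonization setup, assuming an L1 training objective against the reference-standard radiograph; the network below is a toy residual CNN, not the study's architecture:

```python
import torch
import torch.nn as nn

class Harmonizer(nn.Module):
    """Toy image-to-image network; predicts a residual correction."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, x):
        # Learn only the correction from the acquired image
        # toward the reference appearance.
        return x + self.net(x)

model = Harmonizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
l1 = nn.L1Loss()

def train_step(acquired, reference):
    """One optimization step on a batch of (acquired, reference) pairs."""
    opt.zero_grad()
    loss = l1(model(acquired), reference)  # the paper reports L1 < 0.02
    loss.backward()
    opt.step()
    return loss.item()
```

The residual formulation is one common design choice for harmonization tasks, since the input and target images are already close in appearance.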
As computer-aided diagnostics develop to address new challenges in medical imaging, including emerging diseases such as COVID-19, initial development is hampered by the limited availability of imaging data. Deep learning algorithms are particularly notorious for performance that tends to improve in proportion to the amount of available data. Simulated images, as available through advanced virtual trials, may present an alternative in data-constrained applications. We begin with our previously trained COVID-19 x-ray classification model (denoted CVX), which leveraged additional training with existing pre-pandemic chest radiographs to improve classification performance on a set of COVID-19 chest radiographs. The CVX model achieves demonstrably better performance on clinical images than an equivalent model that applies standard transfer learning from ImageNet weights. The higher-performing CVX model is then shown to generalize effectively to a set of simulated COVID-19 images, both in quantitative comparisons of AUCs between clinical and simulated image sets and in a qualitative sense, with saliency map patterns remaining consistent between sets. We then stratify the classification results on simulated images to examine dependencies on imaging parameters when patient features are held constant. Simulated images show promise for optimizing imaging parameters for accurate classification in data-constrained applications.
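The abstract does not state which saliency method was used; a plain input-gradient map is one plausible choice and is sketched below for a binary classifier that emits a single COVID-19 logit:

```python
import torch

def saliency_map(model, image):
    """Gradient-based saliency for a single-logit CXR classifier.

    `image` is a (C, H, W) tensor. Returns a (H, W) importance map.
    This is a generic method, not necessarily the one used in the paper.
    """
    model.eval()
    x = image.clone().unsqueeze(0).requires_grad_(True)  # (1, C, H, W)
    score = model(x).squeeze()  # scalar COVID-19 logit
    score.backward()
    # Max of |gradient| over channels gives per-pixel importance.
    return x.grad.abs().max(dim=1)[0].squeeze(0)
```

Comparing such maps between clinical and simulated images gives the qualitative consistency check described above.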
Imaging phantoms are test patterns used to measure image quality in computed tomography (CT) systems. A new phantom platform (Mercury Phantom, Gammex) provides test patterns for estimating the task transfer function (TTF) or noise power spectrum (NPS) and simulates different patient sizes. Determining which image slices are suitable for analysis currently requires manual annotation of these patterns by an expert, as subtle defects may make an image unsuitable for measurement. We propose a method of automatically classifying these test patterns in a series of phantom images using deep learning techniques. By adapting a convolutional neural network based on the VGG19 architecture, with weights pretrained on ImageNet, we use transfer learning to produce a classifier for this domain. The classifier is trained and evaluated with over 3,500 phantom images acquired at a university medical center. Input channels designed for color images are successfully adapted to convey contextual information for phantom images. A series of ablation studies verifies design aspects of the classifier and evaluates its performance under varying training conditions. Our solution makes extensive use of image augmentation to produce a classifier that correctly classifies typical phantom images with 98% accuracy, while maintaining as much as 86% accuracy when the phantom is improperly imaged.
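One plausible reading of the channel adaptation is to reuse VGG19's three RGB input planes to carry a slice plus its two neighbors; the sketch below assumes that scheme and a binary suitable/unsuitable head, both of which are assumptions rather than the paper's confirmed design:

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained VGG19 with a re-headed final layer.
model = models.vgg19(weights="IMAGENET1K_V1")
model.classifier[6] = nn.Linear(4096, 2)  # suitable vs. unsuitable slice

def slice_with_context(volume, i):
    """Pack slice i and its neighbors into the three input channels.

    `volume` is a (num_slices, H, W) array of phantom images; edges
    are clamped so the first and last slices repeat a neighbor.
    """
    lo, hi = max(i - 1, 0), min(i + 1, len(volume) - 1)
    stack = np.stack([volume[lo], volume[i], volume[hi]])  # (3, H, W)
    return torch.from_numpy(stack).float().unsqueeze(0)    # (1, 3, H, W)
```

Standard torchvision augmentations (random rotations, crops, intensity jitter) can then be applied per channel stack during training.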