Open Access
29 April 2021

Convolutional neural networks for Alzheimer’s disease detection on MRI images
Amir Ebrahimi, Suhuai Luo, Alzheimer’s Disease Neuroimaging Initiative
Abstract

Purpose: To detect Alzheimer’s disease (AD) on magnetic resonance imaging (MRI) using convolutional neural networks (CNNs), which is useful for detecting AD in its preliminary stages.

Approach: Our study implements and compares several deep models and configurations, including two-dimensional (2D) and three-dimensional (3D) CNNs and recurrent neural networks (RNNs). To use a 2D CNN on 3D MRI volumes, each MRI scan is split into 2D slices, which neglects the connection among the 2D image slices of an MRI volume. Alternatively, a CNN model can be followed by an RNN so that the combined 2D CNN + RNN model learns the connections among sequences of 2D image slices in an MRI. The issue is that the feature extraction step in the 2D CNN is independent of classification in the RNN. To tackle this, 3D CNNs can be employed instead of 2D CNNs to make voxel-based decisions. Our study’s main contribution is to introduce transfer learning from a dataset of 2D images to 3D CNNs.

Results: The results on our MRI dataset indicate that sequence-based decisions improve the accuracy of slice-based decisions by 2% in classifying AD patients from healthy subjects. Also, the 3D voxel-based method with transfer learning outperforms the other methods, with 96.88% accuracy, 100% sensitivity, and 94.12% specificity.

Conclusions: Several implementations and experiments using CNNs on MRI scans for AD detection demonstrated that the voxel-based method with transfer learning from ImageNet to MRI datasets using 3D CNNs considerably improved the results compared with the others.

1. Introduction

Alzheimer’s disease (AD) is a fatal, irreversible, progressive neurodegenerative disorder that causes brain cells to waste away and die. Typically, AD begins in middle or old age, with protein accumulation inside and around neurons. The most prevalent and one of the earliest symptoms of AD is difficulty remembering new things, since AD-related changes usually begin in the brain regions involved in learning. Symptoms include, but are not limited to, behavioral changes; deep confusion regarding time, events, and places; and suspicion of family members and friends. Symptoms usually develop slowly and worsen over time, leading to a continual deterioration in memory and difficulty in swallowing, talking, and walking.13

AD is the most prevalent type of dementia, accounting for about 60% to 80% of all dementia cases.1 Dementia refers to severe loss of memory and other cognitive capabilities that interferes with daily life. It is estimated to have affected about 50 million people worldwide and 459,000 Australians in 2020.1,4 These numbers are estimated to almost double every 20 years.5 Dementia is the second leading cause of death among Australians, accounting for 15,016 deaths in 2019.6

Although various treatments for preventing or slowing AD have been explored, the success rate has been low, particularly in the latest phases of the disease.7 Studies indicate that AD-related changes in the brain might begin about 20 years before symptoms emerge.1 Therefore, there is a time window that could be highly valuable for slowing AD’s progression. Early detection of AD allows patients to remain independent for a longer period. Recent research may enable greater comprehension of the disease and the development of new treatments.2,8

In practice, AD detection is based on checking brain scans, a clinical assessment, and asking questions of the patient and their relatives.9,10 This process is usually challenging because of the limited knowledge in identifying the parts of the brain affected by AD. Moreover, AD symptoms, such as brain shrinkage, can also be observed in healthy, elderly normal control (NC) groups.11 In the last two decades, a wide range of studies has been performed to detect AD using artificial intelligence. Common classification algorithms in machine learning, such as neural networks and support vector machines (SVMs), have been applied to brain scans such as positron emission tomography (PET) and magnetic resonance imaging (MRI). Detecting AD using these algorithms is difficult because of the low image quality, issues in brain segmentation and preprocessing, the absence of databases with a sufficient number of subjects, and the complexity of medical images. Successful classification necessitates a robust ability to distinguish specific features in similar brain images.12 In a systematic literature review,13 18 of 114 reviewed studies compared deep learning models with machine learning models, and all reported the superiority of the former. Therefore, in this paper, we only investigate and compare deep learning models.

The rise in the computation capacity of graphics processing units (GPUs) has supported the evolution of modern and innovative deep learning algorithms. As a subgroup of machine learning, deep learning models emulate the data processing and pattern recognition of the human brain to solve complicated decision-making tasks. Deep learning approaches have enhanced intelligent systems in numerous areas.14 Research on medical images has been encouraged by the success of deep learning methods in applications using two-dimensional (2D) natural images.15,16 Among deep learning models, convolutional neural networks (CNNs) have recently demonstrated revolutionary outcomes in disease detection and organ segmentation.17 In contrast to traditional machine learning methods, CNNs merge the three main steps of classification: feature extraction, feature selection, and classification. It was recently reported that CNNs are the most frequently used method for AD detection, appearing in about 70% of studies.13

MRI is the most extensively utilized biomarker for the detection of AD using deep learning. It has been used in more than 80% of single-modal AD detection studies.13 This paper uses MRI scans to classify AD patients from NCs using CNNs. We aim to use CNNs to uncover latent representations, discover relationships among slices of images, and recognize patterns related to AD in brain scans. Our research’s main contribution is to extend the idea of transfer learning from 2D images to three-dimensional (3D) MRI scans. Hence, learnable parameters from 2D CNNs are transferred to 3D CNNs. The related work is first outlined to present the structure of CNNs and the background of employing 2D and 3D CNNs in AD detection using neuroimaging. Next, different types of CNNs in 2D and 3D approaches to manage MRI volumes, with or without transfer learning, are examined in detail. Subsequently, our proposed deep model for introducing the concept of transfer learning to 3D CNNs is explained. Finally, experimental results are discussed, followed by the conclusion.

2. Related Work

Many algorithms, such as the multi-layer perceptron, require inputs in the form of a vector; this vectorization destroys the structural information of neighboring voxels or pixels in images. CNNs are regarded as the most outstanding deep models for image analysis. The brain’s visual cortex is the inspirational source for CNNs. CNNs are designed to capture the spatial information of images by stacking several convolutional layers to extract increasingly abstract features.17 Further, in contrast to multi-layer neural networks, CNNs have significantly fewer parameters because of shared weights in convolutional layers and the presence of pooling layers.17

Despite the initial success of CNNs,18 their extensive application was not realized until recent years, when various novel approaches and computer systems were developed to train them efficiently.17 CNNs attracted great attention after they succeeded in the ImageNet competitions, where they were successfully employed for a classification task on a database of about one million images with 1000 different classes.19 This section overviews the basic concepts, architectures, and applications of the CNN models used in AD detection.

2.1. CNN Architecture

Typical CNNs consist of several layers, including but not limited to convolutional layers, activation layers, pooling layers, fully connected layers, and a Softmax layer. A CNN is trained as follows: a forward step computes the loss cost between ground-truth labels and predicted outputs, and then a backward step updates the learnable parameters to penalize that loss. The performance of a CNN depends mainly on the layer architecture and the filter settings, leading researchers to focus on developing different architectures to improve performance. Here the key elements in a CNN architecture are explained.
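For illustration only, the following minimal sketch shows one such forward/backward iteration, assuming PyTorch (the models in this paper were implemented in MATLAB); the toy network, tensor sizes, and hyperparameters are hypothetical.

```python
# A minimal sketch of one CNN training iteration: forward step, loss computation,
# backward step, and parameter update. PyTorch is assumed purely for illustration.
import torch
import torch.nn as nn

model = nn.Sequential(                      # a toy CNN stand-in
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),                        # two outputs: AD vs. NC
)
criterion = nn.CrossEntropyLoss()           # loss cost between labels and predictions
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

images = torch.randn(4, 3, 224, 224)        # a dummy mini-batch of 2D images
labels = torch.randint(0, 2, (4,))          # dummy ground-truth labels

outputs = model(images)                     # forward step: predicted outputs
loss = criterion(outputs, labels)           # loss between ground truth and predictions
optimizer.zero_grad()
loss.backward()                             # backward step: gradients of learnable parameters
optimizer.step()                            # update the learnable parameters
```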

2.1.1. Convolutional layers

A convolutional layer is the first and most fundamental layer in the architecture; it convolves an input image with a set of kernels and produces feature maps. The first convolutional layers of deep CNNs extract discriminative, scale/shift-invariant features from local areas of the image, while the last convolutional layers extract the task-specific features used for classification. The key benefit of convolutional layers is the idea of weight sharing within the same feature map, which decreases the number of parameters and leads to model simplicity.20

2.1.2. Activation layers

A non-linear activation function, such as a Sigmoid, Tanh, or ReLU, typically follows convolutional layers to create a feature map corresponding to each filter. Introducing non-linearity allows models to learn complex representations. Older activation functions took the form of a Sigmoid, but this function’s saturation property causes learning algorithms to operate poorly during the training of neural networks. To solve this problem, ReLU turned out to be useful and popular. It was employed in most of the studies,13 although Sigmoid and Tanh are still widespread. ReLU applies the element-wise function max(0,x). The training time using ReLU was reported to be considerably faster than that of Tanh and Sigmoid.21 However, to avoid zero gradients in gradient-based learning algorithms, leaky ReLU was suggested, with an output value of 0.01x for input values x<0 instead of forcing negative values to 0.22
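A small numeric sketch of these two activation functions (NumPy is assumed here for illustration):

```python
# Element-wise ReLU and leaky ReLU, with the 0.01 slope for negative inputs described above.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                  # max(0, x)

def leaky_relu(x, slope=0.01):
    return np.where(x >= 0, x, slope * x)      # 0.01 * x for x < 0

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))         # [0.  0.  0.  1.5]
print(leaky_relu(x))   # [-0.02  -0.005  0.  1.5]
```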

2.1.3. Pooling layers

A pooling layer is the third layer type and appears after convolutional layers. This layer down-samples the input feature map by replacing each area with its average or maximum value. Pooling helps reduce the number of parameters in a network while retaining the most influential features. Average pooling was often used historically,23 but it has recently fallen out of favor compared with the max-pooling operation. Max pooling has been shown to have faster convergence and better classification performance; it can select superior invariant features and improve generalization.22,24

2.1.4. Fully connected layers

The fourth type of layer is the fully connected layer, which performs similarly to a traditional neural network and contains a large portion of the learnable parameters of a CNN.20 Following the previously mentioned chain of layers, the feature maps are flattened into a vector, in which they are no longer spatially positioned, and a fully connected layer is added. Each fully connected layer is connected to all feature elements of the previous layer. Fully connected layers help to capture non-linear associations among the local features extracted by the convolutional layers.

2.1.5. Softmax layer

The Softmax layer categorizes subjects by selecting the class with the highest predicted probability. The Softmax function emphasizes the largest value in the input vector, while considerably suppressing the lower values.
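Putting the above layer types together, a minimal 2D CNN could be sketched as follows (PyTorch assumed for illustration; the layer sizes are arbitrary and not those of any network used in this paper):

```python
# A toy 2D CNN combining the layer types of Sec. 2.1:
# convolution -> ReLU activation -> max pooling -> fully connected -> Softmax.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer (shared weights)
            nn.ReLU(),                                     # activation layer
            nn.MaxPool2d(2),                               # pooling layer (down-sampling)
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # flatten feature maps to a vector
            nn.Linear(32 * 56 * 56, num_classes),          # fully connected layer (224x224 input assumed)
        )

    def forward(self, x):
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)                # Softmax layer: class probabilities

probs = TinyCNN()(torch.randn(1, 3, 224, 224))
print(probs.shape)                                         # torch.Size([1, 2]); probabilities sum to 1
```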

2.2. CNN Models in the Literature

Deep learning approaches in AD detection using neuroimaging can be divided into two types: unsupervised and supervised.13,20 Unsupervised models try to obtain an abstract representation from images. A typical unsupervised method comprises an auto-encoder (AE) for feature extraction and an SVM as the classifier. Nevertheless, supervised methods are more prevalent in the literature than unsupervised methods. In supervised approaches, feature extraction and classification are combined into a single entity. As discussed previously, CNNs are the most widely used model for AD detection in deep intelligent systems.

Some studies prefer to design their own CNN structure.25,26 Nevertheless, employing notable structures such as LeNet,18 AlexNet,27 CaffeNet,28 VGGNet,29 GoogLeNet,30 ResNet,31 DenseNet,32 and Inception33 can be helpful. According to previous studies and the ImageNet competitions, these models have been successful in image classification.20 This section explains the 2D/3D CNN models employed in the literature and the weight initialization process.

2.2.1. 2D CNNs

CNNs were initially proposed to recognize patterns in 2D images. Although 3D CNNs can classify 3D brain scans, they need many more parameters than 2D CNNs.34,35 Therefore, using 2D CNNs is more common than using 3D CNNs for AD detection on 3D brain scans.13 By dividing the MRI volumetric data into 2D image slices, 2D information can be extracted from the 3D images. Assuming that the features of interest in 3D MRIs are preserved in 2D images, this process reduces the number of parameters in CNNs. Here, different deep models using 2D CNNs are reviewed.
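Before reviewing specific models, the slicing step itself can be sketched as follows (NumPy assumed; the volume shape matches the preprocessed size reported in Sec. 3.1, and the axis-to-plane mapping is an assumption that depends on scan orientation):

```python
# Illustrative sketch: split a 3D MRI volume into 2D slices along each anatomical plane.
import numpy as np

volume = np.random.rand(79, 95, 79)                        # stand-in for a preprocessed MRI

sagittal_slices = [volume[i, :, :] for i in range(volume.shape[0])]
coronal_slices  = [volume[:, j, :] for j in range(volume.shape[1])]
axial_slices    = [volume[:, :, k] for k in range(volume.shape[2])]

print(len(sagittal_slices), sagittal_slices[0].shape)      # 79 (95, 79)
print(len(coronal_slices),  coronal_slices[0].shape)       # 95 (79, 79)
print(len(axial_slices),    axial_slices[0].shape)         # 79 (79, 95)
```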

Generally, 2D CNNs take the middle part of brain scans as the input data and ignore the remainder. Some studies extract gray matter (GM) tissue as the input data. Research studies use the standard planes of brain scans, namely the sagittal, coronal, or axial planes; the axial plane is the most widely used.13 Farooq et al. employed axial slices of MRI (GM), eliminating slices from the beginning and end, where there is no anatomical information. They implemented 2D CNNs based on GoogLeNet, ResNet-18, and ResNet-152.36 Valliani and Soni37 employed the median axial slice of subjects to train ResNet-18. Farooq et al. used 166 axial slices of MRI (GM) to train GoogLeNet and ResNet-152.38 Seven 2D CNNs on seven groups of 2D image slices (five mid-axial slices in each group), each consisting of three convolutional layers, were proposed by Luo et al.39 In this study, a subject was classified as AD when at least one of the classifiers categorized it as AD.

After discarding the first and last axial slices, assuming them to be without anatomical information, Wu et al.40 combined every three neighboring slices into an RGB color image to train CaffeNet and GoogLeNet. By removing the last 10 axial slices from MRI (GM) as well as slices with zero mean pixel values, LeNet and GoogLeNet models were used by Sarraf and Tofighi for AD detection.41 In two other studies, a sorting mechanism based on entropy was proposed to select the most informative slices from the axial plane of each MRI scan.42,43 The highest entropy computed from the histogram was associated with the most informative slices. The selected slices were then used to train VGGNet-16 and Inception-V4.

Gunawardena et al.44 used several image slices from the coronal plane to train a 2D CNN with two convolutional layers. This work showed that a brain scan’s coronal view covers the essential brain parts related to AD. The discriminative advantage of the coronal view was also demonstrated by Wang et al. with DenseNet-121:45 all planes were used, and the coronal plane was found to be the most accurate. In another study, 20 mid-coronal slices were employed for training a 2D CNN based on VGGNet-16.21 Sagittal slices were also employed to train 2D CNNs with two convolutional layers46 or six convolutional layers.22

The use of all three planes of 3D brain scans can offer complementary features useful for the classification process. Thus, some research considered all image planes. Islam and Zhang47 designed three 2D CNNs for the three views. Each CNN comprised 4 dense blocks (12 convolutional layers in each) and 4 convolutional layers; the final classification was done by majority voting. In another multi-view study, different 2D CNNs (such as GoogLeNet, AlexNet, ResNet, and VGGNet) were employed with and without long short-term memory (LSTM).48 LSTM is a type of recurrent neural network (RNN) with a more complicated structure. No significant differences were reported among the views, and multi-view models were reported to have higher accuracy than single-view models.48 The most critical disadvantage of multi-view approaches is that they can lead to ambiguity in the final decision.

2.2.2. 3D CNNs

Given that MRI scans are 3D images, and a spatial relationship exists among 2D image slices, using 3D CNNs is the trend. The most direct method for AD detection is to take the entire MRI scan as the input. However, a large number of parameters must then be trained on a small dataset, which can lead to overfitting.49 In straightforward methods, 3D CNNs with 5 convolutional layers50 and 12 convolutional layers51 were proposed; a 3D CNN with 3 convolutional layers pretrained with AEs52 was also suggested. Others, based on VGGNet and ResNet,53 ResNet-18,54 and ResNet-37,55 were recently implemented. In another study, features were combined from multi-scale 3D convolutional AEs with three hidden layers and a 3D CNN with six convolutional layers.56

A VGGNet-based 3D CNN was proposed by Tang et al. to reduce the vanishing-gradient effect through an additional shortcut that merges high- and low-level information.57 A 3D CNN with seven convolutional layers was described by Wegmayr et al.58 In this work, three different filter sizes were used in the first convolutional layer to capture input features at different scales. Dense connections were introduced to 3D CNNs for AD detection by Wang et al.59 It was reported that dense connections can enhance the propagation of gradients throughout the network when training data are insufficient.

2.3. Transfer Learning

Successful deep learning methods for the classification of natural images have benefited studies of deep learning in the medical imaging domain. However, this domain remains challenging for researchers because of the low acquisition quality of medical images and errors in preprocessing and segmentation; the unavailability of comprehensive datasets (including a vast number of subjects and biomarkers); the low between-class variance across different stages of diseases; the lack of expert knowledge, especially in identifying regions of interest (ROIs) in medical images; and the complexity of medical images compared with typical natural images.

Overfitting is a challenging topic in deep learning that may occur because of issues such as a low number of subjects and a large number of learnable parameters. Some techniques are embedded into CNNs to avoid overfitting, such as max pooling and drop-out layers. Max pooling reduces the number of parameters and, subsequently, the dimension of the extracted features to control overfitting and achieve invariance to scale, shift, and rotation.24 Drop-out layers randomly drop neurons at each update of the training phase and force neurons to act independently.21 Another idea, applied in networks such as SqueezeNet, is to discard fully connected layers; removing them reduces the number of learnable parameters compared with VGGNet and reduces overfitting.

In addition to these CNN-related structures, L1 and L2 regularization have proven effective in preventing overfitting when training CNN models. A comprehensive overview of regularization methods, when they act during training, and what they target was recently published, together with advanced optimization methods applied to the loss function during the training process.60,61 The overfitting issue is worse when applying 3D CNNs, which require training a large number of parameters.49 Transfer learning is another workaround to avoid overfitting.

Some studies trained a deep model from scratch; however, this is often not optimal. Training from scratch is time consuming, and a dataset of adequate size is sometimes not available.21,49 Datasets for conventional classification tasks contain millions of images, while MRI datasets contain only a few hundred, which causes overfitting during training. A common alternative is to take the weights of a CNN pretrained on a specific task and retrain it for a new task by simply fine-tuning the learnable parameters. This is possible because the convolutional layers of CNNs extract general features that can be utilized for many tasks. Hence, it is possible to transfer the weights of convolutional layers from one application to another. This practice is called “transfer learning.”
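As a hedged sketch of this practice (PyTorch and torchvision’s pretrained ResNet-18 are assumed here purely for illustration; they are not the implementation used in this paper), the ImageNet weights are loaded and only the final layer is replaced for the two-class AD/NC task:

```python
# Sketch of transfer learning: reuse convolutional weights pretrained on ImageNet and
# replace only the final classification layer for the new two-class task (AD vs. NC).
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)  # ImageNet weights
model.fc = nn.Linear(model.fc.in_features, 2)   # new task-specific output layer (AD vs. NC)

# Fine-tuning keeps all layers trainable; optionally, early convolutional layers, which
# extract general features transferable across tasks, can be frozen:
for param in model.layer1.parameters():
    param.requires_grad = False
```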

Considering transfer learning from the ImageNet database, 2D CNNs such as VGGNet-16,21 GoogLeNet and ResNet-152,38 Inception-V4,62 ResNet-18,37 VGGNet-16 and Inception-V4,42 VGGNet-16,43 CaffeNet and GoogLeNet,40 and DenseNet-12145 were implemented for AD detection using MRI scans. In contrast, CNNs such as LeNet and GoogLeNet,41 GoogLeNet, ResNet-18, and ResNet-152,36 and 3D CNNs based on ResNet,55 VGGNet,57 and VGGNet and ResNet53 were trained for AD detection from scratch. Transfer learning was reported to be quicker than, and to outperform, training from scratch in 2D CNNs.37,42 Well-known 2D CNNs, such as ResNet-152 and GoogLeNet, were reported to be competitive with each other.36,38 Nevertheless, the performance of Inception-V4, ResNet, and CaffeNet was better than that of VGGNet-16, AlexNet, and GoogLeNet in some studies.40,42,62 DenseNet outperformed ResNet and LeNet in another study.63

Generally, 2D CNNs with transfer learning performed well when a limited number of subjects was available. With more subjects, however, 3D CNNs require many more learnable parameters to solve such a complex problem. Wang et al.59 examined the effect of depth in CNNs; the results indicated that shallow and very deep networks do not always yield proper outcomes. One crucial factor is the size of the training set, which significantly influences the performance of classifiers.59 The number of AD patients in MRI datasets is limited, which is particularly insufficient for deep models. Therefore, combining datasets, using several scans of the same subject in longitudinal datasets, and data augmentation were employed in some studies.

3. Materials and Methods

Training 2D CNNs is easy, but they are not optimal for encoding the spatial information of 3D MRIs because their convolution kernels lack the third dimension.49 Conversely, 3D CNNs can obtain 3D information from 3D MRIs and, despite their training complexity, have shown higher accuracy than 2D CNNs.57 In Sec. 2.3, the success of transfer learning was explained. Training from scratch has been followed in many classification studies. However, according to the literature review, parameters such as weights and biases can be initialized by training deep models on other related or unrelated tasks. These parameters can extract general features from images, which can assist in our AD detection problem. The source-domain dataset can be any large image dataset, such as ImageNet or MNIST. ImageNet is a visual dataset that includes 1000 object categories, such as plants, tools, and many animals.19 MNIST is a dataset of handwritten digits with 70,000 centered, fixed-size, grayscale images.64 To train models from scratch, the Xavier initialization method is utilized.65

The block diagram of an AD detection system is presented in Fig. 1. Preprocessing steps, such as intensity normalization, registration, and tissue segmentation, prepare MRI scans to be used as input data for intelligent systems. After preprocessing, MRI scans are prepared for training the implemented deep models according to each model’s structure. Slicing 3D MRI scans into 2D images, identifying ROIs, 2D/3D patch extraction, and resizing are possible data management methods. Three approaches for AD detection were implemented in this paper using CNNs. The slice-based approach extracts 2D image slices from 3D MRIs for data management, using a 2D CNN as the deep model. The sequence-based approach takes the extracted features from the previous approach and feeds them to an RNN. The voxel-based approach takes the entire 3D MRI for data management, using a 3D CNN as the deep model.

Fig. 1

The block diagram of an AD detection system.


As discussed previously, various 2D CNN models pretrained on the ImageNet dataset have been proposed for AD detection. In Tables 1 and 2, all CNNs employed in this paper are listed together with their depth, number of layers, number of connections, number of convolutional layers, number of fully connected layers, size on disk, number of learnable parameters, and image input size. The network depth is defined as the largest number of sequential convolutional and fully connected layers on a path from the input layer to the output layer. The structures are the same for the 2D and 3D CNNs (i.e., the number, type, and order of layers); the only differences are the dimensions of the filters and input images, which can be 2D or 3D. Except for LeNet-5, which was trained on grayscale MNIST images, all networks take RGB inputs in accordance with the ImageNet dataset’s images. To adjust the models to our AD detection problem, the last layer of each 2D CNN model was changed to have two outputs, with a weight/bias learning rate factor 10 times greater than that of the previous layers. Applying this learning rate leads to faster training of the new layer than of the other layers, which were already trained on the MNIST/ImageNet dataset.
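The effect of the 10x weight/bias learn-rate factor can be sketched outside MATLAB with per-parameter-group learning rates (PyTorch assumed for illustration; the hyperparameter values follow Sec. 4, and the model choice is hypothetical):

```python
# Sketch: train the newly added two-output layer 10 times faster than the pretrained layers.
import torch
import torch.nn as nn
from torchvision import models

base_lr = 3e-4
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)            # replaced two-output layer

pretrained_params = [p for n, p in model.named_parameters() if not n.startswith("fc")]
new_params = list(model.fc.parameters())

optimizer = torch.optim.SGD(
    [{"params": pretrained_params, "lr": base_lr},
     {"params": new_params, "lr": 10 * base_lr}],         # 10x learning rate for the new layer
    momentum=0.9, weight_decay=5e-4)                      # L2 regularization
```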

Table 1

Implemented 2D CNNs in our study.

| Network | Depth | #Layers | #Connections | #Convolutional layers | #FCs | Size (MB) | #Parameters (millions) | Image input size | Reference accuracy (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LeNet-5 | 5 | 16 | 15 | 3 | 2 | 0.23 | 0.062 | 32×32 | 98.48 |
| AlexNet | 8 | 25 | 24 | 5 | 3 | 227 | 61.0 | 227×227×3 | 54.10 |
| VGG-16 | 16 | 41 | 40 | 13 | 3 | 515 | 138 | 224×224×3 | 70.29 |
| SqueezeNet | 18 | 68 | 75 | 27 | 0 | 4.5 | 1.24 | 227×227×3 | 55.16 |
| ResNet-18 | 18 | 71 | 78 | 20 | 1 | 44 | 11.7 | 224×224×3 | 69.49 |
| VGG-19 | 19 | 47 | 46 | 16 | 3 | 535 | 144 | 224×224×3 | 70.42 |
| GoogLeNet | 22 | 144 | 170 | 58 | 1 | 27 | 7.0 | 224×224×3 | 66.25 |
| Inceptionv3 | 48 | 315 | 349 | 94 | 1 | 87 | 23.9 | 299×299×3 | 77.07 |
| ResNet-50 | 50 | 177 | 192 | 53 | 1 | 96 | 25.6 | 224×224×3 | 74.46 |
| ResNet-101 | 101 | 347 | 379 | 104 | 1 | 167 | 44.6 | 224×224×3 | 75.96 |

Table 2

Implemented 3D CNNs in our study.

| Network | Size (MB) | #Parameters (millions) | Image input size |
| --- | --- | --- | --- |
| LeNet-5 | 0.26 | 0.26 | 32×32×32 |
| ResNet-18 | 46 | 34 | 224×224×224 |
| ResNet-50 | 132 | 48 | 224×224×224 |
| ResNet-101 | 204 | 87 | 224×224×224 |

The classification accuracy on the reference dataset (MNIST or ImageNet) is the most generic means of evaluating network accuracy. Networks that perform well on the reference dataset are usually also accurate when applied to other image datasets using transfer learning. This assumption is reasonable because such networks have learned to extract general, informative features from images. However, high accuracy on the reference dataset does not guarantee a model’s performance on other tasks; hence, trying multiple networks is required. Our LeNet-5 implementation achieved 98.48% accuracy on the MNIST test set after training on its training set. The other CNN models are implemented in MATLAB with the accuracies listed in the final column of Table 1. Standard ImageNet validation data were utilized to obtain these numbers, according to the MATLAB documentation.66

The knowledge from the MNIST and ImageNet datasets can be transferred to our AD detection case. However, the main issue is that these datasets consist of 2D images, and deep models pretrained on 2D images cannot be used directly in a 3D workflow. This paper extends the concept of transfer learning from the 2D natural-image domain to the 3D MRI domain. To this end, we consider several approaches, explained in the following subsections.

3.1. Dataset Preparation

The ADNI (AD Neuroimaging Initiative) study provided the dataset used here. ADNI is the most extensively utilized dataset in this field.13,67 Its main goal is to determine whether PET, MRI, other biological markers, and neuropsychological and clinical assessments can be combined to measure the progression of AD. Determining sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians in developing new treatments, monitoring their effectiveness, and reducing the time and cost of clinical trials. The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration, private pharmaceutical companies, and non-profit organizations as a $60 million, 5-year public–private partnership. Acquisition of these data was performed according to the ADNI acquisition protocol.67 ADNI subjects aged 55 to 90 years were recruited from more than 50 sites across the United States and Canada. MRI scans of 132 subjects per class (AD and NC) were employed in this study (baseline or screening scans only); the dataset is available online upon request.

Classification performance is affected by image preprocessing. Among the different preprocessing techniques for AD detection, registration and intensity normalization are the most commonly used.13 Normalization maps the voxel/pixel intensities of all MRI scans to a common range. Registration is a spatial adjustment of MRI scans to a reference anatomical space; it is necessary because of the differences among the brains of individual subjects, and it helps to standardize MRI patterns. Intensity normalization (zero-centering) and registration to Montreal Neurological Institute space68 were applied in this study using the SPM12 toolbox.69

After preprocessing, the dimension of each MRI was 79×95×79. For 3D CNNs, the entire preprocessed MRI was used for training. For 2D CNNs, 2D slices of every MRI view (axial, coronal, and sagittal) were extracted after discarding the first and last image slices. By stacking three adjacent slices as RGB color channels, the remaining MRI slices were formed into 24 RGB coronal images, 19 RGB sagittal images, and 16 RGB axial images. RGB images are required because all 2D CNN models pretrained on the ImageNet dataset take RGB images as the input, the exception being LeNet-5. Some sample images of the ADNI dataset are shown in Figs. 2 and 3. Next, each MRI was resized to fit each 2D/3D CNN’s input layer in Tables 1 and 2. Because the number of patients was insufficient to train a deep structure, data augmentation was used to improve classification performance.37,70 Data augmentation methods, such as random translation, scaling, reflection, noise injection, rotation, blurring, cropping, and gamma correction, increase data diversity for training models without gathering new data. In the 3D approaches, ±5% scaling and ±5 pixel translation were performed on the training set only. In the 2D approaches, because of the symmetry of brain scans, reflection of the coronal and axial views was applied in addition to the stated augmentation transformations.
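A hedged sketch of the slice-stacking step (NumPy assumed; the starting index and the mapping of axes to views are illustrative assumptions, and augmentation is omitted):

```python
# Sketch: form RGB images by stacking three adjacent MRI slices as color channels.
import numpy as np

volume = np.random.rand(79, 95, 79).astype(np.float32)    # stand-in for a preprocessed MRI

def stack_rgb(vol, axis, start, n_images):
    """Stack triplets of adjacent slices along `axis` into n_images RGB images."""
    images = []
    for i in range(n_images):
        idx = start + 3 * i
        triplet = [np.take(vol, idx + c, axis=axis) for c in range(3)]  # three adjacent slices
        images.append(np.stack(triplet, axis=-1))                       # H x W x 3 image
    return images

coronal_rgb = stack_rgb(volume, axis=1, start=10, n_images=24)          # 24 RGB coronal images
print(len(coronal_rgb), coronal_rgb[0].shape)                           # 24 (79, 79, 3)
```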

Fig. 2

Sample images of the ADNI dataset.


Fig. 3

24 MRI coronal slices of one subject.


3.2. Approach 1: 2D CNNs

In approach 1, 2D CNNs extract AD-related discriminative features, such as brain shrinkage, from each image slice, and each subject is classified based on the image slices of that subject. Figure 4(a) shows our single-view 2D CNN structure. Single-view (coronal, sagittal, or axial) image slices of all subjects in the training set are used to train a 2D CNN. For testing, all image slices of an MRI scan are classified by the CNN model, and the final classification is determined by a majority voting strategy over all image slices. The multi-view configuration follows a similar process, as depicted in Fig. 4(b), except that another majority voting strategy is applied over the three views to make the final decision.
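The voting logic can be sketched as follows (plain Python; the slice-level predictions are hypothetical values, with 1 denoting AD and 0 denoting NC):

```python
# Sketch of the majority-voting strategy: a per-view decision over slice predictions,
# followed by a second vote over the three views in the multi-view mode.
from collections import Counter

def majority_vote(labels):
    return Counter(labels).most_common(1)[0][0]

slice_preds = {                                   # hypothetical slice-level predictions
    "axial":    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],  # 16 axial slices
    "coronal":  [1] * 14 + [0] * 10,                                # 24 coronal slices
    "sagittal": [0] * 9 + [1] * 10,                                 # 19 sagittal slices
}
view_decisions = [majority_vote(p) for p in slice_preds.values()]   # single-view decisions
final_decision = majority_vote(view_decisions)                      # multi-view decision
print(view_decisions, final_decision)                               # [1, 1, 1] 1
```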

Fig. 4

Architectures of the applied approaches: (a) single-view 2D CNN; (b) multi-view CNNs; (c) single-view 2D CNN + LSTM; (d) multi-view CNNs + LSTM; and (e) 3D CNN.


3.3. Approach 2: 2D CNNs + LSTM

This approach aims to capture the spatial connections among the 2D image slices of an MRI by combining 2D CNNs with an LSTM. After a preliminary phase of feature extraction with a CNN model on the ADNI database, an LSTM structure is employed to acquire informative features for detecting AD from a sequence of image slices. In the single-view mode, after feature extraction by a CNN on each MRI slice, an LSTM model is trained on the features extracted from the MRI slices of a specific subject for each view, as shown in Fig. 4(c). The features extracted at the second-to-last layer of the CNN (before the final layer) feed the LSTM model. It is expected that the LSTM can comprehend the connections between sequences of images of each subject and make a sequence-based decision based on all input image slices rather than on each slice individually. The multi-view configuration, shown in Fig. 4(d), follows the same process, with the difference that a majority voting approach is employed over the three views for the final classification. Two LSTM layers (100 hidden units each) were chosen for the LSTM network by trial and error.
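A hedged sketch of this 2D CNN + LSTM pipeline (PyTorch assumed for illustration; the two 100-unit LSTM layers follow the description above, while the backbone, feature size, and slice count are illustrative assumptions):

```python
# Sketch: a CNN extracts one feature vector per slice (before its final layer),
# and an LSTM makes a sequence-based decision over all slices of one subject.
import torch
import torch.nn as nn
from torchvision import models

cnn = models.resnet18(weights=None)
cnn.fc = nn.Identity()                         # expose penultimate-layer features (512-dim)
cnn.eval()

lstm = nn.LSTM(input_size=512, hidden_size=100, num_layers=2, batch_first=True)
classifier = nn.Linear(100, 2)                 # AD vs. NC from the last time step

slices = torch.randn(24, 3, 224, 224)          # e.g., 24 coronal RGB slices of one subject
with torch.no_grad():
    features = cnn(slices)                     # (24, 512) slice-level features
seq = features.unsqueeze(0)                    # (1, 24, 512): one sequence per subject
outputs, _ = lstm(seq)
logits = classifier(outputs[:, -1, :])         # sequence-based decision
print(logits.shape)                            # torch.Size([1, 2])
```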

3.4. Approach 3: 3D CNNs

The whole preprocessed MRIs were used to train 3D CNNs to make voxel-based decisions. According to Table 2, 3D CNNs have many learnable parameters, and training them is computationally expensive. Nevertheless, we utilized this type of deep model to be able to compare it with 2D CNNs. To build the 3D CNNs, we extended the 2D filters of some CNN models in Table 1 into the third dimension to obtain 3D filters. All other layers explained in Sec. 2.1 were adjusted according to the new filters. 3D CNNs have many more learnable parameters than 2D CNNs, which makes it difficult for the backpropagation learning process to converge, especially when training from scratch. To transfer the learnable parameters from pretrained 2D CNNs (on MNIST or ImageNet) to 3D CNNs, we duplicated the 2D filters (copying them repeatedly) along the third dimension. This is possible because an MRI scan can be viewed as a sequence of image slices. To the best of our knowledge, 3D CNNs with transfer learning have not previously been employed for any classification task. During training, we expect the 3D models to learn AD-related features in each MRI slice and to capture AD-related patterns across image slices. The 3D models are listed in Table 2, along with the number of learnable parameters and the networks’ size in memory. The number of learnable parameters increases with the extension of the filters’ dimensions. A diagram of our 3D structure is shown in Fig. 4(e).
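The duplication of pretrained 2D filters along the third dimension can be sketched as follows (PyTorch assumed for illustration; the layer sizes are arbitrary, and whether the copied weights are rescaled is not addressed in this sketch):

```python
# Sketch: initialize a 3D convolution by repeating a pretrained 2D kernel along the depth axis.
import torch
import torch.nn as nn

conv2d = nn.Conv2d(3, 16, kernel_size=3, padding=1)      # stands in for a pretrained 2D layer
conv3d = nn.Conv3d(3, 16, kernel_size=3, padding=1)      # its 3D counterpart

with torch.no_grad():
    w2d = conv2d.weight                                   # shape (16, 3, 3, 3)
    w3d = w2d.unsqueeze(2).repeat(1, 1, 3, 1, 1)          # shape (16, 3, 3, 3, 3): copies along depth
    conv3d.weight.copy_(w3d)
    conv3d.bias.copy_(conv2d.bias)

print(conv3d.weight.shape)                                # torch.Size([16, 3, 3, 3, 3])
```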

4. Results and Discussion

As previously discussed, the ADNI dataset was utilized in this paper. For our experiments, we used 100, 16, and 16 subjects per class for training, validation, and testing, respectively, with one MRI scan per subject. Therefore, each set contains the same number of MRI scans for each class. The same number of samples was selected for each class to avoid prediction bias caused by an imbalanced dataset. It would be possible to include many more MRI scans from healthy people to increase the dataset size; however, the number of subjects with AD is limited in medical datasets. Therefore, to avoid class imbalance and the resulting prediction bias toward one of the classes (NC in this case), the same number of MRI scans was selected for each class.

The same training parameters were used for all 2D CNNs: mini-batch size = 64, initial learning rate = 0.0003, and L2 regularization = 0.0005. Stochastic gradient descent (SGD) with momentum = 0.9 was the optimizer, with early stopping according to the validation set. However, a mini-batch size of 8 was used for the 3D CNNs because of the available computational resources. Moreover, we used the same training parameters for all LSTM models: mini-batch size = 64, initial learning rate = 0.01, and L2 regularization = 0.0001. SGD was the optimizer, with momentum = 0.9, and the maximum number of epochs was 50, with early stopping according to the validation set.

The training set was shuffled in every epoch of the training process, and data augmentation was applied simultaneously for the CNN models. Consequently, a CNN model observed altered input data in a different arrangement in each epoch. The MATLAB deep learning toolbox was used to implement and train our models on an NVIDIA DGX station. This system has 4 GPUs with 32 GB of memory each and 256 GB of RAM. However, we utilized only one GPU to reduce the chance of parallel computation issues in multi-core systems. The same training, validation, and test sets were employed for all models to ensure objective comparisons. In the following figures and tables, “TL” and “Sc” denote transfer learning and training from scratch, respectively.

The classification performance of the utilized CNNs is shown in Table 3. In this table, accuracy refers to the percentage of correctly classified test subjects. Sensitivity refers to the percentage of test subjects suffering from AD who were correctly classified as such, while specificity is the percentage of healthy test subjects correctly classified as healthy. Sensitivity is often more important for screening tests in medical tasks.
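For reference, the three metrics can be computed as in the following sketch (plain Python; the labels and predictions are illustrative, with 1 denoting AD and 0 denoting NC):

```python
# Sketch of accuracy, sensitivity (true positive rate for AD), and specificity (true negative rate for NC).
def metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)            # correctly detected AD subjects
    specificity = tn / (tn + fp)            # correctly detected NC subjects
    return accuracy, sensitivity, specificity

y_true = [1] * 16 + [0] * 16                # hypothetical test set: 16 AD and 16 NC subjects
y_pred = [1] * 16 + [0] * 15 + [1]          # all AD detected, one NC misclassified
print(metrics(y_true, y_pred))              # (0.96875, 1.0, 0.9375)
```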

Table 3

Classification accuracy of implemented deep models (AD versus NC).

| CNN model | 2D CNNs (Sc) axial | 2D CNNs (Sc) coronal | 2D CNNs (Sc) sagittal | 2D CNNs (Sc) multi-view | 2D CNNs (TL) axial | 2D CNNs (TL) coronal | 2D CNNs (TL) sagittal | 2D CNNs (TL) multi-view | 2D CNNs (TL) + LSTM axial | 2D CNNs (TL) + LSTM coronal | 2D CNNs (TL) + LSTM sagittal | 2D CNNs (TL) + LSTM multi-view | 3D CNN Sc | 3D CNN TL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LeNet-5 | 78.12 | 78.12 | 84.38 | 84.38 | 81.25 | 81.25 | 81.25 | 84.38 | 84.38 | 84.38 | 84.38 | 87.50 | 81.25 | 81.25 |
| AlexNet | 68.75 | 78.12 | 84.32 | 78.12 | 75.00 | 81.25 | 84.18 | 84.38 | 84.38 | 90.62 | 81.25 | 84.38 | | |
| VGGNet-16 | 71.88 | 84.38 | 84.38 | 81.25 | 84.38 | 81.25 | 84.38 | 84.38 | 81.25 | 84.38 | 81.25 | 84.38 | | |
| SqueezeNet | 78.12 | 78.12 | 78.12 | 81.25 | 84.38 | 81.25 | 87.50 | 90.62 | 87.50 | 90.62 | 90.62 | 90.62 | | |
| ResNet-18 | 81.25 | 87.50 | 84.38 | 84.38 | 87.50 | 81.25 | 81.25 | 84.38 | 87.50 | 81.25 | 78.12 | 81.25 | 68.75 | 96.88 |
| VGGNet-19 | 71.88 | 81.25 | 87.50 | 81.25 | 84.38 | 81.25 | 84.38 | 78.12 | 81.25 | 84.38 | 87.50 | 84.38 | | |
| GoogLeNet | 75.00 | 78.12 | 75.00 | 78.12 | 71.88 | 81.25 | 84.38 | 81.25 | 87.50 | 87.50 | 75.00 | 87.50 | | |
| Inceptionv3 | 78.12 | 78.12 | 81.25 | 78.12 | 81.25 | 78.12 | 78.12 | 78.12 | 75.00 | 84.38 | 78.12 | 81.25 | | |
| ResNet-50 | 78.12 | 81.25 | 81.25 | 78.12 | 81.25 | 81.25 | 78.12 | 87.50 | 87.50 | 87.50 | 81.25 | 87.50 | 62.50 | 90.62 |
| ResNet-101 | 78.12 | 78.12 | 78.12 | 81.25 | 81.25 | 78.12 | 75.00 | 78.12 | 81.25 | 81.25 | 81.25 | 78.12 | | 93.75 |

In the single-view mode of the 2D approach, the 2D CNN made 24, 16, and 19 slice-level decisions on the coronal, axial, and sagittal views, respectively. Majority voting over the whole stack of images for each MRI view of a particular subject determined the final decision. In the multi-view mode, the CNNs classified all image slices of each view separately, and a majority voting approach over all three views of a particular subject determined the final decision. Both transfer learning and training from scratch were employed in this approach. Transfer learning yielded about a 2% accuracy improvement on average. Generally, transfer learning helps deep models avoid overfitting.

SqueezeNet and ResNet-18 performed best on our dataset in this approach. The core idea behind ResNet is a “shortcut connection” that skips some layers to avoid the vanishing gradient problem. The main notion of SqueezeNet is to replace 3×3 filters with 1×1 (point-wise) filters to reduce computation and to down-sample late in the model to retain large feature maps. Multi-view ResNet-18 using transfer learning had 84.38% accuracy, 87.5% sensitivity, and 81.25% specificity. Multi-view SqueezeNet using transfer learning had 90.62% accuracy, 81.25% sensitivity, and 100% specificity. The same performance was achieved by SqueezeNet + LSTM.

In the 2D CNNs + LSTM approach, a CNN was assigned to extract features from an image slice, and an LSTM model was assigned to locate the connection among sequences of image slices for a subject. After extracting features via a CNN structure on each MRI slice, the LSTM model made a decision based on the extracted features from MRI slices of a specific subject on each view in the single-view mode. In other words, the LSTM model made a single decision on either the sagittal, coronal, or axial view. In the multi-view mode, an LSTM model independently decided on all slices of a view. The final decision was reached by a majority voting approach on all three LSTM models allocated to each view.

Because of its better performance compared with training from scratch in the 2D CNN approach, only transfer learning was utilized in this approach. The performance of the models was compared with and without LSTM. Since the LSTM models incorporated sequence dependencies into the classification, an accuracy improvement of almost 2% on average was observed. The results in Table 3 indicate that there were no significant differences among the views. However, multi-view models were somewhat more robust and showed greater accuracy than single-view models.

In the 3D CNN approach, each CNN made a decision based on the whole MRI volume. A 3D CNN was responsible for extracting features from each image slice and, simultaneously, finding the relationships among sequences of slices for each subject. Training some models was impossible using the available hardware; we implemented and trained LeNet-5, ResNet-18, ResNet-50, and ResNet-101 in the 3D workflow. Both transfer learning and training from scratch were employed in this approach. Because of the computational requirements, ResNet-101 was trained using 4 GPUs simultaneously with parallel processing. In this situation, the 3D ResNet-101 did not converge after 2000 epochs when training from scratch. According to the results in Table 3, transferring knowledge from ImageNet to 3D CNNs improved the results significantly (96.88% accuracy for 3D ResNet-18) compared with the other approaches. However, training 3D CNNs from scratch performed poorly on our MRI dataset because 3D CNNs have many learnable parameters, which makes the training process challenging. The situation was worst for deeper models such as ResNet-101. As shown in Table 3, transfer learning and training from scratch yielded the same performance for LeNet-5, but transfer learning surpassed training from scratch in accuracy for deeper models such as ResNet-18 and ResNet-50. For ResNet-101, the training process could not converge with our available hardware resources when training from scratch.

The training times of the different CNNs and approaches in both the transfer learning and training-from-scratch modes are shown in Fig. 5, and the number of required epochs is shown in Fig. 6. Only the results of the multi-view approaches are given in both figures. All models have the same training parameters except the mini-batch size (64 for 2D CNNs and 8 for 3D CNNs), and the same hardware/software resources were used to train the models. We omitted 3D ResNet-101 with transfer learning because it did not converge on a single GPU. As expected, deeper models or models with more learnable parameters took longer to train. As an example, the training progress (SGD optimizer’s training loss versus epochs) of 3D ResNet-18 with transfer learning is shown in Fig. 7. Training from scratch was more time consuming and required more epochs than transfer learning. Comparing transfer learning in 2D and 3D CNNs, approximately the same training time was obtained in our experiments, though the 3D CNNs required fewer epochs. Deeper models need fewer epochs in their training process; however, each epoch was more time consuming.

Fig. 5

Training time for implemented deep models in different approaches.


Fig. 6

The number of required epochs for implemented deep models in different approaches.


Fig. 7

SGD optimizer’s training loss for 3D ResNet-18 with transfer learning with moving average filter of 10.


5. Conclusions

This paper has presented the design, implementation, and evaluation of several MRI-based AD detection approaches using CNNs. In the first approach, a 2D CNN was trained on MRI slices in the single-view mode; for classification, the CNN model made decisions on all slices of one patient in the particular view. A majority voting mechanism was applied in the multi-view mode to make the final decision over the three views of an MRI volume. In the second approach, an LSTM model was used to classify a sequence of MRI slices in the multi-view and single-view modes. In these approaches, there were no significant differences among the views, although the multi-view models were slightly more robust and accurate than the single-view models. In the third approach, a 3D CNN was employed to classify each MRI volume in a single decision.

In general, SqueezeNet and ResNet-18 had the best performance on our dataset. Transfer learning was used in the 2D CNN approach, yielding a 2% accuracy improvement on average compared with training from scratch. Further, transferring knowledge from ImageNet to the ADNI MRI dataset using 3D CNNs considerably improved the results compared with training from scratch. To the best of our knowledge, 3D CNNs with transfer learning have not previously been employed for any classification task; such 3D models could also be employed for classifying other 3D images and videos. Comparing the different approaches, 2D CNN models with LSTM achieved a 2% accuracy improvement on average over 2D CNN models alone, and 3D CNNs with transfer learning considerably improved the results, to 96.88% accuracy, 100% sensitivity, and 94.12% specificity. However, training 3D CNNs from scratch performed poorly on our MRI dataset. In conclusion, deep learning methods can be utilized for the accurate detection of AD; however, the need for a large dataset is a weakness of these approaches.

Disclosures

The authors have no conflicts of interest to declare, financial or otherwise, relevant to this article’s content.

Acknowledgments

Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI), National Institutes of Health (Grant No. U01 AG024904), and DOD ADNI (Department of Defense Award No. W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research and Development, LLC.; Johnson & Johnson Pharmaceutical Research and Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research provide funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health ( https://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory of Neuro Imaging (LONI) at the University of Southern California.

Code, Data, and Materials Availability

Data used in the preparation of this article were obtained from the ADNI study. The data are available online upon request ( http://adni.loni.usc.edu/data-samples/access-data/). The models implemented in this study are available in MathWorks file exchange ( https://au.mathworks.com/matlabcentral/profile/authors/12814523?utf8=%E2%9C%93&detail2=&detail=fileexchange).

References

1. 

Alzheimer’s Association, “2020 Alzheimer’s disease facts and figures,” Alzheimer’s Dementia, 16 (3), 1 –94 (2020). Google Scholar

2. 

M. Prince, R. Bryce and C. Ferri, World Alzheimer Report 2011: The Benefits of Early Diagnosis and Intervention, Alzheimer’s Disease International (2011). Google Scholar

3. 

S. Klöppel et al., “Diagnostic neuroimaging across diseases,” Neuroimage, 61 (2), 457 –463 (2012). https://doi.org/10.1016/j.neuroimage.2011.11.002 NEIMEF 1053-8119 Google Scholar

4. 

Dementia Australia, The Dementia Guide for People Living with Dementia, their Families and Carers, Dementia Australia (2020). Google Scholar

5. 

M. Prince et al., “The global prevalence of dementia: a systematic review and metaanalysis,” Alzheimer’s Dementia, 9 (1), 63 –75.e2 (2013). https://doi.org/10.1016/j.jalz.2012.11.007 Google Scholar

6. 

Australian Bureau of Statistics, “Causes of Death, Australia, 2019,” (2020). Google Scholar

7. 

F. Mangialasche et al., “Alzheimer’s disease: clinical trials and drug development,” Lancet Neurol., 9 (7), 702 –716 (2010). https://doi.org/10.1016/S1474-4422(10)70119-8 Google Scholar

8. 

S. Paquerault, “Battle against Alzheimer’s disease: the scope and potential value of magnetic resonance imaging biomarkers,” Acad. Radiol., 19 (5), 509 –511 (2012). https://doi.org/10.1016/j.acra.2012.02.003 Google Scholar

9. 

J. P. Lerch et al., “Automated cortical thickness measurements from MRI can accurately separate Alzheimer’s patients from normal elderly controls,” Neurobiol. Aging, 29 (1), 23 –30 (2008). https://doi.org/10.1016/j.neurobiolaging.2006.09.013 NEAGDO 0197-4580 Google Scholar

10. 

E. Gerardin et al., “Multidimensional classification of hippocampal shape features discriminates Alzheimer’s disease and mild cognitive impairment from normal aging,” Neuroimage, 47 (4), 1476 –1486 (2009). https://doi.org/10.1016/j.neuroimage.2009.05.036 NEIMEF 1053-8119 Google Scholar

11. 

Y. Kazemi and S. K. Houghten, “A deep learning pipeline to classify different stages of Alzheimer’s disease from fMRI data,” in Proc. IEEE Conf. Comput. Intell. Bioinf. Computat. Biol., 1 –8 (2018). https://doi.org/10.1109/CIBCB.2018.8404980 Google Scholar

12. 

M. I. Razzak, S. Naz, A. Zaib, “Deep learning for medical image processing: overview, challenges and the future,” Classification in BioApps, 323 –350 Springer(2018). Google Scholar

13. 

A. Ebrahimighahnavieh, S. Luo and R. Chiong, “Deep learning to detect Alzheimer’s disease from neuroimaging: a systematic literature review,” Comput. Methods Prog. Biomed., 187 105242 (2020). https://doi.org/10.1016/j.cmpb.2019.105242 Google Scholar

14. 

L. Deng and D. Yu, “Deep learning: methods and applications,” Found. Trends Signal Process., 7 (3–4), 197 –387 (2014). https://doi.org/10.1561/2000000039 Google Scholar

15. 

J. Ker et al., “Deep learning applications in medical image analysis,” IEEE Access, 6 9375 –9389 (2018). https://doi.org/10.1109/ACCESS.2017.2788044 Google Scholar

16. 

D. Shen, G. Wu and H.-I. Suk, “Deep learning in medical image analysis,” Annu. Rev. Biomed. Eng., 19 221 –248 (2017). https://doi.org/10.1146/annurev-bioeng-071516-044442 ARBEF7 1523-9829 Google Scholar

17. 

G. Litjens et al., “A survey on deep learning in medical image analysis,” Med. Image Anal., 42 60 –88 (2017). https://doi.org/10.1016/j.media.2017.07.005 Google Scholar

18. 

Y. LeCun et al., “Gradient-based learning applied to document recognition,” Proc. IEEE, 86 (11), 2278 –2324 (1998). https://doi.org/10.1109/5.726791 IEEPAD 0018-9219 Google Scholar

19. 

O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vision, 115 (3), 211 –252 (2015). https://doi.org/10.1007/s11263-015-0816-y IJCVEQ 0920-5691 Google Scholar

20. 

Y. Guo et al., “Deep learning for visual understanding: a review,” Neurocomputing, 187 27 –48 (2016). https://doi.org/10.1016/j.neucom.2015.09.116 NRCGEO 0925-2312 Google Scholar

21. 

C. D. Billones et al., “DemNet: a convolutional neural network for the detection of Alzheimer’s disease and mild cognitive impairment,” in Proc. IEEE Region 10 Conf., 3724 –3727 (2016). https://doi.org/10.1109/TENCON.2016.7848755 Google Scholar

22. 

S.-H. Wang et al., “Classification of Alzheimer’s disease based on eight-layer convolutional neural network with leaky rectified linear unit and max pooling,” J. Med. Syst., 42 (5), 85 (2018). https://doi.org/10.1007/s10916-018-0932-7 JMSYDA 0148-5598 Google Scholar

23. 

A. Karwath et al., “Convolutional neural networks for the identification of regions of interest in PET scans: a study of representation learning for diagnosing Alzheimer’s disease,” in Proc. Conf. Artif. Intell. Med. Europe, 316 –321 (2017). https://doi.org/10.1007/978-3-319-59758-4_36 Google Scholar

24. 

D. Scherer, A. Müller and S. Behnke, “Evaluation of pooling operations in convolutional architectures for object recognition,” in Int. Conf. Artif. Neural Networks (ICANN), 92 –101 (2010). https://doi.org/10.1007/978-3-642-15825-4_10 Google Scholar

25. 

S. Bringas et al., “Alzheimer’s disease stage identification using deep learning models,” J. Biomed. Inf., 109 103514 (2020). https://doi.org/10.1016/j.jbi.2020.103514 Google Scholar

26. M. Raju et al., “Multi-class diagnosis of Alzheimer’s disease using cascaded three dimensional-convolutional neural network,” Phys. Eng. Sci. Med., 43, 1219–1228 (2020). https://doi.org/10.1007/s13246-020-00924-w

27. A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Adv. Neural Inf. Process. Syst., 1097–1105 (2012). https://doi.org/10.1145/3065386

28. Y. Jia et al., “Caffe: convolutional architecture for fast feature embedding,” in Proc. 22nd ACM Int. Conf. Multimedia, 675–678 (2014). https://doi.org/10.1145/2647868.2654889

29. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” (2014).

30. C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 1–9 (2015). https://doi.org/10.1109/CVPR.2015.7298594

31. K. He et al., “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

32. G. Huang et al., “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 4700–4708 (2017). https://doi.org/10.1109/CVPR.2017.243

33. C. Szegedy et al., “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in Proc. 31st AAAI Conf. Artif. Intell., 12 (2017). https://doi.org/10.5555/3298023.3298188

34. D. Cheng and M. Liu, “Combining convolutional and recurrent neural networks for Alzheimer’s disease diagnosis using PET images,” in Proc. IEEE Int. Conf. Imaging Syst. Tech., 1–5 (2017). https://doi.org/10.1109/IST.2017.8261461

35. M. Liu, D. Cheng and W. Yan, “Classification of Alzheimer’s disease by combination of convolutional and recurrent neural networks using FDG-PET images,” Front. Neuroinf., 12, 35 (2018). https://doi.org/10.3389/fninf.2018.00035

36. A. Farooq et al., “A deep CNN based multi-class classification of Alzheimer’s disease using MRI,” in Proc. IEEE Int. Conf. Imaging Syst. Tech., 1–6 (2017). https://doi.org/10.1109/IST.2017.8261460

37. A. Valliani and A. Soni, “Deep residual nets for improved Alzheimer’s diagnosis,” in Proc. 8th ACM Int. Conf. Bioinf., Comput. Biol., and Health Inf., 615 (2017). https://doi.org/10.1145/3107411.3108224

38. A. Farooq et al., “Artificial intelligence based smart diagnosis of Alzheimer’s disease and mild cognitive impairment,” in Proc. Int. Smart Cities Conf. (ISC2), 1–4 (2017). https://doi.org/10.1109/ISC2.2017.8090871

39. S. Luo, X. Li and J. Li, “Automatic Alzheimer’s disease recognition from MRI data using deep learning method,” J. Appl. Math. Phys., 5(9), 1892–1898 (2017). https://doi.org/10.4236/jamp.2017.59159

40. C. Wu et al., “Discrimination and conversion prediction of mild cognitive impairment using convolutional neural networks,” Quant. Imaging Med. Surg., 8(10), 992–1003 (2018). https://doi.org/10.21037/qims.2018.10.17

41. S. Sarraf and G. Tofighi, “Classification of Alzheimer’s disease structural MRI data by deep learning convolutional neural networks,” (2016).

42. M. Hon and N. Khan, “Towards Alzheimer’s disease classification through transfer learning,” in Proc. IEEE Int. Conf. Bioinf. Biomed., 1166–1169 (2017). https://doi.org/10.1109/BIBM.2017.8217822

43. R. Jain et al., “Convolutional neural network based Alzheimer’s disease classification from magnetic resonance brain images,” Cognit. Syst. Res., 57, 147–159 (2019). https://doi.org/10.1016/j.cogsys.2018.12.015

44. K. Gunawardena, R. Rajapakse and N. Kodikara, “Applying convolutional neural networks for pre-detection of Alzheimer’s disease from structural MRI data,” in Proc. 24th Int. Conf. Mechatron. Mach. Vision in Pract., 1–7 (2017). https://doi.org/10.1109/M2VIP.2017.8211486

45. J. Islam and Y. Zhang, “Deep convolutional neural networks for automated diagnosis of Alzheimer’s disease and mild cognitive impairment using 3D brain MRI,” in Proc. Int. Conf. Brain Inf., 359–369 (2018). https://doi.org/10.1007/978-3-030-05587-5_34

46. J. M. Ortiz-Suárez, R. Ramos-Pollán and E. Romero, “Exploring Alzheimer’s anatomical patterns through convolutional networks,” Proc. SPIE, 10160, 10160Z (2017). https://doi.org/10.1117/12.2256840

47. J. Islam and Y. Zhang, “Brain MRI analysis for Alzheimer’s disease diagnosis using an ensemble system of deep convolutional neural networks,” Brain Inf., 5(2), 1–14 (2018). https://doi.org/10.1186/s40708-018-0080-3

48. A. Ebrahimi-Ghahnavieh, S. Luo and R. Chiong, “Transfer learning for Alzheimer’s disease detection on MRI images,” in Proc. IEEE Int. Conf. Industry 4.0, Artif. Intell., and Commun. Technol., 133–138 (2019). https://doi.org/10.1109/ICIAICT.2019.8784845

49. M. Liu et al., “Multi-modality cascaded convolutional neural networks for Alzheimer’s disease diagnosis,” Neuroinformatics, 16(3-4), 295–308 (2018). https://doi.org/10.1007/s12021-018-9370-4

50. K. Bäckström et al., “An efficient 3D deep convolutional network for Alzheimer’s disease diagnosis using MR images,” in Proc. IEEE 15th Int. Symp. Biomed. Imaging, 149–153 (2018). https://doi.org/10.1109/ISBI.2018.8363543

51. S. Basaia et al., “Automated classification of Alzheimer’s disease and mild cognitive impairment using a single MRI and deep neural networks,” NeuroImage: Clin., 21, 101645 (2019). https://doi.org/10.1016/j.nicl.2018.101645

52. E. Hosseini-Asl, R. Keynto and A. El-Baz, “Alzheimer’s disease diagnostics by adaptation of 3D convolutional network,” in Proc. IEEE Int. Conf. Image Process., 126–130 (2016). https://doi.org/10.1109/ICIP.2016.7532332

53. S. Korolev et al., “Residual and plain convolutional neural networks for 3D brain MRI classification,” in Proc. IEEE 14th Int. Symp. Biomed. Imaging, 835–838 (2017). https://doi.org/10.1109/ISBI.2017.7950647

54. A. Ebrahimi, S. Luo and R. Chiong, “Introducing transfer learning to 3D ResNet-18 for Alzheimer’s disease detection on MRI images,” in Proc. 35th Int. Conf. Image and Vision Comput., 1–6 (2020). https://doi.org/10.1109/IVCNZ51579.2020.9290616

55. H. Karasawa, C.-L. Liu and H. Ohwada, “Deep 3D convolutional neural network architectures for Alzheimer’s disease diagnosis,” in Proc. Asian Conf. Intell. Inf. Database Syst., 287–296 (2018). https://doi.org/10.1007/978-3-319-75417-8_27

56. F. Li, D. Cheng and M. Liu, “Alzheimer’s disease classification based on combination of multi-model convolutional networks,” in Proc. IEEE Int. Conf. Imaging Syst. Tech., 1–5 (2017). https://doi.org/10.1109/IST.2017.8261566

57. H. Tang et al., “A fast and accurate 3D fine-tuning convolutional neural network for Alzheimer’s disease diagnosis,” in Proc. Int. CCF Conf. Artif. Intell., 115–126 (2018). https://doi.org/10.1007/978-981-13-2122-1_9

58. V. Wegmayr, S. Aitharaju and J. Buhmann, “Classification of brain MRI with big data and deep 3D convolutional neural networks,” Proc. SPIE, 10575, 10575S (2018). https://doi.org/10.1117/12.2293719

59. H. Wang et al., “Ensemble of 3D densely connected convolutional network for diagnosis of mild cognitive impairment and Alzheimer’s disease,” Neurocomputing, 333, 145–156 (2019). https://doi.org/10.1016/j.neucom.2018.12.018

60. Q. Zheng et al., “Layer-wise learning based stochastic gradient descent method for the optimization of deep convolutional neural network,” J. Intell. Fuzzy Syst., 37(4), 5641–5654 (2019). https://doi.org/10.3233/JIFS-190861

61. Q. Zheng et al., “Improvement of generalization ability of deep CNN via implicit regularization in two-stage training process,” IEEE Access, 6, 15844–15869 (2018). https://doi.org/10.1109/ACCESS.2018.2810849

62. J. Islam and Y. Zhang, “A novel deep learning based multi-class classification method for Alzheimer’s disease detection using brain MRI data,” in Proc. Int. Conf. Brain Inf., 213–222 (2017). https://doi.org/10.1007/978-3-319-70772-3_20

63. F. Li and M. Liu, for the Alzheimer’s Disease Neuroimaging Initiative, “Alzheimer’s disease diagnosis based on multiple cluster dense convolutional networks,” Comput. Med. Imaging Graphics, 70, 101–110 (2018). https://doi.org/10.1016/j.compmedimag.2018.09.009

64. Y. LeCun, C. Cortes and C. J. Burges, “The MNIST database of handwritten digits,” (1998). http://yann.lecun.com/exdb/mnist

65. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. 13th Int. Conf. Artif. Intell. Stat., 249–256 (2010).

67. C. R. Jack Jr. et al., “The Alzheimer’s Disease Neuroimaging Initiative (ADNI): MRI methods,” J. Magn. Reson. Imaging, 27(4), 685–691 (2008). https://doi.org/10.1002/jmri.21049

68. V. Fonov et al., “Unbiased average age-appropriate atlases for pediatric studies,” NeuroImage, 54(1), 313–327 (2011). https://doi.org/10.1016/j.neuroimage.2010.07.033

69. W. D. Penny et al., Statistical Parametric Mapping: The Analysis of Functional Brain Images, Elsevier (2011).

70. S. Esmaeilzadeh et al., “End-to-end Alzheimer’s disease diagnosis and biomarker identification,” in Proc. Int. Workshop Mach. Learn. Med. Imaging, 337–345 (2018). https://doi.org/10.1007/978-3-030-00919-9_39

Biography

Amir Ebrahimi received his master’s degree in digital electronics from Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran, in 2012. He has been a PhD student in information technology at the University of Newcastle, Callaghan, New South Wales, Australia, since 2018. He has industrial experience in software/hardware programming for several companies. His current research interests are image processing, computer vision, machine learning, and deep learning.

Suhuai Luo received his bachelor’s and master’s degrees in electrical engineering from Nanjing University of Posts and Telecommunications and his PhD from the University of Sydney. He is currently an associate professor in information technology at the University of Newcastle. He has conducted studies in areas ranging from medical imaging for computer-aided diagnosis to computer vision for intelligent driving systems and machine learning for enhancing cybersecurity. His main research interests include machine learning and cybersecurity.

© 2021 Society of Photo-Optical Instrumentation Engineers (SPIE)
Amir Ebrahimi, Suhuai Luo, and Alzheimer’s Disease Neuroimaging Initiative "Convolutional neural networks for Alzheimer’s disease detection on MRI images," Journal of Medical Imaging 8(2), 024503 (29 April 2021). https://doi.org/10.1117/1.JMI.8.2.024503
Received: 2 January 2021; Accepted: 14 April 2021; Published: 29 April 2021
Keywords: Magnetic resonance imaging; 3D modeling; 3D image processing; Alzheimer’s disease; Data modeling; Brain; Convolutional neural networks