Patients and imaging protocol
We retrospectively enrolled patients who underwent both PET/CT and MRI at the Peking Union Medical College Hospital. The PET/CT scan was performed on a Siemens Biograph PET/CT scanner (Siemens Healthineers, Erlangen, Germany). The whole-body PET/CT scanning protocol was described in a previous study. A separate head CT scan was also performed, covering the full head from vertex to skull base. The X-ray generator voltage was 120 kVp and the X-ray tube current was 300 mAs. The head CT images had a voxel spacing (resolution) of 0.6 × 0.6 × 1.5 mm³ and dimensions of 512 × 512 × 148. The MRI scan was performed on a Toshiba Vantage Titan 3 T scanner (Canon Medical Systems, Tochigi, Japan). Both T1-weighted (T1-w) and T2-weighted (T2-w) MR images were acquired. T1-w images were acquired with the following sequence parameters: TR 2100 ms, TE 10 ms, TI 900 ms. T2-w images were acquired with TR 4650 ms and TE 95 ms. For both T1-w and T2-w images, the dimensions were 640 × 640 × 24 and the voxel spacing was 0.36 × 0.36 × 6 mm³. During post-processing, both CT and MR images were resampled to a common resolution of 1 × 1 × 2 mm³, giving a single-voxel volume of 2 mm³.
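The resampling arithmetic in this paragraph can be checked with a short sketch. The helper names (`resampled_shape`, `voxel_volume_mm3`, `structure_volume_ml`) and the round-to-nearest rule for the new grid size are illustrative assumptions; the text does not state which software or rounding rule was used for resampling.

```python
def resampled_shape(shape, spacing, new_spacing):
    """Grid size after resampling, keeping the physical extent fixed.

    Assumption: each axis length is rounded to the nearest integer.
    """
    return tuple(round(n * s / t) for n, s, t in zip(shape, spacing, new_spacing))


def voxel_volume_mm3(spacing):
    """Volume of one voxel in mm^3 for a spacing given in mm."""
    return spacing[0] * spacing[1] * spacing[2]


def structure_volume_ml(n_voxels, spacing):
    """Volume of a segmented structure in millilitres (1 mL = 1000 mm^3)."""
    return n_voxels * voxel_volume_mm3(spacing) / 1000.0


common = (1.0, 1.0, 2.0)  # common resolution stated in the text, in mm
ct_shape = resampled_shape((512, 512, 148), (0.6, 0.6, 1.5), common)
mr_shape = resampled_shape((640, 640, 24), (0.36, 0.36, 6.0), common)
print(ct_shape, mr_shape, voxel_volume_mm3(common))
```

With a 2 mm³ voxel, the volume of a segmented structure is simply its voxel count multiplied by 2 mm³, which is what `structure_volume_ml` computes before converting to millilitres.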
MRI segmentation with an atlas label method
Shattuck et al. introduced BrainSuite, an MRI analysis tool that produces cortical surface representations with spherical topology from human head MR images. The tool performs accurate brain segmentation in a single package through a sequence of low-level operations: skull and scalp removal, image nonuniformity compensation, voxel-based tissue classification, topological correction, rendering, and editing. Later, Shattuck et al. proposed BrainSuite13, a collection of software tools for jointly processing and visualizing structural and diffusion MRI of the human brain.
In our study, we used BrainSuite13 to perform brain segmentation in MRI. First, full-head T1-w MR images were processed for automated cortical surface extraction. Then, the generated cortical mesh models were spatially registered to a labeled brain atlas that included eight anatomical structures: the hemisphere, hippocampus, basal ganglia, and cerebellum, each split into left and right. The atlas was derived from a single subject, and the registration was performed using a combined surface/volume procedure. After registration, the surface and volume labels were transferred from the atlas to the subject, segmenting the subject MRI into the delineated regions of interest (ROIs). So that the ROI boundaries conformed to the bottoms of the sulcal valleys, cortical surfaces were refined locally at the mid-cortical surface using geodesic curvature flow.
CT segmentation with convolutional neural networks
In this study, we used two convolutional neural networks (CNNs), DenseVNet and 3D U-Net, to segment brain anatomical regions in CT. First, 90 patients with non-contrast computed tomography (NCCT) images were enrolled, and the CNNs were trained and tested on this 90-patient data set. Then, the CT images of 18 patients, acquired as described in the Patients and imaging protocol section, were used as an independent testing data set. The segmentation results obtained on this 18-patient CT data set with the trained CNN model were then compared head-to-head, by volume, with the MRI segmentation results of the same 18 patients. The trained CNN was embedded into the NovoStroke Kit (NSK) software (research-only prototype, GE Healthcare, China). The details of data acquisition, data preprocessing, and model training and testing are given below.
To train the CNNs, 90 patients were enrolled from two separate stroke centers. All enrolled patients underwent both NCCT and computed tomography perfusion (CTP); the NCCT images were used for the brain segmentation task. Forty-four NCCT datasets from center A were acquired on a GE Revolution CT scanner (voltage: 120 kVp, current: 225 mAs) with a voxel spacing of 0.5 × 0.5 × 2.5 mm³ and dimensions of 512 × 512 × 64. Forty-six datasets from center B were acquired on a GE Revolution CT scanner (voltage: 120 kVp, current: 174 mAs) with a voxel spacing of 0.5 × 0.5 × 5 mm³ and dimensions of 512 × 512 × 32. The 90 patients were split into a training set of 81 patients and a testing set of 9 patients.
The ground truth was defined through manual annotation by a neuroradiologist with more than 20 years of experience. Each axial slice was annotated, yielding a segmentation of eight brain anatomical regions: basal ganglia, cerebellum, hemisphere, and hippocampus, each split into left and right. These are the same regions segmented by the MRI atlas method. The annotation was performed with the Medical Imaging Interaction Toolkit 2018 (MITK 2018) software.
Before training and testing, all 90 datasets were preprocessed in four steps. First, all 3D image data were resampled to a common voxel spacing of 0.5 × 0.5 × 5 mm³ by linear interpolation. Second, a Gaussian filter with sigma = 0.5 was applied to reduce noise in the CT images. Third, skull-stripping was used to remove the skull region of the head so that only the soft brain tissue remained. Finally, the brain parenchyma was refined by thresholding: all CT values outside the range [0, 120] were reset to zero.
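The preprocessing steps above can be sketched as follows. This is a minimal illustration built on NumPy and SciPy, not the pipeline actually used in the study. The function name `preprocess_ct`, the optional `brain_mask` argument (a stand-in for the output of a skull-stripping step, which is not implemented here), and the interpretation of sigma as a value in voxel units are all assumptions.

```python
import numpy as np
from scipy import ndimage

TARGET_SPACING = (0.5, 0.5, 5.0)  # mm, as stated in the text


def preprocess_ct(volume, spacing, brain_mask=None, sigma=0.5, lo=0.0, hi=120.0):
    """Resample, smooth, mask, and threshold one CT volume.

    volume     : 3D array of CT values
    spacing    : voxel spacing of `volume` in mm, one value per axis
    brain_mask : optional boolean array on the original grid (hypothetical
                 skull-stripping result); voxels outside it are zeroed
    """
    zoom = [s / t for s, t in zip(spacing, TARGET_SPACING)]

    # Step 1: resample to the target spacing with linear interpolation (order=1).
    out = ndimage.zoom(volume.astype(np.float32), zoom, order=1)

    # Step 2: Gaussian smoothing; sigma is assumed to be in voxel units.
    out = ndimage.gaussian_filter(out, sigma=sigma)

    # Step 3: apply the skull-stripping mask, resampled with nearest neighbour.
    if brain_mask is not None:
        mask = ndimage.zoom(brain_mask.astype(np.uint8), zoom, order=0).astype(bool)
        out = np.where(mask, out, 0.0)

    # Step 4: reset every value outside [lo, hi] to zero.
    out[(out < lo) | (out > hi)] = 0.0
    return out
```

For a center A dataset (spacing 0.5 × 0.5 × 2.5 mm³), only the slice axis is rescaled, by a factor of 0.5; a center B dataset already matches the target spacing and passes through the resampling step with its grid unchanged.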
Model training and testing
The CNNs used were the DenseVNet and the 3D U-Net. Figure 1 shows the network architecture of the DenseVNet used in our study. It has five key features: batch-wise spatial dropout, dense feature stacks, V-network downsampling and upsampling, dilated convolution, and an explicit spatial prior. The structure of the 3D U-Net is described in the corresponding reference.
We trained the networks with NiftyNet version 1.0.4. The graphics card used for training was an NVIDIA Quadro P3200 with 8 GB of memory. The spatial window size was set to 200 × 200 × 64 for DenseVNet and 96 × 96 × 96 for 3D U-Net, and was kept the same for training and testing. The batch size was set to 1 for both network models, and the Adam optimizer was used to minimize the training error. For the first 2000 epochs, we used a learning rate of 0.001 and 50 datasets for training. For the next 2000 epochs, we reduced the learning rate to 0.00025 and used the remaining 31 datasets for training. This two-stage strategy was applied to both networks to improve the segmentation of the left and right hippocampus. A comparative experiment (on DenseVNet only) compared this strategy with training for 4000 epochs at a constant learning rate of 0.001. Training took nearly 7 h in total for DenseVNet and 4 h for 3D U-Net. The performance of each CNN was evaluated with the Dice similarity score.
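For reference, the Dice similarity score for one labeled region is twice the overlap between the predicted and ground-truth voxels, divided by the sum of their sizes. The sketch below assumes integer label maps stored as NumPy arrays; the function name `dice_score` and the choice to return NaN when a label is absent from both maps are illustrative assumptions, not details from the study.

```python
import numpy as np


def dice_score(pred, truth, label):
    """Dice similarity score for one label in two integer label maps."""
    p = pred == label
    t = truth == label
    denom = int(p.sum()) + int(t.sum())
    if denom == 0:
        # Label absent from both maps: the score is undefined.
        return float("nan")
    return 2.0 * int(np.logical_and(p, t).sum()) / denom


# Toy example with four voxels; labels 1 and 2 stand in for two brain regions.
pred = np.array([1, 1, 0, 2])
truth = np.array([1, 0, 0, 2])
for label in (1, 2):
    print(label, dice_score(pred, truth, label))
```

Computing the score separately for each of the eight regions makes it possible to see that a small structure such as the hippocampus can score poorly even when the larger regions are segmented well.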
Evaluation of the CT segmentation method and volumetric comparison to MRI results
We assessed the performance of the CT segmentation method on the independent testing set described in the Patients and imaging protocol section. The same performance metrics as in the training and testing phase were recorded. The CNN with the better results was used as the CT segmentation method in the subsequent experiments. Since enrolled patients were required to undergo both MRI and PET/CT, we then performed a head-to-head comparison of the volumes of the segmented brain structures between the CT method and the MRI method, with the MRI results used as the reference. Correlation was assessed with the non-parametric Spearman’s rho. For interpreting correlation coefficients, values less than 0.4, between 0.4 and 0.7, between 0.7 and 0.9, and greater than 0.9 indicate weak, moderate, strong, and very strong correlation, respectively. The intraclass correlation coefficient (ICC) was calculated to assess agreement, based on a two-way mixed, absolute agreement, single measures model. Based on the 95% confidence interval (CI), values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.9 indicate poor, moderate, good, and excellent reliability, respectively. In addition, Student’s t test or the Wilcoxon rank sum test was used to test for a difference between the two methods, depending on whether the data were normally distributed.
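The two agreement statistics can be illustrated with a small standard-library sketch. The study itself used MedCalc, so this is not the authors' implementation. The function names are hypothetical, ties in Spearman's rho are handled with average ranks, and the ICC uses the standard single-measures, absolute-agreement formula computed from two-way ANOVA mean squares.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def icc_absolute_single(data):
    """ICC for absolute agreement, single measures.

    data: one row per patient, one column per method (e.g. [ct, mri]).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    msr = ss_rows / (n - 1)                                      # between patients
    msc = ss_cols / (k - 1)                                      # between methods
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))   # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

The two statistics answer different questions: a constant offset between CT and MRI volumes leaves Spearman's rho at 1 but lowers the absolute-agreement ICC, which is why both are reported.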
Statistical analysis was performed with MedCalc 19.1 (Ostend, Belgium). A p value less than 0.05 was considered statistically significant.