Alzheimer's disease (AD) is a neurodegenerative disease of the nervous system that primarily affects the elderly. Because magnetic resonance imaging (MRI) and positron emission tomography (PET) capture the anatomical and functional brain changes caused by AD, respectively, they are widely used for its diagnosis. Multimodal fusion of these two image types can exploit their complementary information and improve diagnostic performance. To avoid the computational cost of processing full 3D images and to enlarge the training set, this study designed an AD diagnosis framework based on a 2.5D convolutional neural network (CNN) that fuses multimodal data. First, the MRI and PET scans were preprocessed with skull stripping and registration. Next, multiple 2.5D patches were extracted from the hippocampal regions of both modalities. We then constructed a multimodal 2.5D CNN to integrate the information from the MRI and PET patches, and employed a training strategy called branch pre-training, in which each branch is first pre-trained on its corresponding modality to strengthen its feature extraction ability. Finally, the patch-level predictions were aggregated to distinguish AD and progressive mild cognitive impairment (pMCI) patients from normal controls (NC). Experiments on the ADNI dataset achieved accuracies of 92.89% and 84.07% on the AD vs. NC and pMCI vs. NC tasks, respectively. These results substantially outperform single-modality baselines, indicating that the proposed multimodal 2.5D CNN can effectively integrate complementary information across modalities and yield promising AD diagnosis performance.
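To illustrate the 2.5D representation described above, the following is a minimal sketch of extracting one 2.5D patch from a 3D volume: three orthogonal slices (axial, coronal, sagittal) through a chosen voxel are cropped and stacked as channels of a single 2D input. The function name, patch size, and center coordinates are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def extract_25d_patch(volume, center, size=32):
    """Illustrative 2.5D patch extraction (not the authors' exact code).

    Takes three orthogonal slices through `center` (e.g., a voxel inside
    the hippocampal region), crops each to size x size, and stacks them
    as channels, yielding a (3, size, size) array suitable for a 2D CNN.
    """
    x, y, z = center
    h = size // 2
    axial    = volume[x - h:x + h, y - h:y + h, z]  # fixed z-plane
    coronal  = volume[x - h:x + h, y, z - h:z + h]  # fixed y-plane
    sagittal = volume[x, y - h:y + h, z - h:z + h]  # fixed x-plane
    return np.stack([axial, coronal, sagittal], axis=0)

# Example: one patch per modality; a multimodal 2.5D CNN would take both.
mri_vol = np.random.rand(96, 96, 96).astype(np.float32)  # placeholder volume
pet_vol = np.random.rand(96, 96, 96).astype(np.float32)  # placeholder volume
mri_patch = extract_25d_patch(mri_vol, center=(48, 48, 48))
pet_patch = extract_25d_patch(pet_vol, center=(48, 48, 48))
print(mri_patch.shape, pet_patch.shape)  # (3, 32, 32) (3, 32, 32)
```

In a two-branch design such as the one described, each modality's patches would feed its own convolutional branch (pre-trained separately), with features fused before the final classification layers.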