Anomaly detection algorithms based on deep neural networks have achieved favorable performance in detecting abnormal events in surveillance video. Recently, end-to-end methods that combine feature extraction, model learning, and anomaly scoring into a single training procedure have become popular. However, most existing studies rely on deep convolutional structures, which are prone to overfitting when the training set is limited. We propose an anomaly detection algorithm built on a state-of-the-art prediction framework, which uses the gap between a predicted frame and its ground truth to detect abnormal events. To ease the difficulty of training a deeper prediction network, we adopt the residual block from image classification and modify its modules to suit the prediction task. To the best of our knowledge, the proposed method is the first residual network for anomaly detection trained from scratch, in contrast to existing approaches that use fixed ResNet-50 layers as a feature extractor. Furthermore, we propose a new perceptual constraint that focuses on high-level information and combine it with the commonly used spatial–temporal constraints. Experimental results on challenging public surveillance sequences show that the proposed framework achieves state-of-the-art performance.
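
To make the idea of scoring by the prediction gap concrete, the following is a minimal sketch and not the scoring rule of the proposed method, which the abstract does not specify. It assumes that the gap between a predicted frame and its ground truth is measured by PSNR and that per-frame PSNR values are min-max normalised over a clip, a convention common in prediction-based anomaly detection. The function names (`mse`, `psnr`, `anomaly_scores`), the flat-list frame representation, and the toy pixel values are all hypothetical.

```python
import math


def mse(pred, truth):
    """Mean squared error between two frames given as flat lists of floats in [0, 1]."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)


def psnr(pred, truth, max_val=1.0, eps=1e-12):
    """Peak signal-to-noise ratio in dB; eps avoids division by zero for a perfect prediction."""
    return 10.0 * math.log10(max_val ** 2 / (mse(pred, truth) + eps))


def anomaly_scores(psnrs):
    """Min-max normalise PSNR over a clip and invert it (assumed convention).

    A low PSNR means a poorly predicted frame, so the score approaches 1.0
    for the least predictable frame and 0.0 for the most predictable one.
    """
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [0.0 for _ in psnrs]
    return [1.0 - (v - lo) / (hi - lo) for v in psnrs]


# Toy clip: one ground-truth frame and three hypothetical predictions of it.
truth = [0.5, 0.5, 0.5, 0.5]
preds = [
    [0.5, 0.5, 0.5, 0.6],  # small prediction error
    [0.5, 0.5, 0.9, 0.1],  # large prediction error, i.e. likely abnormal
    [0.5, 0.5, 0.5, 0.5],  # perfect prediction
]
scores = anomaly_scores([psnr(p, truth) for p in preds])
```

In this toy clip the second prediction deviates most from its ground truth and therefore receives the highest score. In practice, a frame would be flagged as abnormal when its score exceeds a threshold chosen on validation data.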