Deep Neural Network Approach for Real-Time Abnormal Activity and Crime Detection in Surveillance Videos through Video Summarization
Abstract
The exponential growth of surveillance video data in modern smart city infrastructure has rendered manual video monitoring impractical and error-prone. Video summarization, the task of condensing lengthy surveillance footage into compact, informative representations, has emerged as a critical solution. This paper presents a novel video summarization framework driven by the concept of abnormal activity detection, wherein only frames containing behaviorally anomalous events are retained in the final summary. The proposed system integrates a Convolutional Neural Network (CNN) feature extractor, a Long Short-Term Memory (LSTM) sequence model, and an attention-guided scoring mechanism to identify temporal segments of interest. An unsupervised Gaussian Mixture Model (GMM) is employed to model normal activity patterns, enabling detection of statistical deviations that signal abnormal behavior. The framework is evaluated on the UCSD Anomaly Detection, UCF-Crime, and ShanghaiTech Campus datasets, achieving a frame-level detection AUC of 92.4% and a summarization compression ratio of 87.3% while retaining 96.1% of ground-truth abnormal events. Comparative experiments demonstrate significant improvements over baseline key-frame and unsupervised summarization methods. The results confirm that abnormality-centric summarization produces semantically richer and forensically more actionable video summaries than purely aesthetic or redundancy-driven approaches.
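The selection step described above (model normal activity, flag statistical deviations, keep only the flagged frames) can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: a single one-dimensional Gaussian stands in for the GMM, hand-written scalars stand in for CNN/LSTM features, and the function names, the threshold of 3 standard deviations, and the metric definitions (compression ratio as the fraction of frames discarded, retention as the fraction of annotated events with at least one kept frame) are all assumptions made for illustration.

```python
from statistics import NormalDist


def fit_normal_model(normal_features):
    """Fit one 1-D Gaussian to features from normal footage.

    Assumption: this replaces the paper's GMM with a single component.
    """
    return NormalDist.from_samples(normal_features)


def anomaly_scores(model, features):
    """Score each frame by its distance from the normal mean, in std devs."""
    return [abs(x - model.mean) / model.stdev for x in features]


def summarize(scores, threshold):
    """Return indices of frames whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def compression_ratio(n_total, n_kept):
    """Fraction of frames discarded (assumed definition)."""
    return 1.0 - n_kept / n_total


def event_retention(kept, events):
    """Fraction of events (inclusive frame ranges) with any kept frame."""
    kept_set = set(kept)
    hits = sum(
        1 for start, end in events
        if any(f in kept_set for f in range(start, end + 1))
    )
    return hits / len(events)


# Hypothetical per-frame features: values near 1.0 are "normal".
normal_features = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8]
test_features = [1.0, 1.1, 3.0, 3.2, 0.9, 1.0, 1.0, 2.9, 1.1, 0.9]
ground_truth_events = [(2, 3), (7, 7)]  # hypothetical annotations

model = fit_normal_model(normal_features)
scores = anomaly_scores(model, test_features)
kept = summarize(scores, threshold=3.0)
ratio = compression_ratio(len(test_features), len(kept))
retained = event_retention(kept, ground_truth_events)

print("kept frames:", kept)
print("compression ratio:", round(ratio, 3))
print("event retention:", retained)
```

In a fuller version, the scalar features would be replaced by per-frame embedding vectors and the single Gaussian by a multi-component mixture scored with per-sample log-likelihood; the thresholding and the two summary metrics would stay structurally the same.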