EDUCATIONAL AI FOR ATTENTION-ENHANCED FACIAL EMOTION RECOGNITION FOR EMOTION-AWARE LEARNING SYSTEMS USING FACE-CROPPED DEEP NETWORKS
Abstract
Emotion-aware learning systems depend on reliable recognition of students' affective states. However, facial emotion recognition in child-centred settings remains difficult because of background clutter, class imbalance, limited annotated data, and subtle variations in facial expression. This study investigates whether a face-centric, attention-enhanced deep learning framework can improve recognition performance and convergence efficiency for educational artificial intelligence applications. Using the Multimodal Child Emotion for Learning dataset, the pipeline detects and crops faces with Multi-Task Cascaded Convolutional Networks (MTCNN), learns facial features with an EfficientNet-B0 backbone, strengthens discriminative channel representations with an Efficient Channel Attention (ECA) module, and trains in two stages: classifier-head training followed by full-network fine-tuning. Experiments are conducted in PyTorch with ImageNet-pretrained weights, Adam optimisation, cross-entropy loss, and augmentation through horizontal flipping and colour jittering. Face detection localised 826 of 833 images. Validation accuracy improved from approximately 14% for the baseline to 68% after integrating MTCNN-based face cropping and ECA, and the final proposed configuration reached 77.78% after excluding the severely underrepresented fear class. Staged training reached the 0.60 validation-accuracy threshold in 6 epochs rather than 10 and reduced total training time from 22.79 to 20.02 minutes, a 12.15% reduction. Ablation analysis showed that validation accuracy declined by 9.8% without face cropping, 6.2% without channel attention, and 4.1% without staged training.
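The efficiency figures reported above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the helper name `relative_reduction` is introduced here for illustration and is not taken from the study.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# Figures as stated in the abstract.
time_reduction = relative_reduction(22.79, 20.02)   # training time, minutes
epoch_reduction = relative_reduction(10, 6)         # epochs to reach 0.60 validation accuracy
detection_rate = 100.0 * 826 / 833                  # faces localised by the detector

print(f"Training-time reduction: {time_reduction:.2f}%")
print(f"Epoch reduction:         {epoch_reduction:.2f}%")
print(f"Face-detection rate:     {detection_rate:.2f}%")
```

The first value reproduces the stated 12.15% reduction. The other two are derived here from the reported counts rather than quoted from the abstract.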
These findings show that the combined framework improves recognition reliability, accelerates convergence, and achieves the strongest performance of all tested configurations on the evaluated educational dataset.
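The channel-attention and staged-training components named in the abstract can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the small convolutional stack stands in for the ImageNet-pretrained EfficientNet-B0 backbone, and the names `EmotionNet` and `set_stage`, the attention kernel size, the placement of the attention module after the backbone, the class count, and the learning rate are all hypothetical. Whether the attention module is trained during the first stage is not stated in the abstract; here only the classifier head is trained, following the abstract's wording.

```python
import torch
import torch.nn as nn


class ECA(nn.Module):
    """Efficient Channel Attention: reweights channels via a 1D convolution."""

    def __init__(self, kernel_size: int = 3):  # kernel size is an assumed value
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x).squeeze(-1).transpose(1, 2)   # (N, C, 1, 1) -> (N, 1, C)
        w = torch.sigmoid(self.conv(w))                # local cross-channel interaction
        return x * w.transpose(1, 2).unsqueeze(-1)     # rescale each input channel


class EmotionNet(nn.Module):
    """Hypothetical model: stand-in backbone -> ECA -> classifier head."""

    def __init__(self, num_classes: int = 5):  # class count is an assumed value
        super().__init__()
        # Stand-in for the pretrained EfficientNet-B0 feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.BatchNorm2d(16), nn.SiLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.SiLU(),
        )
        self.eca = ECA()
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.eca(self.backbone(x)))


def set_stage(model: nn.Module, stage: str) -> None:
    """Stage 'head' trains only the classifier; stage 'full' trains everything."""
    if stage not in ("head", "full"):
        raise ValueError(f"unknown stage: {stage!r}")
    for name, param in model.named_parameters():
        param.requires_grad = stage == "full" or name.startswith("classifier.")


model = EmotionNet()
set_stage(model, "head")
# Only trainable parameters go to Adam; the learning rate is an assumed value.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

After the head-only epochs, calling `set_stage(model, "full")` and building a new optimizer over all parameters would begin the fine-tuning stage.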
JEL Classification Codes: A2, O31, O32, H52.
This work is licensed under a Creative Commons Attribution 4.0 International License.