HybridEdge-COVID: Fair Compression Benchmarking, Calibration Aware Uncertainty Quantification, and Radiologist-Validated Explainability for Trustworthy Edge-Deployed COVID-19 Chest X-Ray Screening

Bharat Tank; Mitul Patel; Soumya Das

Published: 2026-06-12

Keywords:

Fair benchmarking; Edge AI; Trustworthy AI; Calibration; Reliability diagram; Uncertainty quantification; COVID-19; Chest X-ray; Hybrid CNN; Monte Carlo Dropout; TOST equivalence; DeLong AUC; Grad-CAM++; TFLite; Raspberry Pi 4; Pareto-optimal; LMIC deployment

Bharat Tank

Ph.D. Scholar, Electronics and Communication Engineering, Parul Institute of Engineering & Technology, Faculty of Engineering & Technology, Parul University, Vadodara, Gujarat, India

Mitul Patel

Assistant Professor, Electronics and Communication Engineering, Parul Institute of Engineering & Technology, Faculty of Engineering & Technology, Parul University, Vadodara, Gujarat, India

Soumya Das

PG Scholar (IMCA), Integrated Master of Computer Science, Parul University, Vadodara, Gujarat, India

Abstract

Background: COVID-19 diagnostic capacity remains very limited in low- and middle-income countries (LMICs). Prior studies of edge-AI sensors introduce a systematic optimism bias by compressing proposed models more aggressively than baseline models, a methodological flaw that has not been clearly addressed in the COVID-19 chest X-ray (CXR) literature. Calibrated uncertainty, clinically relevant uncertainty estimates, and explainability of AI predictions to clinicians in the form of saliency maps are also critical features of trustworthy medical AI that have been lacking in previous COVID-19 CXR edge-AI benchmarking studies.

Methods: The key methodological advance is the uniform application of a three-step Edge-Aware Optimisation Pipeline (dynamic-range quantisation, INT8 quantisation-aware training, and structured L1-norm channel pruning) to all seven architectures, which removes this systematic benchmarking bias. As a secondary contribution, we propose a lightweight hybrid CNN (1.91M parameters) combining SqueezeNet Fire modules, MobileNetV2 inverted residual bottlenecks, and Squeeze-and-Excitation channel-wise attention, as an example to illustrate an evaluation framework that can be applied to any hybrid CNN. Statistical validation consists of equivalence testing (TOST with margin Δ = ±1.0 pp), pairwise AUC comparison (DeLong), and Bonferroni-corrected McNemar's tests. Trustworthiness is evaluated through calibration analysis (ECE, Brier score, MCE), Monte Carlo Dropout uncertainty quantification (50 passes), and risk-coverage deferral analysis. Performance is assessed using 5-fold stratified cross-validation (COVID-Xray-5k; n = 5,000; 95% bootstrap CIs) and external testing on COVIDx CXR-3 (n = 13,870; 16,352 unique patients). Grad-CAM++ explainability maps are validated in a double-blinded manner by two board-certified radiologists, with inter-rater agreement measured by Cohen's kappa.

Results: Under uniform compression benchmarking, HybridEdge-COVID achieves 97.84 ± 0.31% CV accuracy (95% CI: 97.21–98.47%), AUC 0.981 ± 0.009, and MCC 0.957 ± 0.013. ResNet18 (98.12%) and ResNet50 (98.23%) have higher point-estimate accuracy, which is explicitly acknowledged; DeLong analysis shows no significant difference in AUC from ResNet18 or EfficientNet-Lite0 after Bonferroni correction, and TOST confirms equivalence in AUC within the ±1.0 pp margin for four of six comparisons. Calibration analysis gives an ECE of 0.022, indicating that the multi-stage compression does not degrade the reliability of the predicted probabilities. Monte Carlo Dropout uncertainty estimates are 4.25× higher for misclassified cases, which supports a 10%-referral deferral workflow with retained accuracy of around 98.5%. External validation yields 91.30% accuracy (95% CI: 90.73–91.87%) and AUC 0.943. On Raspberry Pi 4 (< USD 55): 8.93 s/100 images, 47.2 MB peak RAM, and a 4.8 MB model, which is Pareto-optimal among all 7 evaluated architectures. Dual-radiologist Grad-CAM++ validation: κ = 0.71 (95% CI: 0.61–0.81; substantial agreement), 76.9% clinical feature consistency.

Conclusions: This study presents a fair compression benchmarking framework, calibration-aware uncertainty quantification, rigorous statistical validation, and preliminary radiologist-validated explainability for trustworthy edge-deployed COVID-19 CXR screening. Beyond the proposed HybridEdge-COVID architecture, the most important contribution is the reproducible evaluation methodology, which allows scientifically fair comparisons of the performance of edge medical AI systems.

Limitations: The study covers binary classification only, includes no prospective clinical validation, and offers only a preliminary XAI assessment by two radiologists; transformer baselines were not evaluated under the compression pipeline, and Grad-CAM++ was not run on-device. Multi-centre prospective validation is needed before any deployment to clinical use is considered.
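To make the calibration and deferral metrics named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a binary classifier that outputs a positive-class probability, equal-width confidence bins for ECE (the abstract does not state the binning scheme or bin count), and a single per-image uncertainty score such as the variance across the 50 Monte Carlo Dropout passes. The function names `expected_calibration_error` and `accuracy_after_deferral` are hypothetical.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for a binary classifier, using equal-width confidence bins.

    probs  : predicted probability of the positive class, shape (n,)
    labels : ground-truth labels in {0, 1}, shape (n,)
    n_bins : number of bins (assumption; not specified in the abstract)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)

    preds = (probs >= 0.5).astype(int)
    # Confidence is the probability assigned to the predicted class.
    conf = np.where(preds == 1, probs, 1.0 - probs)
    correct = (preds == labels).astype(float)

    # Interior bin edges; a confidence c falls in bin i when
    # edges[i] < c <= edges[i + 1].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(conf, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)


def accuracy_after_deferral(uncertainty, correct, defer_fraction=0.10):
    """Accuracy on retained cases after deferring the most uncertain fraction.

    uncertainty    : per-image uncertainty score (assumed here to be, for
                     example, the variance over Monte Carlo Dropout passes)
    correct        : 1 if the prediction was right, else 0
    defer_fraction : share of cases referred to a human reader
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    correct = np.asarray(correct, dtype=float)

    n_defer = int(round(defer_fraction * len(uncertainty)))
    # Keep the least uncertain cases; the rest are deferred.
    keep = np.argsort(uncertainty)[: len(uncertainty) - n_defer]
    return float(correct[keep].mean())
```

Sweeping `defer_fraction` from 0 to 1 and plotting the retained accuracy against the retained share of cases gives one form of risk-coverage curve; the 10% referral point reported in the abstract would correspond to `defer_fraction=0.10` under these assumptions.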

Issue

Vol. 6 No. 2 (2026): Volume 6, Issue 2, 2026 (July-December)

Section

Articles

How to Cite

Tank, B., Patel, M., & Das, S. (2026). HybridEdge-COVID: Fair Compression Benchmarking, Calibration Aware Uncertainty Quantification, and Radiologist-Validated Explainability for Trustworthy Edge-Deployed COVID-19 Chest X-Ray Screening. International Journal of Aquatic Research and Environmental Studies, 6(2), 185-209. https://injoere.com/index.php/injoere/article/view/717
