AGVBench: A Reliability-Oriented Benchmark of Data Augmentation for Vein Recognition

Authors: Haiyang Li, Yuming Fu, Qun Song, Hongchao Liao, Jing Chen, Mounim A. El-Yacoubi, Xin Jin

arXiv ID: 2607.02271

Problem: Data augmentations inherited from natural image tasks can disrupt the fine-grained vascular topology and textures critical for identity discrimination in vein recognition.

Key Methodology:

Evaluated 30 augmentation strategies across 5 public palm/finger-vein datasets (VERA220, TJU600, SCUT1100, FV-USM, SDUMLA-HMT) with 7 backbone architectures (ResNet18, MobileNetV2, ViT-S, Swin-T, plus vein-specific FVRASNet, AMPVNet, StarLKNet-S)
Multi-dimensional assessment across recognition performance, calibration (ECE), corruption robustness (19 corruptions at 3 severity levels), adversarial robustness (FGSM & PGD white-box attacks at ϵ=0.2/255), occlusion robustness (0–50% masking), and computational efficiency
Introduces a Pareto-based APEX rank to jointly evaluate accuracy vs. training time, memory, and FLOPs trade-offs
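
To make the Pareto idea behind the APEX rank concrete, here is a minimal sketch of filtering a Pareto front over accuracy (higher is better) and training time, memory, and FLOPs (lower is better). It is not the paper's APEX definition, which this summary does not spell out. The Candidate type, the field names, and every number are hypothetical placeholders, not results from the benchmark.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One augmentation strategy with its cost/benefit profile (hypothetical fields)."""
    name: str
    accuracy: float     # higher is better
    train_hours: float  # lower is better
    memory_gb: float    # lower is better
    gflops: float       # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.train_hours <= b.train_hours
        and a.memory_gb <= b.memory_gb
        and a.gflops <= b.gflops
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.train_hours < b.train_hours
        or a.memory_gb < b.memory_gb
        or a.gflops < b.gflops
    )
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


# Placeholder values for illustration only; they are NOT taken from the paper.
CANDIDATES = [
    Candidate("aug_a", accuracy=0.95, train_hours=4.0, memory_gb=6.0, gflops=1.8),
    Candidate("aug_b", accuracy=0.93, train_hours=1.0, memory_gb=3.0, gflops=1.8),
    Candidate("aug_c", accuracy=0.92, train_hours=2.0, memory_gb=4.0, gflops=1.8),
    Candidate("aug_d", accuracy=0.95, train_hours=5.0, memory_gb=6.0, gflops=1.8),
]

front_names = [c.name for c in pareto_front(CANDIDATES)]
print(front_names)
```

In this toy set, aug_c is dominated by aug_b and aug_d by aug_a, so only the most accurate and the cheapest candidates remain on the front. A rank could then be derived from front membership, but how APEX does this is left to the paper.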

Key Results:

Multi-image mixing methods dominate clean accuracy: PuzzleMix 95.55% (VERA220, R18), MixUp 95.27% (VERA220, R18), StarMixup 96.27% (VERA220, APN), and PuzzleMix 96.02% (TJU600, SLK-S)
These same top-accuracy mixup methods show poor calibration (high ECE) and severe vulnerability to FGSM/PGD attacks: a clear accuracy–security decoupling
Severe geometric transforms (Flip, Rotate, Translation) consistently degrade recognition across datasets
Vein models suffer catastrophic performance collapse even at the lowest corruption severities (levels C1/C2), which motivates custom low-intensity corruption tiers
Label enhancement methods (LabelSmoothing, DirichletLabelSmooth) offer strong calibration benefits while staying competitive on recognition (e.g., LabelSmoothing reaches 94.88% on TJU600 with R18, with competitive EER)
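
Since several of these results hinge on calibration, a minimal sketch of Expected Calibration Error (ECE) may help. It uses the common equal-width binning formulation: the sample-weighted average gap between mean confidence and accuracy per bin. The bin count of 10 and the toy inputs are assumptions; the summary does not state which binning the benchmark uses.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed default; not specified in the summary
) -> float:
    """ECE with equal-width confidence bins over [0, 1].

    confidences: top-1 predicted probability for each sample.
    correct:     whether the top-1 prediction matched the label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one sample is required")

    conf_sum = [0.0] * n_bins
    hits = [0] * n_bins
    count = [0] * n_bins
    for conf, hit in zip(confidences, correct):
        # Bins are [k/n_bins, (k+1)/n_bins); a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hits[idx] += int(hit)
        count[idx] += 1

    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue
        accuracy = hits[k] / count[k]
        mean_conf = conf_sum[k] / count[k]
        ece += (count[k] / n) * abs(accuracy - mean_conf)
    return ece


# Toy example: an overconfident model that is right only half the time at 0.95 confidence.
overconfident = expected_calibration_error([0.95] * 4, [True, True, False, False])
print(round(overconfident, 2))
```

A model can score well on accuracy and still have a large ECE, which is the pattern reported above for the mixing methods.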

Applied Context: Builders should not optimize for clean accuracy alone when selecting augmentations for vein recognition. Multi-image mixing yields top accuracy but introduces calibration and adversarial risk, so production systems need a multi-objective evaluation covering calibration, corruption robustness, and attack robustness before deployment.
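
One way to act on this advice is a pre-deployment gate that checks several metrics rather than clean accuracy alone. The sketch below is only an illustration: the Report and Limits types, the metric names, and all threshold values are hypothetical and would have to be set per application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Report:
    """Evaluation summary for one trained model (hypothetical fields)."""
    clean_acc: float        # accuracy on clean test data, higher is better
    ece: float              # expected calibration error, lower is better
    corrupted_acc: float    # accuracy under image corruptions, higher is better
    adversarial_acc: float  # accuracy under FGSM/PGD attack, higher is better


@dataclass(frozen=True)
class Limits:
    """Acceptance thresholds; the defaults are placeholders, not recommendations."""
    min_clean_acc: float = 0.90
    max_ece: float = 0.05
    min_corrupted_acc: float = 0.70
    min_adversarial_acc: float = 0.50


def failed_checks(report: Report, limits: Limits) -> List[str]:
    """Return the names of all metrics that miss their threshold."""
    failures = []
    if report.clean_acc < limits.min_clean_acc:
        failures.append("clean_acc")
    if report.ece > limits.max_ece:
        failures.append("ece")
    if report.corrupted_acc < limits.min_corrupted_acc:
        failures.append("corrupted_acc")
    if report.adversarial_acc < limits.min_adversarial_acc:
        failures.append("adversarial_acc")
    return failures


# Made-up numbers: strong clean accuracy, but poorly calibrated and fragile under attack.
candidate = Report(clean_acc=0.95, ece=0.12, corrupted_acc=0.75, adversarial_acc=0.20)
failures = failed_checks(candidate, Limits())
print(failures)
```

A model like this toy candidate would pass an accuracy-only selection yet be flagged here on calibration and adversarial robustness.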