
Decoupling Mixture-of-Experts routing from gradient noise for robust and efficient inference

Date: 26/02/2026

Bakary Badjie, a PhD student at LASIGE, published a paper entitled “Decoupling Mixture-of-Experts Routing from Gradient Noise: A Framework for Structured Specialization and Soft Generalization Toward Robust and Efficient Inference” in Expert Systems With Applications, a top-ranked journal (impact factor 8.665). The paper is co-authored by José Cecílio and António Casimiro, both LASIGE integrated members.

The paper proposes a new approach to improving the performance and interpretability of Mixture-of-Experts (MoE) models in deep learning. It introduces SEAS-GMoE (Structured Expert Assignment and Supervised Gating Mixture of Experts), a framework designed to overcome unstable expert specialization and routing bias in conventional MoEs. The framework uses double-stage feature clustering (K-means followed by KNN refinement), combined with semantic pseudo-labeling via a Siamese Neural Network (SNN), to form semantically coherent expert groups. A supervised gating network then learns to route inputs to these experts through a bidirectional training process, improving both specialization and routing stability while decoupling them from gradient noise.
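The paper's full pipeline is considerably more involved, but a minimal sketch of the double-stage clustering idea is shown below. This is not the authors' code: it assumes scikit-learn, and the function name, parameters, and data are all illustrative.

```python
# Minimal sketch of double-stage feature clustering (K-means + KNN refinement).
# Illustrative only; not the SEAS-GMoE implementation from the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def assign_experts(features: np.ndarray, n_experts: int, k: int = 5) -> np.ndarray:
    """Partition feature vectors into expert groups, then smooth the
    assignments so each group stays locally coherent."""
    # Stage 1: coarse partition of the feature space with K-means.
    coarse = KMeans(n_clusters=n_experts, n_init=10, random_state=0).fit_predict(features)

    # Stage 2: refine by re-labelling every point with the majority label
    # of its k nearest neighbours (a simple KNN smoothing pass).
    knn = KNeighborsClassifier(n_neighbors=k).fit(features, coarse)
    return knn.predict(features)

# Example: 1,000 random 64-dimensional feature vectors split across 4 experts.
labels = assign_experts(np.random.rand(1000, 64), n_experts=4)
```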

During inference, soft expert routing aggregates predictions probabilistically, leading to smoother generalization across inputs and improved robustness in out-of-distribution and zero-shot scenarios.
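As a rough illustration of what probabilistic aggregation over experts means in practice, the PyTorch sketch below weights every expert's prediction by the gating probability instead of committing to a single expert. The module, its architecture, and its dimensions are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of soft expert routing at inference time.
# Illustrative only; not the SEAS-GMoE implementation from the paper.
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, n_experts: int):
        super().__init__()
        # Each expert is a placeholder linear head; real experts would be deeper.
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(n_experts)]
        )
        # The gating network produces a probability distribution over experts.
        self.gate = nn.Linear(in_dim, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)            # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], 1)   # (batch, n_experts, out_dim)
        # Soft routing: aggregate all experts' predictions, weighted by
        # their gate probabilities, rather than selecting one expert.
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)

moe = SoftMoE(in_dim=64, out_dim=10, n_experts=4)
logits = moe(torch.randn(8, 64))  # aggregated predictions for a batch of 8
```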