AI & ML Paradigm Shift

SIGMA resolves 'trajectory divergence' in molecular string generation by enforcing geometric symmetry recognition through contrastive learning.

March 27, 2026

Original Paper

SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning

Xinyu Wang, Fei Dou, Jinbo Bi, Minghu Song

arXiv · 2603.25062

The Takeaway

SIGMA addresses a fundamental flaw in chemical language models: different string linearizations of the same molecular graph produce different latent representations. Correcting this lets autoregressive models keep the efficiency of strings while gaining the structural fidelity of graph-based approaches.
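To make the one-graph-to-many-strings ambiguity concrete, here is a minimal Python sketch. It is not SIGMA's method and is not taken from the paper: it uses a toy SMILES-like linearizer that only handles acyclic graphs with single bonds and one-letter atoms, and every name in it (ATOMS, BONDS, linearize, parse, canonical_code) is a hypothetical helper introduced for illustration. The example molecule, the heavy-atom skeleton of isopropanol, is likewise an assumption.

```python
# Toy illustration only: acyclic graphs, single bonds, one-letter atom symbols.
# All names here are hypothetical and do not come from the SIGMA paper.

# Heavy-atom skeleton of isopropanol: a central carbon bonded to C, C and O.
ATOMS = {0: "C", 1: "C", 2: "C", 3: "O"}
BONDS = [(0, 1), (1, 2), (1, 3)]


def adjacency(bonds):
    """Build an undirected adjacency list from a list of bonds."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def linearize(atoms, bonds, start):
    """Depth-first, SMILES-like string that begins at the atom `start`."""
    adj = adjacency(bonds)

    def walk(node, parent):
        children = [n for n in adj.get(node, []) if n != parent]
        out = atoms[node]
        for child in children[:-1]:      # all but the last child become branches
            out += "(" + walk(child, node) + ")"
        if children:                     # the last child continues the main chain
            out += walk(children[-1], node)
        return out

    return walk(start, None)


def parse(s):
    """Read a toy string back into (atoms, bonds)."""
    atoms, bonds, stack, prev = {}, [], [], None
    for ch in s:
        if ch == "(":
            stack.append(prev)           # remember the branch point
        elif ch == ")":
            prev = stack.pop()           # return to the branch point
        else:
            idx = len(atoms)
            atoms[idx] = ch
            if prev is not None:
                bonds.append((prev, idx))
            prev = idx
    return atoms, bonds


def canonical_code(atoms, bonds):
    """Order-independent code for a labeled tree: the minimum over all roots
    of a rooted encoding whose child encodings are sorted."""
    adj = adjacency(bonds)

    def rooted(node, parent):
        kids = sorted(rooted(n, node) for n in adj.get(node, []) if n != parent)
        return atoms[node] + "(" + "".join(kids) + ")"

    return min(rooted(root, None) for root in atoms)


# One string per starting atom: several different strings, one graph.
strings = {start: linearize(ATOMS, BONDS, start) for start in ATOMS}
codes = {canonical_code(*parse(s)) for s in strings.values()}

for start, s in strings.items():
    print(f"start atom {start}: {s}")
print("distinct strings:", len(set(strings.values())))
print("distinct graphs: ", len(codes))
```

Starting the traversal from each of the four atoms yields three distinct strings (the two terminal carbons are symmetric and give the same one), yet all of them parse back to a single graph. An autoregressive model reading these strings token by token sees different prefixes for the same structure, which is the setting in which the latent representations can drift apart.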

From the abstract

Linearized string representations serve as the foundation of scalable autoregressive molecular generation; however, they introduce a fundamental modality mismatch where a single molecular graph maps to multiple distinct sequences. This ambiguity leads to 'trajectory divergence', where the latent representations of structurally equivalent partial graphs drift apart due to differences in linearization history. To resolve this without abandoning the efficient string formulation, we propose S