An autonomous agent reveals that domain-specific molecular architectures are largely unnecessary; standard transformers with better tuning outperform custom designs.
March 31, 2026
Original Paper
What an Autonomous Agent Discovers About Molecular Transformer Design: Does It Transfer?
arXiv · 2603.28015
The Takeaway
Through 3,106 experiments, the study finds that architectural innovations discovered for SMILES transfer perfectly to English text, and vice versa. This suggests the ML community should focus on tuning learning rates and schedules rather than building specialized 'molecular' transformers.
From the abstract
Deep learning models for drug-like molecules and proteins overwhelmingly reuse transformer architectures designed for natural language, yet whether molecular sequences benefit from different designs has not been systematically tested. We deploy autonomous architecture search via an agent across three sequence types (SMILES, protein, and English text as control), running 3,106 experiments on a single GPU. For SMILES, architecture search is counterproductive: tuning learning rates and schedules alone […]
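To make the claimed recipe concrete, here is a minimal sketch of what "tuning learning rates and schedules" on a stock transformer looks like in practice. This is an illustrative PyTorch example under assumed settings: the model sizes, the lr/warmup grid, and the helper names (TinyLM, cosine_with_warmup, train_once) are all hypothetical, with random tokens standing in for a tokenized SMILES corpus; it is not the paper's actual search harness.

```python
# Illustrative sketch: sweep learning rate and warmup for a cosine schedule on a
# stock transformer language model. All sizes, the grid, and the random-token
# "corpus" are assumptions for demonstration, not the paper's configuration.
import math
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, BATCH = 64, 128, 32  # assumed SMILES-scale vocabulary


class TinyLM(nn.Module):
    """Stock transformer encoder with a causal mask; nothing molecule-specific."""

    def __init__(self, d_model: int = 256, n_layers: int = 4):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.head(self.enc(self.emb(x), mask=mask))


def cosine_with_warmup(step: int, warmup: int, total: int) -> float:
    # LR multiplier: linear warmup, then cosine decay toward zero.
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def train_once(lr: float, warmup: int, total_steps: int = 200) -> float:
    model = TinyLM()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda s: cosine_with_warmup(s, warmup, total_steps))
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(total_steps):
        # Random tokens stand in for a tokenized SMILES corpus.
        batch = torch.randint(0, VOCAB, (BATCH, SEQ_LEN))
        logits = model(batch[:, :-1])  # predict the next token at each position
        loss = loss_fn(logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    return loss.item()


# The sweep: every knob varied here is an optimizer/schedule hyperparameter;
# the architecture itself never changes.
for lr in (3e-4, 1e-3, 3e-3):
    for warmup in (20, 50):
        print(f"lr={lr:g} warmup={warmup}: final loss {train_once(lr, warmup):.3f}")
```

The point of the sketch is that everything swept is an optimizer hyperparameter; the model is an unmodified nn.TransformerEncoder, which is the sense in which tuning, not redesign, carries the result.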