We've been treating sign language as pictures; treating it like grammar just broke the scaling wall.
April 14, 2026
Original Paper
State Space Models are Effective Sign Language Learners: Exploiting Phonological Compositionality for Vocabulary-Scale Recognition
arXiv · 2604.08761
The Takeaway
By shifting from atomic image recognition to modeling phonological compositionality (handshape, location, movement, orientation), the researchers report a massive jump in ASL accuracy at realistic vocabulary sizes. It is strong evidence that linguistic structure is the key to unlocking vocabulary-scale recognition.
From the abstract
Sign language recognition suffers from catastrophic scaling failure: models achieving high accuracy on small vocabularies collapse at realistic sizes. Existing architectures treat signs as atomic visual patterns, learning flat representations that cannot exploit the compositional structure of sign languages, which are systematically built from discrete phonological parameters (handshape, location, movement, orientation) reused across the vocabulary. We introduce PHONSSM, enforcing phonological decomposition…
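To see why compositionality matters for scaling, here's a minimal back-of-envelope sketch. The parameter counts below are invented for illustration (the paper's actual inventory sizes aren't given in the excerpt): instead of one flat classifier with an output per sign, a factored model predicts four small phonological parameters and composes them, so the output layer grows additively while the sign space it can cover grows multiplicatively.

```python
# Hypothetical parameter inventory sizes (illustrative only, not from the paper).
HANDSHAPES, LOCATIONS, MOVEMENTS, ORIENTATIONS = 40, 20, 30, 8

# A flat model needs one output unit per sign in the vocabulary.
flat_outputs = 5000  # e.g. a 5,000-sign vocabulary

# A factored model needs only one output unit per parameter value...
factored_outputs = HANDSHAPES + LOCATIONS + MOVEMENTS + ORIENTATIONS

# ...yet its predictions compose to cover every parameter combination.
coverable_signs = HANDSHAPES * LOCATIONS * MOVEMENTS * ORIENTATIONS

print(factored_outputs)  # 98 output units
print(coverable_signs)   # 192000 composable sign descriptions
```

The additive-versus-multiplicative gap is the core of the scaling argument: parameters are reused across the vocabulary, so new signs mostly recombine values the model has already learned.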