Synthesize fast English speech from typed text
Music Generation Foundation Model v1.5
Steering Vision Representations with Language