TemporalMesh Transformer: 29.4 PPL at 48% compute — beats Mamba, new open-source architecture
#3 opened about 2 hours ago by vigneshwar234
Context Length for 2X6000 Pros (2x96 = 192GB VRAM)
3
#2 opened about 1 month ago by mtcl
licence?
1
#1 opened about 1 month ago by celikburak