ReSET: Accurate Latency-Critical NVFP4 Reasoning via Step-Aware Temperature Scaling • Paper • 2606.13233 • Published 23 days ago
Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving • Paper • 2606.06302 • Published 19 days ago