Papers - University - Hong Kong University of Science and Te
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper
• 2404.02731
• Published • 1
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper
• 2309.12284
• Published • 19
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper
• 2404.03204
• Published • 9
Adapting LLaMA Decoder to Vision Transformer
Paper
• 2404.06773
• Published • 18
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper
• 2404.07972
• Published • 52
RegionGPT: Towards Region Understanding Vision Language Model
Paper
• 2403.02330
• Published • 2
Dynamic Typography: Bringing Words to Life
Paper
• 2404.11614
• Published • 46
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper
• 2404.14047
• Published • 45
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper
• 2404.14700
• Published • 32
Interactive3D: Create What You Want by Interactive 3D Generation
Paper
• 2404.16510
• Published • 21
LLaVA-OneVision: Easy Visual Task Transfer
Paper
• 2408.03326
• Published • 61