Rui Sun

ThreeSR

https://threesr.github.io/

AI & ML interests

Vision and Language Multimodal Learning, CV, NLP, LLM

Recent Activity

upvoted a paper 3 days ago

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

upvoted a paper 3 days ago

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

updated a collection 8 days ago

New Papers

upvoted 2 papers 3 days ago

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Paper • 2606.02031 • Published 7 days ago • 18

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

Paper • 2606.03264 • Published 6 days ago • 15

upvoted 4 papers 22 days ago

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published 27 days ago • 191

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

Paper • 2605.07177 • Published May 8 • 62

Teaching Language Models to Think in Code

Paper • 2605.07237 • Published 28 days ago • 30

From Web to Pixels: Bringing Agentic Search into Visual Perception

Paper • 2605.12497 • Published 27 days ago • 14

upvoted 3 papers about 1 month ago

Image Generators are Generalist Vision Learners

Paper • 2604.20329 • Published Apr 22 • 21

Co-Director: Agentic Generative Video Storytelling

Paper • 2604.24842 • Published Apr 27 • 16

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

Paper • 2604.25256 • Published Apr 28 • 29

upvoted 11 papers about 2 months ago

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

Paper • 2604.08224 • Published Apr 9 • 52

KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

Paper • 2604.08455 • Published Apr 9 • 47

DMax: Aggressive Parallel Decoding for dLLMs

Paper • 2604.08302 • Published Apr 9 • 53

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

Paper • 2604.08539 • Published Apr 9 • 50
