Hi there! My name is Hang Wu (吴杭); you can also call me by my English name, Laurent.

I am currently a first-year PhD student at the University of California, Merced, conducting research under the guidance of Prof. Yiwei Wang, with Prof. Ming-Hsuan Yang as my senior advisor. Additionally, I work closely with Prof. Yujun Cai. My main research interests are in vision-language models and large multimodal models, with a focus on improving their performance and specific applications. I received my bachelor’s degree from Tongji University, where I worked on image processing tasks in the low-level vision field.

You can find my CV here: Hang Wu’s Curriculum Vitae. If you are interested in my work, please feel free to drop me an email.

🔥 News

2025.09: Three papers submitted to ICLR 2026.
2025.08: 🎉🎉 Our paper DiMo-GUI has been accepted to EMNLP 2025 Main Conference!
2025.08: Officially joined the UC Merced NLP Lab and started my PhD journey.
2025.05: Two papers submitted to EMNLP 2025.
2025.04: Thrilled to accept a PhD offer from UC Merced. Looking forward to working and living in CA!
2025.03: Joined vivo as a Research Intern!
2024.11: One paper submitted to CVPR 2025.

📝 Publications

RefineShot: Rethinking Cinematography Understanding with Foundational Skill Evaluation

Hang Wu, Yujun Cai, Haonan Ge, Hongkai Chen, Ming-Hsuan Yang, Yiwei Wang$^{\dagger}$

arXiv 2025

Benchmark refinement: Enforces consistent option granularity, unified evaluation dimensions, and mutual exclusivity for greater dataset reliability.
Baseline analysis: Thoroughly evaluates ShotVL, revealing weaknesses in reasoning, prompt adherence, and output consistency.
Evaluation expansion: Adds a protocol assessing both task-specific performance and core model competencies, enabling more balanced and robust comparisons.

FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning

Haonan Ge, Yiwei Wang, Kai-Wei Chang, Hang Wu, Yujun Cai$^{\dagger}$

arXiv 2025

We introduce FiCOT, a reasoning paradigm enabling dynamic visual evidence gathering during inference.
We propose DRFS, a training methodology for learning adaptive sampling policies. We also develop DRFS-GRPO, an efficient reinforcement learning algorithm for training complex perception-reasoning policies from sparse rewards.

DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning

Hang Wu, Hongkai Chen$^{\dagger}$, Yujun Cai, Chang Liu, Qingwen Ye, Ming-Hsuan Yang, Yiwei Wang

EMNLP 2025 Main Conference

We propose DiMo-GUI, a training-free framework that can be seamlessly integrated as a plug-and-play component into any GUI agent.
Without requiring additional training or external data, DiMo-GUI effectively enhances grounding performance across various GUI tasks.

Structured Attention Matters to Multimodal LLMs in Document Understanding

Chang Liu, Hongkai Chen$^{\dagger}$, Yujun Cai, Hang Wu, Qingwen Ye, Ming-Hsuan Yang, Yiwei Wang

arXiv 2025

Our work investigates a fundamental yet overlooked aspect: how input format influences document comprehension performance.
We propose a novel structure-preserving approach that encodes document elements using the LaTeX paradigm, maintaining the hierarchical organization and spatial relationships critical for comprehension.

🎓 Education

2025.08 - Present, PhD student, University of California, Merced.
2021.09 - 2025.06, Undergraduate student, Tongji University.

💻 Internships

2025.03 - 2025.07, Research Intern, vivo@Shenzhen, China.
2025.01 - 2025.07, Research Intern, UC Merced NLP Lab@University of California, Merced, Remote.
2023.09 - 2025.03, Research Intern, Ni’s Group@Tongji University, Shanghai, China.

📚 Projects

Research on Perception-oriented High Dynamic Range Imaging Systems

National-level Innovation Project

We treat artifacts in HDR images as detectable entities, explicitly detecting and suppressing them to enhance HDR quality.
National-level innovation project at Tongji University, with funding of 10,000 RMB.

📖 Services

Conference Reviewer

The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026

Teaching

CSE-022: Introduction to Programming, Teaching Assistant (Fall 2025, UC Merced).

🎨 About Me

I’m an ESFJ.
I come from Maanshan, Anhui Province, China.
I’m a huge sports fan and enjoy playing many kinds of sports in my spare time, including basketball🏀, soccer⚽️, water polo🤽‍♂️, swimming🏊, badminton🏸…
I love pop music and R&B, with Ed Sheeran and The Weeknd as my favorite English artists, and David Tao and Khalil Fong as my favorite Chinese artists.