Hi, I'm Chan Hee (Luke) Song (송찬희).

I am a researcher at NVIDIA Metropolis, focusing on spatial understanding in vision-language models and their applications in smart spaces, robotics, and other interactive physical environments.

I received my PhD from The Ohio State University, where I was advised by Yu Su.

I have interned at Google Research, Adobe Research, and NVIDIA Research.

I'm always happy to discuss potential collaborations on related topics. Feel free to reach out via the email below.



What's New

April 2026

Serving as an Area Chair for NeurIPS 2026

February 2026

We’re excited to announce the Embodied Reasoning in Action Workshop at CVPR 2026, where RoboSpatial will be featured as one of the official challenges - join us and submit your work!

February 2026

Happy to have Watch and Learn and SpaceTools accepted to CVPR 2026!

October 2025

Watch and Learn, a framework that learns from Internet videos to enhance computer use agents, was featured in VentureBeat!

September 2025

Thrilled to see that RoboSpatial is used by Qwen3-VL and Gemini Robotics! Spatial understanding is the next frontier for MLLMs.

September 2025

Excited to organize the Eval&Deploy workshop at CoRL 2025. Also attended my first robotics conference.

August 2025

Honored to serve as an Area Chair for ICLR 2026.

July 2025

Check out Mind2Web 2, a rigorous benchmark for Deep Research and Agentic Search!

June 2025

Selected to attend CVPR Doctoral Consortium to connect with senior researchers. I'll be mentored by Prof. Katerina Fragkiadaki (CMU).

April 2025

Released Online-Mind2Web benchmark, showing current web agents are far less capable than reported.

March 2025

Interning at Google Cloud AI Research this summer working on multimodal agents. Catch me (again) in Seattle!

Feb 2025

RoboSpatial (Oral) has been accepted to CVPR 2025 with a perfect 5,5,5 score!

Feb 2025

VisualAgentBench has been accepted to ICLR 2025.

Nov 2024

Excited to present RoboSpatial, a work done in part at NVIDIA. We present a large-scale 2D/3D spatial understanding dataset and benchmark tailored for robotics. Stay tuned for the full release!

Jun 2024

BioCLIP won the best student paper award at CVPR 2024! Honored to be part of the team.

Feb 2024

BioCLIP, a biology vision foundation model (Oral), and Dual-VCR, a dual-view web-navigation method (Poster) have been accepted to CVPR 2024!

Feb 2024

I will be interning at the NVIDIA Learning and Perception Research Group this summer. Catch me in Seattle!

Jul 2023

LLM-Planner, a paper on using large language models for vision-and-language navigation, has been accepted to ICCV 2023.

Mar 2023

Our SalsaBot work for Amazon Alexa Prize Challenge has been accepted to the Embodied AI Workshop at CVPR 2023!

Mar 2023

I will be interning at Adobe Research this summer. Catch me in San Jose!


Selected Publications

† indicates co-corresponding author. ‡ indicates project lead.

See full list in Publications.

  • Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

    Cheolhong Min, Jaeyun Jung, Daeun Lee, Hyeonseong Jeon, Yu Su, Jonathan Tremblay, Chan Hee Song†‡, Jaesik Park†

  • SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

    Siyi Chen, Mikaela Angelina Uy, Chan Hee Song, Faisal Ladhak, Adithyavairavan Murali, Qing Qu, Stan Birchfield, Valts Blukis, Jonathan Tremblay

  • RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

    Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu Su, Stan Birchfield

  • BioCLIP: A Vision Foundation Model for the Tree of Life

    Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su

  • LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

    Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su

Contact
Email: lu[LAST_NAME] at nvidia dot com