Hi, I'm Chan Hee (Luke) Song.
I am a final-year PhD candidate at The Ohio State University, advised by Yu Su.
My research focuses on multimodal agents, particularly on planning, perception, and reasoning.
I am interning at Google Research, and previously interned at NVIDIA Research and Adobe Research.
August 2025
Honored to serve as an Area Chair for ICLR 2026.
July 2025
Check out Mind2Web 2, a rigorous benchmark for Deep Research and Agentic Search!
June 2025
Selected to attend the CVPR Doctoral Consortium to connect with senior researchers. I'll be mentored by Prof. Katerina Fragkiadaki (CMU).
April 2025
Released the Online-Mind2Web benchmark, showing that current web agents are far less capable than reported.
March 2025
Interning at Google Cloud AI Research this summer, working on multimodal agents. Catch me (again) in Seattle!
February 2025
RoboSpatial (Oral) has been accepted to CVPR 2025 with a perfect 5,5,5 score!
February 2025
VisualAgentBench has been accepted to ICLR 2025.
November 2024
Excited to present RoboSpatial, work done in part at NVIDIA. We present a large-scale 2D/3D spatial understanding dataset and benchmark tailored for robotics. Stay tuned for the full release!
June 2024
BioCLIP won the best student paper award at CVPR 2024! Honored to be part of the team.
February 2024
I will be interning at the NVIDIA Learning and Perception Research Group this summer. Catch me in Seattle!
July 2023
LLM-Planner, a paper on using large language models for vision-and-language navigation, has been accepted to ICCV 2023.
March 2023
Our SalsaBot work for the Amazon Alexa Prize Challenge has been accepted to the Embodied AI Workshop at CVPR 2023!
March 2023
I will be interning at Adobe Research this summer. Catch me in San Jose!
See the full list in Publications.
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
CVPR 2025 Oral (0.74%) / Part of GR00T N1.5 Paper Website Code Data
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
ICLR 2025 Featured in Stanford AI Index Report 2025 Paper Code
BioCLIP: A Vision Foundation Model for the Tree of Life
CVPR 2024 Best Student Paper Award (0.03%) Paper Website Code Data
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
ICCV 2023 Top 5 Most Cited AI Paper on arXiv (2022) Paper Website Code
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
CVPR 2022 Paper
Feel free to contact me about anything, including collaboration, job opportunities, or just a friendly hello :)