Hi, I'm Chan Hee (Luke) Song.

I am a CS PhD student at The Ohio State University advised by Yu Su.

My research focuses on multimodal agents, particularly on planning, perception, and benchmarking.

During my undergraduate studies at Notre Dame, I was part of ND NLP.

I have interned at Nvidia Research and Adobe Research.



What's New

March 2025

I will be interning at Google Cloud AI Research this summer, working on multimodal agents. Catch me (again) in Seattle!

Feb 2025

RoboSpatial has been accepted to CVPR 2025 with a perfect 5,5,5 score!

Feb 2025

VisualAgentBench has been accepted to ICLR 2025.

Nov 2024

Excited to present RoboSpatial, work done at Nvidia. We present a large-scale 2D/3D spatial understanding dataset and benchmark tailored for robotics. Stay tuned for the full release!

Jun 2024

BioCLIP won the best student paper award at CVPR 2024! Honored to be part of the team.

Feb 2024

BioCLIP, a biology vision foundation model (Oral), and Dual-VCR, a dual-view web-navigation method (Poster), have been accepted to CVPR 2024!

Feb 2024

I will be interning at the Nvidia Learning and Perception Research Group this summer. Catch me in Seattle!

Jul 2023

LLM-Planner, a paper on using large language models for vision-and-language navigation, has been accepted to ICCV 2023.

Mar 2023

Our SalsaBot work for the Amazon Alexa Prize Challenge has been accepted to the Embodied AI Workshop at CVPR 2023!

Mar 2023

I will be interning at Adobe Research this summer. Catch me in San Jose!


Selected Publications

See the full list in Publications.

  • RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

    Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu Su, Stan Birchfield

  • VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

    Xiao Liu, Tianjie Zhang, Yu Gu, Iat Long Iong, Yifan Xu, Xixuan Song, Shudan Zhang, Hanyu Lai, Xinyi Liu, Hanlin Zhao, Jiadai Sun, Xinyue Yang, Yu Yang, Zehan Qi, Shuntian Yao, Xueqiao Sun, Siyi Cheng, Qinkai Zheng, Hao Yu, Hanchen Zhang, Wenyi Hong, Ming Ding, Lihang Pan, Xiaotao Gu, Aohan Zeng, Zhengxiao Du, Chan Hee Song, Yu Su, Yuxiao Dong, Jie Tang

  • BioCLIP: A Vision Foundation Model for the Tree of Life

    Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su

  • Dual-View Visual Contextualization for Web Navigation

    Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao

  • LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

    Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su

  • One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

    Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su

Contact
Email: 1ch[LAST_NAME]@gmail.com

Feel free to contact me if you are interested in my research or want to discuss anything :)