I am a postdoctoral researcher at Stanford University, hosted by Prof. Fei-Fei Li and Prof. Ehsan Adeli.
I received my PhD from UT Austin, where I was advised by Prof. Kristen Grauman.
I am broadly interested in building machine learning models that perceive the world through multiple modalities and interact with it.
Currently, I work on multimodal perception and generation for 3D scenes and humans.
Previously, I spent five months working with Prof. Andrea Vedaldi
and Dr. Natalia Neverova at FAIR, London.
I was also a visiting researcher at FAIR for two years, working with Prof. Kristen Grauman.
During my undergrad, I spent a wonderful year working with Prof.
Greg Mori on sports video analysis and efficient deep learning, eight months working with
Prof. Alexandre Alahi on social navigation in
crowds, and eight months working with Prof. Manolis
Savva on relational graph reasoning for navigation.
My first name is pronounced /tʃæn'æn/, with the g being silent.
Research opportunities: I am happy to collaborate with motivated undergraduate and master's students at Stanford, and I am also happy to answer questions about my research. If you are interested, please send me an email.
CV | E-Mail | Google Scholar | Github | Twitter | Dissertation
Photo credit: Jasmin Zhang
The Language of Motion:
Unifying Verbal and Non-verbal Language of 3D Human Motion
Changan Chen*, Juze Zhang*, Shrinidhi Kowshika Lakshmikanth*, Yusu Fang, Ruizhi Shao,
Gordon Wetzstein, Li Fei-Fei, Ehsan Adeli
CVPR 2025
paper | project
Self-Supervised Cross-View Correspondence with Predictive Cycle Consistency
Alan Baade and Changan Chen
CVPR 2025 (Highlight)
paper
HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness
Zihui Xue, Mi Luo, Changan Chen, Kristen Grauman
NeurIPS 2024
paper | project
Action2Sound: Ambient-Aware Generation of
Action Sounds from Egocentric Videos
Changan Chen*, Puyuan Peng*, Ami Baid, Sherry Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman
ECCV 2024 (Oral)
paper | project | data | code
Sim2Real Transfer for Audio-Visual Navigation with
Frequency-Adaptive Acoustic Field Prediction
Changan Chen*, Jordi Ramos*, Anshul Tomar*, Kristen Grauman
IROS 2024
paper | project
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, ..., Changan Chen, ...,
Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella,
Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park,
James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray
CVPR 2024 (Oral)
website | paper | video
Novel-View Acoustic Synthesis
Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna
Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi
CVPR 2023
paper | project | code | data
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen*, Carl Schissler*, Sanchit Garg*, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman
NeurIPS 2022
Distinguished Paper Award at EgoVis Workshop, CVPR 2024
paper | project | website | code
Visual Acoustic Matching
Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman
CVPR 2022 (Oral)
paper | video | project | code
Semantic Audio-Visual Navigation
Changan Chen, Ziad Al-Halah, Kristen Grauman
CVPR 2021
paper | project | code
Learning to Set Waypoints for Audio-Visual Navigation
Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao,
Santhosh K. Ramakrishnan, Kristen Grauman
ICLR 2021
paper | project | code
SoundSpaces: Audio-Visual Navigation in 3D Environments
Changan Chen*, Unnat Jain*, Carl Schissler, Sebastia Vicenc Amengual Gari,
Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
ECCV 2020 (Spotlight)
paper | project | code | website
Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep
Reinforcement Learning
Changan Chen, Yuejiang Liu, Sven Kreiss, Alexandre Alahi
ICRA 2019
paper | code
Constraint-Aware Deep Neural Network Compression
Changan Chen, Frederick Tung, Naveen Vedula, Greg Mori
ECCV 2018
paper | code
March 2025 | Invited talk at UT Austin, "Multimodal Perception and Generation from Spaces to Humans"
Dec 2024 | Invited talk at LASER, "4D Audio-Visual Learning: A Visual Perspective of Sound Propagation and Production"
Aug 2024 | Invited talk at CCRMA, Stanford, "4D Audio-Visual Learning: A Visual Perspective of Sound Propagation and Production"
March 2024 | Invited talk at MIT, "4D Audio-Visual Perception: Simulating, Synthesizing, and Navigating with Sounds in Spaces"
Feb 2024 | Invited talk at Stanford, "4D Audio-Visual Perception: Simulating, Synthesizing, and Navigating with Sounds in Spaces"
Feb 2024 | Invited talk at Berkeley, "4D Audio-Visual Perception: Simulating, Synthesizing, and Navigating with Sounds in Spaces"
Dec 2023 | Invited talk at NYU, "4D Audio-Visual Perception: Simulating, Synthesizing and Navigating with Sounds in Spaces"
June 2023 | Keynote talk at PerDream Workshop, ICCV 2023, "Audio-Visual Embodied AI: From Simulating to Navigating with Sounds in Spaces" (Slides)
June 2023 | Keynote talk at Sight and Sound Workshop, CVPR 2023, "Novel-view Acoustic Synthesis" (Slides)
June 2023 | Keynote talk at Ambient AI Workshop, ICASSP 2023, "Visual-acoustic Learning" (Slides)
Feb 2023 | Invited talk at Texas Acoustics, "Visual Learning of Sound in Spaces" (Slides)
Jan 2023 | Invited talk at MIT, "Visual Learning of Sound in Spaces" (Slides)
Nov 2022 | Invited talk at FAIR, Meta AI, "Visual Learning of Sound in Spaces" (Slides)
June 2022 | Oral talk at CVPR 2022, "Visual Acoustic Matching" (Slides)
June 2021 | Invited talk at Facebook Reality Labs, "Learning Audio-Visual Dereverberation" (Slides)
June 2021 | Invited talk at EPIC Workshop, CVPR 2021, "Semantic Audio-Visual Navigation" (Slides)
Sept 2020 | Invited talk at CS391R: Robot Learning at UT Austin, "Audio-Visual Navigation" (Slides)
Dec 2018 | Invited talk at SwissAI Meetup, "Navigation in Crowds: From 2D Navigation to Visual Navigation"
Nov 2018 | Invited talk at Swiss Machine Learning Day, "Crowd-aware Robot Navigation with Attention-based DRL"
 
ZJU, China, 2014-2016 | SFU, Canada, 2016-2019 | EPFL, Switzerland, 2018 | FAIR, USA & UK, 2020-2022 | UT Austin, USA, 2019-2024 | Stanford, USA, 2024-present