I am a postdoctoral researcher at Stanford University, hosted by Prof. Fei-Fei Li and Prof. Ehsan Adeli.
I received my PhD from UT Austin, where I was advised by Prof. Kristen Grauman.
I am broadly interested in building machine learning models that perceive the world through multiple modalities and interact with it.
Currently, I work on multimodal perception and generation for 3D scenes and humans.
Previously, I spent five months working with Prof. Andrea Vedaldi
and Dr. Natalia Neverova at FAIR, London.
I was also a visiting researcher at FAIR for two years, working with Prof. Kristen Grauman.
During my undergrad, I spent a wonderful year working with Prof.
Greg Mori on sports video analysis and efficient deep learning, eight months working with
Prof. Alexandre Alahi on social navigation in
crowds, and eight months working with Prof. Manolis
Savva on relational graph reasoning for navigation.
My first name is pronounced /tʃæn'æn/, with the g being silent.
Research opportunities: I am happy to collaborate with motivated undergraduate and master's students at Stanford, and I am also happy to answer questions about my research. If you are interested, please send me an email.
CV | E-Mail | Google Scholar | Github | Twitter | Dissertation
Photo credit: Jasmin Zhang
The Language of Motion:
Unifying Verbal and Non-verbal Language of 3D Human Motion
Changan Chen*, Juze Zhang*, Shrinidhi Kowshika Lakshmikanth*, Yusu Fang, Ruizhi Shao,
Gordon Wetzstein, Li Fei-Fei, Ehsan Adeli
CVPR 2025
paper | project
Self-Supervised Cross-View Correspondence with Predictive Cycle Consistency
Alan Baade and Changan Chen
CVPR 2025 (Highlight)
paper
HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness
Zihui Xue, Mi Luo, Changan Chen, Kristen Grauman
NeurIPS 2024
paper | project
Action2Sound: Ambient-Aware Generation of
Action Sounds from Egocentric Videos
Changan Chen*, Puyuan Peng*, Ami Baid, Sherry Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman
ECCV 2024 (Oral)
paper | project | data | code
Sim2Real Transfer for Audio-Visual Navigation with
Frequency-Adaptive Acoustic Field Prediction
Changan Chen*, Jordi Ramos*, Anshul Tomar*, Kristen Grauman
IROS 2024
paper | project
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, ..., Changan Chen, ...,
Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella,
Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park,
James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray
CVPR 2024 (Oral)
website | paper | video
Novel-View Acoustic Synthesis
Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna
Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi
CVPR 2023
paper | project | code | data
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen*, Carl Schissler*, Sanchit Garg*, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman
NeurIPS 2022
Distinguished Paper Award at EgoVis Workshop, CVPR 2024
paper | project | website | code
Visual Acoustic Matching
Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman
CVPR 2022 (Oral)
paper | video | project | code
Semantic Audio-Visual Navigation
Changan Chen, Ziad Al-Halah, Kristen Grauman
CVPR 2021
paper | project | code
Learning to Set Waypoints for Audio-Visual Navigation
Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao,
Santhosh K. Ramakrishnan, Kristen Grauman
ICLR 2021
paper | project | code
SoundSpaces: Audio-Visual Navigation in 3D Environments
Changan Chen*, Unnat Jain*, Carl Schissler, Sebastia Vicenc Amengual Gari,
Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
ECCV 2020 (Spotlight)
paper | project | code | website
Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep
Reinforcement Learning
Changan Chen, Yuejiang Liu, Sven Kreiss, Alexandre Alahi
ICRA 2019
paper | code
Constraint-Aware Deep Neural Network Compression
Changan Chen, Frederick Tung, Naveen Vedula, Greg Mori
ECCV 2018
paper | code
March 2025 | Invited talk at UT Austin, "Multimodal Perception and Generation from Spaces to Humans"
Dec 2024 | Invited talk at LASER, "4D Audio-Visual Learning: A Visual Perspective of Sound Propagation and Production"
Aug 2024 | Invited talk at CCRMA, Stanford, "4D Audio-Visual Learning: A Visual Perspective of Sound Propagation and Production"
March 2024 | Invited talk at MIT, "4D Audio-Visual Perception: Simulating, Synthesizing, and Navigating with Sounds in Spaces"
Feb 2024 | Invited talk at Stanford, "4D Audio-Visual Perception: Simulating, Synthesizing, and Navigating with Sounds in Spaces"
Feb 2024 | Invited talk at Berkeley, "4D Audio-Visual Perception: Simulating, Synthesizing, and Navigating with Sounds in Spaces"
Dec 2023 | Invited talk at NYU, "4D Audio-Visual Perception: Simulating, Synthesizing and Navigating with Sounds in Spaces"
June 2023 | Keynote talk at PerDream Workshop, ICCV 2023, "Audio-Visual Embodied AI: From Simulating to Navigating with Sounds in Spaces" (Slides)
June 2023 | Keynote talk at Sight and Sound Workshop, CVPR 2023, "Novel-view Acoustic Synthesis" (Slides)
June 2023 | Keynote talk at Ambient AI Workshop, ICASSP 2023, "Visual-acoustic Learning" (Slides)
Feb 2023 | Invited talk at Texas Acoustics, "Visual Learning of Sound in Spaces" (Slides)
Jan 2023 | Invited talk at MIT, "Visual Learning of Sound in Spaces" (Slides)
Nov 2022 | Invited talk at FAIR, Meta AI, "Visual Learning of Sound in Spaces" (Slides)
June 2022 | Oral talk at CVPR 2022, "Visual Acoustic Matching" (Slides)
June 2021 | Invited talk at Facebook Reality Labs, "Learning Audio-Visual Dereverberation" (Slides)
June 2021 | Invited talk at EPIC Workshop, CVPR 2021, "Semantic Audio-Visual Navigation" (Slides)
Sept 2020 | Invited talk at CS391R: Robot Learning at UT Austin, "Audio-Visual Navigation" (Slides)
Dec 2018 | Invited talk at SwissAI Meetup, "Navigation in Crowds: From 2D Navigation to Visual Navigation"
Nov 2018 | Invited talk at Swiss Machine Learning Day, "Crowd-aware Robot Navigation with Attention-based DRL"
 
ZJU, China, 2014-2016 | SFU, Canada, 2016-2019 | EPFL, Switzerland, 2018 | FAIR, USA & UK, 2020-2022 | UT Austin, USA, 2019-2024 | Stanford, USA, 2024-present