I’m Yiwei Zhang, a senior undergraduate researcher in the Department of Computer Science and Technology at Tsinghua University, working with Dr. Chuang Gan on multimodal machine learning and computer vision.

In 2019, I spent six months at the MultiComp Lab, where I was supervised by Professor Louis-Philippe Morency.

Here is my Curriculum Vitae.


My research interests lie in machine learning, in particular multimodal machine learning and its applications.

From a theoretical perspective, I am interested in understanding the computational and statistical principles of multimodal learning and in building systems that can process and relate information from multiple modalities.

From an application perspective, I apply these principles to problems in Computer Vision, Natural Language Processing, and Speech, such as multimodal language modeling, multimodal learning from videos, and audio-visual embodied indoor navigation.


Watch, Reason and Code: Learning to Represent Videos Using Program.
Xuguang Duan, Qi Wu, Chuang Gan, Yiwei Zhang, Wenbing Huang, Anton van den Hengel, and Wenwu Zhu.

ACM International Conference on Multimedia (ACM MM), October 2019

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
Chuang Gan*, Yiwei Zhang*, Jiajun Wu, Boqing Gong, and Joshua Tenenbaum.

International Conference on Robotics and Automation (ICRA), June 2020

Under review:

Factorized Multimodal Transformer for Multimodal Sequential Learning
Amir Zadeh, Chengfeng Mao, Kelly Shi, Yiwei Zhang, Paul Pu Liang, Soujanya Poria, and Louis-Philippe Morency.

Under review in Information Fusion


Language Technologies Institute, Carnegie Mellon University

Jul. 2019 - Dec. 2019

Department of Computer Science and Technology, Tsinghua University

Jul. 2018 - Jun. 2019

  • Advised by Dr. Chuang Gan
  • Research on multimodal machine learning and computer vision.