Zhou Yu is currently with the School of Computer Science and Technology at Hangzhou Dianzi University (HDU). He received his Bachelor's degree in Digital Media and his Ph.D. degree in Computer Science from Zhejiang University (Hangzhou, China) in 2010 and 2015, respectively. His Ph.D. advisors were Prof. Yueting Zhuang and Prof. Fei Wu. Before joining HDU, he was a senior algorithm engineer at Alibaba Inc.

He mainly applies machine learning and deep learning techniques to bridge vision and language. His research interests include multimodal learning, visual question answering, visual grounding, visual captioning, cross-media retrieval, and high-dimensional hashing and indexing. His research results have appeared in 30+ publications at prestigious conferences and journals (e.g., CVPR, ICCV, SIGIR, ACM Multimedia, IEEE TMM, TNNLS) and have received 1800+ citations. He has also served as a reviewer for a number of journals and conferences, including IEEE Trans. on Image Processing (TIP), IEEE Trans. on Multimedia (TMM), IEEE Trans. on Circuits and Systems for Video Technology (TCSVT), Information Sciences, Signal Processing, Neurocomputing, CVPR, AAAI, IJCAI, and ACM MM.

Selected Publications

Oct. 2021
Yuhao Cui, Zhou Yu, Chunqi Wang, Zhongzhou Zhao, Ji Zhang, Meng Wang, Jun Yu, "ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge", ACM International Conference on Multimedia (ACM MM), Chengdu, China, 2021.
The first vision-language pre-training (VLP) approach that incorporates cross- and intra-modal knowledge simultaneously.
Paper  Project 
Oct. 2020
Zhou Yu, Yuhao Cui, Jun Yu, Dacheng Tao, Qi Tian, "Deep Multimodal Neural Architecture Search", ACM International Conference on Multimedia (ACM MM), Virtual, 2020.
The first deep NAS approach for universal multimodal learning tasks.
Paper  Project 
Jun. 2019
Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, Qi Tian, "Deep Modular Co-Attention Networks for Visual Question Answering", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019.
The winning solution (1st place) in VQA Challenge 2019.
Paper  Project  Slides
Dec. 2018
Zhou Yu, Jun Yu, Chenchao Xiang, Jianping Fan, Dacheng Tao, "Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering", IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 29(12): 5947-5959, 2018.
The runner-up solution (2nd place) in VQA Challenge 2017 and VQA Challenge 2018.
Paper  Project  Slides 2017  Slides 2018 
Jul. 2018
Zhou Yu, Jun Yu, Chenchao Xiang, Zhou Zhao, Qi Tian, Dacheng Tao, "Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding", International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 2018.
A simple yet strong baseline for visual grounding.
Paper  Project

Awards

Sep. 2021
ACM Hangzhou Rising Star Award
One of three recipients of this award for talented young researchers in Zhejiang Province.
Jun. 2019
VQA Challenge 2019 Winner Award
Our team MIL@HDU won the championship award of the international Visual Question Answering competition (VQA Challenge 2019).
Jun. 2018
VQA Challenge 2018 Runner-up Award
Our team HDU-UCAS-USYD won the runner-up award of the international Visual Question Answering competition (VQA Challenge 2018).
Jul. 2017
VQA Challenge 2017 Runner-up Award
Our team HDU-USYD-UNCC won the runner-up award of the international Visual Question Answering competition (VQA Challenge 2017).