This research explores unified task representations, network architectures, and training methodologies for visual and vision-language multitask learning. It aims to build general models for multimodal tasks and to establish a new paradigm of universal perception based on large models, enabling general capabilities on open-world and open-ended tasks.
Representative Works:
Unified Pretraining Algorithm for Vision-Language Multimodal Foundation Models
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
[7th Most Influential Paper at ICLR 2021]
Unified Representations for General Visual Tasks: a single model architecture with shared parameters is used to solve diverse multimodal tasks (a minimal sketch of this idea appears at the end of this section)
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
[NeurIPS 2022 Spotlight paper]
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
[CVPR 2023 Highlight paper]
Large Vision-Language Models for Open-World Tasks
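To make the unified-representation idea in the entries above more concrete (one shared encoder, with every task cast as scoring inputs against candidate targets in a single representation space, in the spirit of the Uni-Perceiver line of work), the following is a minimal Python/PyTorch sketch. The module names, dimensions, pooling choice, and pre-tokenized inputs are illustrative assumptions and do not reproduce the papers' actual implementations.

```python
# Illustrative sketch only (not the released Uni-Perceiver code): a generalist
# model that encodes arbitrary inputs and candidate targets with one shared
# encoder and scores input-target pairs by cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoder(nn.Module):
    """One Transformer encoder, shared across modalities and tasks."""

    def __init__(self, dim: int = 512, depth: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim); modality-specific tokenizers
        # (image patches, text tokens, ...) are assumed to run beforehand.
        encoded = self.encoder(tokens)
        return encoded.mean(dim=1)  # pool to one vector per sequence


def task_as_retrieval(encoder: SharedEncoder,
                      input_tokens: torch.Tensor,
                      candidate_tokens: torch.Tensor) -> torch.Tensor:
    """Score every candidate target for every input.

    Classification, retrieval, and caption selection over a fixed candidate
    set all reduce to picking the highest-scoring target in the shared space.
    """
    x = F.normalize(encoder(input_tokens), dim=-1)       # (B, D)
    y = F.normalize(encoder(candidate_tokens), dim=-1)   # (C, D)
    return x @ y.t()                                      # (B, C) similarities


if __name__ == "__main__":
    enc = SharedEncoder()
    images = torch.randn(2, 16, 512)   # e.g. 16 patch tokens per image
    labels = torch.randn(10, 4, 512)   # e.g. 10 class names, 4 tokens each
    logits = task_as_retrieval(enc, images, labels)
    print(logits.shape)  # torch.Size([2, 10])
```

Framing each task as selecting the most similar target in a shared space is what allows one set of parameters to serve diverse tasks without task-specific heads.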