- Submissions Due: 31 May 2023
- Publication: November/December 2024
Model pre-training has recently become a leading paradigm for initializing deep models, establishing state-of-the-art performance on many multimedia analysis tasks. As part of this trend, a variety of large-scale pre-trained models have been developed and deployed to improve model robustness and uncertainty estimation in multimedia applications. Among them, BERT, GPT, ViT, UNITER, and their variants have achieved great success and become milestones in vision, language, and other multimedia fields. By storing sophisticated knowledge in a large number of parameters, pre-trained models can capture, ahead of time, semantic relations in large amounts of labeled and unlabeled data through self-supervised learning, and then provide stronger representations for a variety of downstream tasks. Despite this emerging trend, many aspects of how different models suit different applications remain under-explored. A comprehensive review and comparison of the latest breakthroughs and designs in model pre-training, spanning theories, algorithms, and applications, is therefore in high demand. Moreover, building advanced pre-training architectures and identifying new research directions are of great importance to multimedia intelligence.
This special issue will offer a timely collection of original contributions to benefit researchers and practitioners working on multi-modal learning and multimedia understanding in intelligent systems. Submissions should address research problems that are relevant to the multimedia community and fall within the topics of interest of IEEE Intelligent Systems.
Topics of interest include, but are not limited to:
- New architectures, theories, and applications on multi-modal model pre-training
- Fine-tuning and adaptation for multi-modal model pre-training
- Efficient multi-modal pre-training architectures
- Knowledge distillation and model compression for multi-modal model pre-training
- Cognitive- or knowledge-inspired multi-modal pre-training architectures
- Applications of multi-modal model pre-training in various multimedia areas
- Surveys or reviews of recent advances in multi-modal pre-trained models
For author information and guidelines on submission criteria, please visit the IS Author Information page. Please submit papers through the ScholarOne system, and be sure to select the special-issue name. Manuscripts must not have been published or be currently under consideration for publication elsewhere. Please submit only full papers intended for review, not abstracts, to the ScholarOne portal.
Email the guest editor at firstname.lastname@example.org.
Guest editors:
- Can Wang (lead), Griffith University, Gold Coast, QLD, Australia
- Zheng Zhang, Harbin Institute of Technology, Shenzhen, China
- Lei Zhu, Shandong Normal University, Jinan, China
- Jianxin Li, Deakin University, Melbourne, Australia