CLOSED Call for Papers: Special Issue on Transformer Models in Vision

Submissions Due: 15 February 2022

Transformer models have recently demonstrated exemplary performance on a broad range of language tasks such as text classification, machine translation, and question answering. These breakthroughs in the natural language processing (NLP) domain have sparked great interest in the computer vision community in investigating these models for vision and multi-modal learning tasks. However, visual data has its own characteristic structure (such as spatial and temporal coherence), which demands novel network designs and training schemes. With such adaptations, transformer models and their variants have been successfully used for image recognition, object detection, segmentation, image super-resolution, video understanding, image generation, text-image synthesis, and visual question answering.

Among their salient benefits, transformers enable modeling long-range dependencies between input sequence elements and support parallel processing of sequences, unlike recurrent networks such as long short-term memory (LSTM). Unlike convolutional neural networks, transformers require minimal inductive biases in their design and are naturally suited as set-functions. Furthermore, the relatively straightforward design of transformers allows processing multiple modalities (such as images, videos, text, and speech) using similar processing blocks and scales well to very large capacity networks and huge datasets.

This special issue seeks original contributions towards advancing the theory, architecture, and algorithmic design for transformer models in computer vision, as well as novel applications and use cases. We envision original and well-motivated adaptations of transformer models for vision tasks and efforts towards improving their accuracy, robustness, and efficiency. The special issue will provide a timely collection of recent advances to benefit researchers and practitioners working in the broad fields of computer vision, pattern analysis, and machine intelligence. Topics of interest include (but are not limited to):

Theoretical insights into transformer-based models
Efficient transformer architectures, including novel mechanisms for self-attention
Novel transformer models for spatial (image) and temporal (video) data modeling
Visualizing and interpreting transformer networks
Generative models for transformer networks
Hybrid network designs combining the strengths of transformer models with convolutional and graph-based models
Unsupervised, weakly supervised, and semi-supervised learning with transformer models
Multi-modal learning combining visual data with text, speech, and knowledge graphs
Leveraging multi-spectral data like satellite imagery and infrared images in transformer models for improved semantic understanding of visual content
Transformer-based designs for low-level vision problems such as image super-resolution, deblurring, de-raining, and denoising
Novel transformer-based methods for high-level vision problems such as object detection, segmentation, activity recognition, and pose estimation
Transformer models for volumetric, mesh, and point-cloud data processing in 3D and 4D data regimes

Important Dates

Open for submissions: 15 October 2021
Submissions due: 15 February 2022
Preliminary notification: 15 March 2022
Revisions due: 15 May 2022
Final notification: 30 June 2022
Publication (tentative): November 2022

Submission Guidelines

For author information and guidelines on submission criteria, visit the Author Information page. Please submit papers through the ScholarOne system, and be sure to select the special-issue name. Manuscripts should not be published or currently submitted for publication elsewhere. Please submit only full papers intended for review, not abstracts, to the ScholarOne portal.

Questions?

Contact the guest editors:

Ashish Vaswani, Research Scientist, Google Brain (USA)
Fahad Shahbaz Khan, Associate Professor, Mohamed Bin Zayed University of Artificial Intelligence (UAE), Linköping University (Sweden)
Ming-Hsuan Yang, Professor, University of California, Merced (USA); Research Scientist, Google (USA)
Mubarak Shah, Trustee Chair Professor, University of Central Florida (USA)
Niki Parmar, Research Scientist, Google Research (USA)
Salman Khan, Assistant Professor, Mohamed Bin Zayed University of Artificial Intelligence (UAE), Australian National University (Australia)
