CLOSED Call for Papers: Special Issue on Transformer Models in Vision
Submissions Due: 15 February 2022
Transformer models have recently demonstrated exemplary performance on a broad range of language tasks such as text classification, machine translation, and question answering. These breakthroughs in the natural language processing (NLP) domain have sparked great interest in the computer vision community to investigate these models for vision and multi-modal learning tasks. However, visual data follows a typical structure (such as spatial and temporal coherence), thus demanding novel network designs and training schemes. As a result, transformer models and their variants have been successfully used for image recognition, object detection, segmentation, image super-resolution, video understanding, image generation, text-image synthesis and visual question answering.
Among their salient benefits, transformers enable modeling long-range dependencies between input sequence elements and, unlike recurrent networks such as long short-term memory (LSTM), support parallel processing of sequences. In contrast to convolutional neural networks, transformers require minimal inductive biases in their design and are naturally suited as set functions. Furthermore, the relatively straightforward design of transformers allows processing multiple modalities (such as images, videos, text, and speech) using similar processing blocks, and it scales well to very large capacity networks and huge datasets.
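To make the set-function remark concrete, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It rests on simplifying assumptions that do not come from this call: the query, key, and value projections are taken to be the identity (no learned weights), no positional encoding is used, and the names `softmax` and `self_attention` are purely illustrative. Under those assumptions, reordering the input tokens only reorders the outputs in the same way.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Assumption: identity query/key/value projections and no positional
    # encoding, so only the attention mechanism itself is illustrated.
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Each token scores every token, so any pair can interact directly.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # The output is a weighted average of all token vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens))
                        for j in range(d)])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perm = [2, 0, 1]  # an arbitrary reordering of the three tokens

out = self_attention(tokens)
out_perm = self_attention([tokens[i] for i in perm])

# Permutation equivariance: output row i_new of the permuted input matches
# output row perm[i_new] of the original input (up to rounding).
equivariant = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for i_new, i_old in enumerate(perm)
    for a, b in zip(out_perm[i_new], out[i_old])
)
print(equivariant)
```

Each output row depends only on the full set of tokens, not on their order, and the rows can be computed independently of one another, which is what permits parallel processing. Vision transformers typically add positional embeddings to the tokens so that spatial order is not lost.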
This special issue seeks original contributions towards advancing the theory, architecture, and algorithmic design for transformer models in computer vision, as well as novel applications and use cases. We envision original and well-motivated adaptations of transformer models for vision tasks and efforts towards improving their accuracy, robustness, and efficiency. The special issue will provide a timely collection of recent advances to benefit the researchers and practitioners working in the broad research field of computer vision, pattern analysis, and machine intelligence. Topics of interest include (but are not limited to):
Theoretical insights into transformer-based models
Efficient transformer architectures, including novel mechanisms for self-attention
Novel transformer models for spatial (image) and temporal (video) data modeling
Visualizing and interpreting transformer networks
Generative models for transformer networks
Hybrid network designs combining the strengths of transformer models with convolutional and graph-based models
Unsupervised, weakly supervised, and semi-supervised learning with transformer models
Multi-modal learning combining visual data with text, speech, and knowledge graphs
Leveraging multi-spectral data like satellite imagery and infrared images in transformer models for improved semantic understanding of visual content
Transformer-based designs for low-level vision problems such as image super-resolution, deblurring, de-raining, and denoising
Novel transformer-based methods for high-level vision problems such as object detection, segmentation, activity recognition, and pose estimation
Transformer models for volumetric, mesh, and point-cloud data processing in 3D and 4D data regimes
Open for submissions: 15 October 2021
Submissions due: 15 February 2022
Preliminary notification: 15 March 2022
Revisions due: 15 May 2022
Final notification: 30 June 2022
Publication (tentative): November 2022
For author information and guidelines on submission criteria, visit the Author Information page. Please submit papers through the ScholarOne system, and be sure to select the special-issue name. Manuscripts should not be published or currently submitted for publication elsewhere. Please submit only full papers intended for review, not abstracts, to the ScholarOne portal.