# Transformer_Backbone

## Visualization

## Requirements

## Example

## Paper

- CoAtNet
  - CoAtNet: Marrying Convolution and Attention for All Data Sizes
  - papers_with_code
- ViT-G/14
  - Scaling Vision Transformers
  - paper
- SwinV2
- ViT-MoE
  - Scaling Vision with Sparse Mixture of Experts
  - paper
- Florence
  - Florence: A New Foundation Model for Computer Vision
  - paper
- ALIGN
  - Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
  - papers_with_code
- MViTv2
  - Improved Multiscale Vision Transformers for Classification and Detection
  - paper
- MViT
- BEiT
- Meta_Pseudo_Labels
- SAM
  - Sharpness-Aware Minimization for Efficiently Improving Generalization
  - papers_with_code
- NoisyStudent
  - Self-training with Noisy Student improves ImageNet classification
  - papers_with_code
- NFNet
  - High-Performance Large-Scale Image Recognition Without Normalization
  - papers_with_code
- TokenLearner
  - TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
  - papers_with_code
- BiT
- MAE
- Focal
  - Focal Attention for Long-Range Interactions in Vision Transformers
  - paper & code
- MetaFormer
- CSWin
  - CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
  - papers_with_code
- Twins
  - Twins: Revisiting the Design of Spatial Attention in Vision Transformers
  - papers_with_code
- Swin (see the shifted-window sketch after this list)
  - Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  - papers_with_code
- CaiT
- CvT
- PvTv2
- PvT
  - Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
  - paper, code
- SReT
- ViT (see the patch-embedding sketch after this list)
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  - papers_with_code
- HRFormer
  - HRFormer: High-Resolution Transformer for Dense Prediction
  - paper & code
- Conformer
  - Conformer: Local Features Coupling Global Representations for Visual Recognition
  - papers_with_code
- FixEfficientNet
  - Fixing the train-test resolution discrepancy: FixEfficientNet
  - papers_with_code
- EfficientNetV2
- EfficientNet
  - EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  - papers_with_code
- Pale
  - Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
  - paper
- VOLO
- ELSA
- DAT
  - Vision Transformer with Deformable Attention
  - github
- As-ViT
  - Auto-scaling Vision Transformers without Training
  - github
- CycleMLP
  - CycleMLP: A MLP-like Architecture for Dense Prediction
  - github
- CrossFormer
  - CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
  - github
- AS-MLP
  - AS-MLP: An Axial Shifted MLP Architecture for Vision
  - github
- VAN
  - Visual Attention Network
  - github
- ConvNeXt
  - A ConvNet for the 2020s
  - github
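
The common thread behind most backbones above is the ViT recipe: split the image into fixed-size patches, linearly embed each patch into a token, add position embeddings, and run a standard Transformer encoder. As a reference point for the ViT entry, here is a minimal sketch of that pipeline, assuming PyTorch; the class names (`ViTBackbone`, `PatchEmbed`) and ViT-B-like hyperparameters (16x16 patches, dim 768, 12 layers) are illustrative and not this repository's code.

```python
# Minimal ViT-style backbone sketch: patchify -> linear projection
# -> [CLS] token + position embeddings -> Transformer encoder.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and project each to a token."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A stride-p conv is equivalent to flattening p x p patches + a linear layer.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, dim, H/p, W/p)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, dim)

class ViTBackbone(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=768, depth=12, heads=12):
        super().__init__()
        self.patch_embed = PatchEmbed(img_size, patch_size, dim=dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, self.patch_embed.num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        tokens = self.patch_embed(x)                     # (B, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)  # (B, 1, dim)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return self.norm(tokens)[:, 0]                   # [CLS] feature

feats = ViTBackbone()(torch.randn(2, 3, 224, 224))
print(feats.shape)                                       # torch.Size([2, 768])
```

The hierarchical backbones in the list (Swin, PvT, CSWin, Pale, CrossFormer) vary this recipe by producing multi-stage feature maps and restricting where attention is computed.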
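
Several entries above (Swin, CSWin, Twins, Pale) restrict self-attention to local windows for efficiency; Swin additionally shifts the window grid between consecutive blocks so information can cross window boundaries. Below is a minimal sketch of that partition/shift mechanic, again assuming PyTorch; it omits the paper's attention mask for boundary windows, relative position bias, and MLP sub-block, and all names and sizes are illustrative.

```python
# Minimal sketch of Swin-style (shifted) window attention.
import torch
import torch.nn as nn

def window_partition(x, ws):
    """(B, H, W, C) -> (num_windows * B, ws * ws, C)"""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """(num_windows * B, ws * ws, C) -> (B, H, W, C)"""
    B = windows.shape[0] // ((H // ws) * (W // ws))
    x = windows.view(B, H // ws, W // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

class WindowAttentionBlock(nn.Module):
    def __init__(self, dim=96, heads=3, ws=7, shift=0):
        super().__init__()
        self.ws, self.shift = ws, shift
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                              # x: (B, H, W, C)
        B, H, W, C = x.shape
        shortcut = x
        x = self.norm(x)
        if self.shift:                                 # shifted-window variant
            x = torch.roll(x, (-self.shift, -self.shift), dims=(1, 2))
        w = window_partition(x, self.ws)               # attend within each window
        w, _ = self.attn(w, w, w)
        x = window_reverse(w, self.ws, H, W)
        if self.shift:                                 # undo the cyclic shift
            x = torch.roll(x, (self.shift, self.shift), dims=(1, 2))
        return shortcut + x                            # residual connection

x = torch.randn(2, 56, 56, 96)                         # H, W divisible by ws=7
y = WindowAttentionBlock(shift=0)(x)                   # regular windows
y = WindowAttentionBlock(shift=3)(y)                   # shifted windows (ws // 2)
print(y.shape)                                         # torch.Size([2, 56, 56, 96])
```

Alternating the regular and shifted blocks, as in the last two calls, is what lets window-local attention propagate information across the whole feature map over depth.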