# Transformer_Backbone

## Visualization

## Requirements

## Example

## Paper

- CoAtNet
  - CoAtNet: Marrying Convolution and Attention for All Data Sizes
  - papers_with_code
- ViT-G/14
  - Scaling Vision Transformers
  - paper
- SwinV2
- ViT-MoE
  - Scaling Vision with Sparse Mixture of Experts
  - paper
- Florence
  - Florence: A New Foundation Model for Computer Vision
  - paper
- ALIGN
  - Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
  - papers_with_code
- MViTv2
  - Improved Multiscale Vision Transformers for Classification and Detection
  - paper
- MViT
- BEiT
- Meta_Pseudo_Labels
- SAM
  - Sharpness-Aware Minimization for Efficiently Improving Generalization
  - papers_with_code
- NoisyStudent
  - Self-training with Noisy Student improves ImageNet classification
  - papers_with_code
- NFNet
  - High-Performance Large-Scale Image Recognition Without Normalization
  - papers_with_code
- TokenLearner
  - TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
  - papers_with_code
- BiT
- MAE
- Focal
  - Focal Attention for Long-Range Interactions in Vision Transformers
  - paper & code
- MetaFormer
- CSWin
  - CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
  - papers_with_code
- Twins
  - Twins: Revisiting the Design of Spatial Attention in Vision Transformers
  - papers_with_code
- Swin (see the shifted-window sketch after this list)
  - Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  - papers_with_code
- CaiT
- CvT
- PvTv2
- PvT
  - Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
  - paper, code
- SReT
- ViT (see the patch-embedding sketch after this list)
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  - papers_with_code
- HRFormer
  - HRFormer: High-Resolution Transformer for Dense Prediction
  - paper & code
- Conformer
  - Conformer: Local Features Coupling Global Representations for Visual Recognition
  - papers_with_code
- FixEfficientNet
  - Fixing the train-test resolution discrepancy: FixEfficientNet
  - papers_with_code
- EfficientNetV2
- EfficientNet
  - EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  - papers_with_code
- Pale
  - Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
  - paper
- VOLO
- ELSA
- DAT
  - Vision Transformer with Deformable Attention
  - github
- As-ViT
  - Auto-scaling Vision Transformers without Training
  - github
- CycleMLP
  - CycleMLP: A MLP-like Architecture for Dense Prediction
  - github
- CrossFormer
  - CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
  - github
- AS-MLP
  - AS-MLP: An Axial Shifted MLP Architecture for Vision
  - github
- VAN
  - Visual Attention Network
  - github
- ConvNeXt
  - A ConvNet for the 2020s
  - github
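
The common thread behind most backbones above is the ViT recipe: split the image into fixed-size patches, linearly embed each patch into a token, add position embeddings, and run a standard Transformer encoder. As a reference point for the ViT entry, here is a minimal sketch of that pipeline, assuming PyTorch; the class names (`ViTBackbone`, `PatchEmbed`) and ViT-B-like hyperparameters (16x16 patches, dim 768, 12 layers) are illustrative and not this repository's code.

```python
# Minimal ViT-style backbone sketch: patchify -> linear projection
# -> [CLS] token + position embeddings -> Transformer encoder.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and project each to a token."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A stride-p conv is equivalent to flattening p x p patches + a linear layer.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, dim, H/p, W/p)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, dim)

class ViTBackbone(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=768, depth=12, heads=12):
        super().__init__()
        self.patch_embed = PatchEmbed(img_size, patch_size, dim=dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, self.patch_embed.num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        tokens = self.patch_embed(x)                     # (B, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)  # (B, 1, dim)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return self.norm(tokens)[:, 0]                   # [CLS] feature

feats = ViTBackbone()(torch.randn(2, 3, 224, 224))
print(feats.shape)                                       # torch.Size([2, 768])
```

The hierarchical backbones in the list (Swin, PvT, CSWin, Pale, CrossFormer) vary this recipe by producing multi-stage feature maps and restricting where attention is computed.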
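
Several entries above (Swin, CSWin, Twins, Pale) restrict self-attention to local windows for efficiency; Swin additionally shifts the window grid between consecutive blocks so information can cross window boundaries. Below is a minimal sketch of that partition/shift mechanic, again assuming PyTorch; it omits the paper's attention mask for boundary windows, relative position bias, and MLP sub-block, and all names and sizes are illustrative.

```python
# Minimal sketch of Swin-style (shifted) window attention.
import torch
import torch.nn as nn

def window_partition(x, ws):
    """(B, H, W, C) -> (num_windows * B, ws * ws, C)"""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """(num_windows * B, ws * ws, C) -> (B, H, W, C)"""
    B = windows.shape[0] // ((H // ws) * (W // ws))
    x = windows.view(B, H // ws, W // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

class WindowAttentionBlock(nn.Module):
    def __init__(self, dim=96, heads=3, ws=7, shift=0):
        super().__init__()
        self.ws, self.shift = ws, shift
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                              # x: (B, H, W, C)
        B, H, W, C = x.shape
        shortcut = x
        x = self.norm(x)
        if self.shift:                                 # shifted-window variant
            x = torch.roll(x, (-self.shift, -self.shift), dims=(1, 2))
        w = window_partition(x, self.ws)               # attend within each window
        w, _ = self.attn(w, w, w)
        x = window_reverse(w, self.ws, H, W)
        if self.shift:                                 # undo the cyclic shift
            x = torch.roll(x, (self.shift, self.shift), dims=(1, 2))
        return shortcut + x                            # residual connection

x = torch.randn(2, 56, 56, 96)                         # H, W divisible by ws=7
y = WindowAttentionBlock(shift=0)(x)                   # regular windows
y = WindowAttentionBlock(shift=3)(y)                   # shifted windows (ws // 2)
print(y.shape)                                         # torch.Size([2, 56, 56, 96])
```

Alternating the regular and shifted blocks, as in the last two calls, is what lets window-local attention propagate information across the whole feature map over depth.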