An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

https://arxiv.org/pdf/2010.11929

1. Introduction

1-1. Background to the emergence of the ViT model

At the time, Transformers dominated the NLP field, while in the vision field CNNs were..