this post was submitted on 27 Oct 2023

Machine Learning

PAPER: https://arxiv.org/abs/2310.16764

SUMMARY

The paper "ConvNets Match Vision Transformers at Scale" from Google DeepMind aims to debunk the prevalent notion that Vision Transformers (ViTs) are inherently superior to ConvNets for large-scale image classification. Using the NFNet model family as a representative ConvNet architecture, the authors pre-train various models on the extensive JFT-4B dataset under different compute budgets, ranging from 0.4k to 110k TPU-v4 core hours. Through this empirical analysis, they observe a log-log scaling law between held-out loss and compute budget. Importantly, when these NFNets are fine-tuned on ImageNet, they match the performance metrics of ViTs trained under comparable computational constraints. Their most resource-intensive model even achieves a Top-1 ImageNet accuracy of 90.4%.

The crux of the paper's argument is that the supposed performance gap between ConvNets and ViTs largely vanishes under a fair comparison, which accounts for compute and data scale. In other words, the efficacy of a machine learning model in large-scale image classification is more dependent on the available data and computational resources than on the choice between ConvNet and Vision Transformer architectures. This challenges the community's leaning towards ViTs and emphasizes the importance of equitable benchmarking when evaluating different neural network architectures.
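
To make the "log-log scaling law" in the summary concrete: if loss falls as a power of compute, the points lie on a straight line once both axes are log-transformed, and the slope of that line is the scaling exponent. Below is a minimal sketch of such a fit, not the paper's code. It assumes a pure power law with no irreducible-loss offset, and the function name `fit_power_law`, the constants, and all data points are made up for illustration (only the 0.4k to 110k core-hour range echoes the summary).

```python
import math


def fit_power_law(compute, loss):
    """Fit loss ~ a * compute**b by ordinary least squares in log-log space.

    Hypothetical helper for illustration; assumes every value is positive
    and that there is no constant (irreducible) loss term.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the best-fit line through (log compute, log loss).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # The intercept of that line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data, NOT taken from the paper: compute budgets in core hours
# and losses generated from an assumed power law.
compute_hours = [400, 1600, 6400, 25600, 110000]
true_a, true_b = 5.0, -0.1
losses = [true_a * c ** true_b for c in compute_hours]

a, b = fit_power_law(compute_hours, losses)
print(f"fitted prefactor a = {a:.3f}, exponent b = {b:.3f}")
```

On real measurements the points would scatter around the line, and the fitted exponent would be an estimate rather than an exact recovery as it is here.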

[–] Smallpaul@alien.top 1 points 1 year ago (5 children)

The “it” in AI models is the dataset.

... trained on the same dataset for long enough, pretty much every model with enough weights and training time converges to the same point. Sufficiently large diffusion conv-unets produce the same images as ViT generators. AR sampling produces the same images as diffusion.

[–] currentscurrents@alien.top 1 points 1 year ago (4 children)

Maybe it's less about having as many parameters as the human brain, and more about having datasets as rich and diverse as the real world.

[–] TikiTDO@alien.top 1 points 1 year ago (1 children)

People talk a lot about datasets being "rich" and "diverse," but I wish they would also mention "not full of crap" in the same breath. Whether it be AI or humans, garbage-in, garbage-out still applies. You can have a rich and diverse dataset that teaches AI horrific, terrible ideas and practices.

We know with humans you get a very different effect based on the quality of the teacher and the teaching material, and we know that a bad teacher teaching bad lessons can be even worse than nothing at all. AI isn't really that different.

[–] shanereid1@alien.top 1 points 1 year ago

Was at a big data industry conference yesterday, and one of the big takeaways was that data quality is going to be critical in the age of genAI.
