[–] Forsaken-Data4905@alien.top 1 points 2 years ago

The point is that the adapted (frozen) layers have a significantly higher parameter count than the adapter layers, which is what leads to the huge memory savings. You never take gradients with respect to the frozen layers, only with respect to the adapter layers and whatever trainable parts of the original model remain.
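A rough back-of-the-envelope sketch of why this saves memory, assuming a LoRA-style setup with hypothetical dimensions (a single d x d weight matrix adapted by rank-r factors, Adam as the optimizer):

```python
# Hypothetical LoRA-style setup: one d x d weight matrix, frozen,
# adapted with low-rank factors A (d x r) and B (r x d).
d, r = 4096, 8

base_params = d * d              # frozen original weights
adapter_params = d * r + r * d   # trainable low-rank adapters

# With Adam, each *trainable* parameter needs a gradient plus two
# optimizer-state values (~3 extra floats per parameter). Frozen
# parameters need none of this.
full_ft_extra = 3 * base_params    # full fine-tuning
lora_extra = 3 * adapter_params    # adapters only

print(f"base parameters:    {base_params:,}")
print(f"adapter parameters: {adapter_params:,}")
print(f"training-state floats, full fine-tune: {full_ft_extra:,}")
print(f"training-state floats, adapters only:  {lora_extra:,}")
print(f"ratio: {full_ft_extra // lora_extra}x")
```

With these (made-up) numbers the adapters carry d*r*2 = 65,536 parameters against 16,777,216 frozen ones, so gradient and optimizer memory shrinks by a factor of d/(2r) = 256. At small d the adapter overhead is no longer negligible relative to the base weights, which is the caveat in the last sentence.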

This is of course not necessarily true for smaller models.