Storing open-source fine-tuned models separately introduces redundancy and increases response times in applications that rely on multiple models. Delta-parameter pruning (DPP), in particular the random drop and rescale (DARE) method proposed by Yu et al., addresses this by pruning the majority of delta parameters (the differences between fine-tuned and pre-trained model weights) while typically incurring only minimal performance loss. However, DARE fails when either the pruning rate or the magnitude of the delta parameters is large. We highlight two key reasons for this failure: (1) an excessively large rescaling factor as the pruning rate increases, and (2) high mean and variance in the delta parameters. To address these, we develop two algorithmic improvements: (1) DARq, which modifies the rescaling factor in DARE and yields significant performance gains at high pruning rates (e.g., >30% on CoLA and SST-2 for encoder models, with even larger improvements for decoder models), and (2) AdamR, an in-training modification that applies appropriate delta regularization before DPP is performed. We also demonstrate that DARq can be seamlessly combined with vanilla parameter-efficient fine-tuning techniques such as LoRA and can facilitate structural DPP. Additionally, we revisit importance-based pruning techniques within DPP and demonstrate that they outperform random-based methods when the delta parameters are large. Through this comprehensive study, we develop a pipeline for selecting the most appropriate DPP method for various practical scenarios.
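To make the drop-and-rescale operation concrete, the following is a minimal sketch of DARE-style delta-parameter pruning as described above: each delta parameter is dropped independently with probability equal to the pruning rate, and the survivors are rescaled by 1/(1 - p) before being added back to the pre-trained weights. The function name, the dictionary-of-tensors interface, and the `rescale` argument are illustrative assumptions rather than the paper's code; the exact DARq rescaling rule is not reproduced here, but it corresponds to substituting a different factor for the default 1/(1 - p).

```python
import torch

def dare_prune(pretrained, finetuned, p, rescale=None):
    """DARE-style random drop-and-rescale of delta parameters (sketch).

    pretrained, finetuned: state_dict-like dicts of tensors with matching keys.
    p: pruning rate in [0, 1); each delta parameter is dropped with probability p.
    rescale: factor applied to surviving deltas. DARE uses 1/(1 - p); passing a
             tuned value illustrates the idea of a modified rescaling factor
             (hypothetical stand-in, not the paper's DARq formula).
    """
    if rescale is None:
        rescale = 1.0 / (1.0 - p)  # standard DARE rescaling factor
    pruned = {}
    for name, w_pre in pretrained.items():
        delta = finetuned[name] - w_pre                # delta parameters
        keep = torch.bernoulli(torch.full_like(delta, 1.0 - p))  # random keep mask
        pruned[name] = w_pre + keep * delta * rescale  # drop, rescale, add back
    return pruned
```

In expectation, `keep * delta * rescale` equals `delta`, which is why DARE preserves performance at moderate pruning rates; at very high rates the factor 1/(1 - p) blows up, which is the failure mode the abstract attributes to an excessively large rescaling factor.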