ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
Language of the title:
English
Original abstract:
Multi-task learning (MTL) has shown considerable practical benefits, particularly when using pre-trained language models (PLMs). While this is commonly
achieved by simultaneously learning n tasks under a joint optimization procedure, recent methods such as AdapterFusion structure the problem into two distinct stages: (i) task learning, where knowledge specific to a task is encapsulated
within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits, such as promoting reusability and addressing cases
involving data privacy and societal concerns; on the flip side, current two-stage MTL methods come with the cost of introducing a substantial number of additional parameters. In this work, we address this issue by leveraging the effectiveness
of linearly scaling the output representations of source adapters for transfer learning. We introduce SCALEARN, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning
a minimal set of scaling parameters that enable effective knowledge transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) show that our SCALEARN, in addition to facilitating the benefits of two-stage MTL,
consistently outperforms strong baselines with only a small number of transfer parameters, roughly 0.35% of those of AdapterFusion. Remarkably, we observe that SCALEARN maintains its strong abilities even when further reducing parameters
through uniform scaling and layer-sharing, achieving similarly competitive results with only 8 transfer parameters per target task. Our proposed approach thus demonstrates the power of simple scaling as a promising approach to
more efficient task transfer.
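To make the mechanism described in the abstract concrete, the sketch below shows how learned scaling of frozen source-adapter output representations could be implemented in PyTorch. It is an illustration based only on the abstract, not the authors' released code; the module name ScalingCombiner, the tensor shapes, and the one-scalar-per-source parameterization are assumptions.

```python
# Minimal sketch (assumed, not the authors' implementation): combine the
# output representations of N frozen source adapters for one Transformer layer
# using a small set of learnable scaling coefficients.
import torch
import torch.nn as nn


class ScalingCombiner(nn.Module):
    """Learns one scaling coefficient per source task for a single layer."""

    def __init__(self, num_sources: int):
        super().__init__()
        # One scalar per source task and layer (hypothetical parameterization);
        # the uniform-scaling and layer-sharing variants mentioned in the
        # abstract would shrink this parameter count even further.
        self.omega = nn.Parameter(torch.full((num_sources,), 1.0 / num_sources))

    def forward(self, source_outputs: torch.Tensor) -> torch.Tensor:
        # source_outputs: (num_sources, batch, seq_len, hidden_dim),
        # i.e. the stacked outputs of the frozen source adapters.
        # Returns their scaled (weighted) sum for the target task.
        return torch.einsum("s,sbld->bld", self.omega, source_outputs)


# Example usage with dummy representations from 8 source tasks.
combiner = ScalingCombiner(num_sources=8)
dummy = torch.randn(8, 2, 16, 768)  # 8 sources, batch 2, 16 tokens, dim 768
combined = combiner(dummy)          # shape: (2, 16, 768)
```

In this sketch, only the scaling coefficients are trained for a target task while the source adapters stay frozen; sharing the coefficients across layers, as the abstract's layer-sharing variant suggests, would leave just one parameter per source task (e.g., 8 transfer parameters per target task on a benchmark with 8 source tasks).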