Abstract
We propose Matryoshka, a novel framework for pruning transformer models that enables dynamic runtime control while maintaining accuracy competitive with modern large language models (LLMs). Matryoshka incrementally constructs nested submodels of varying complexity, allowing runtime adaptation without maintaining separate models. Our evaluations on LLaMA-7B show that Matryoshka achieves up to a 34% speedup and outperforms state-of-the-art pruning methods in output quality, providing a flexible solution for deploying LLMs.
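The nesting idea from the abstract can be sketched as follows. This is an illustrative toy, not the authors' implementation: the block functions, `forward`, and `pick_depth` are hypothetical stand-ins showing how depth-pruned submodels that share a common prefix of layers can be selected at runtime without keeping separate model copies.

```python
def make_block(scale):
    # Stand-in for one transformer block; a real block would be
    # attention + MLP with learned weights.
    return lambda x: x * scale + 1.0

# One full stack of blocks; shallower submodels reuse its prefix,
# Matryoshka-style.
FULL_DEPTH = 8
blocks = [make_block(1.0 + 0.1 * i) for i in range(FULL_DEPTH)]

def forward(x, depth):
    """Run only the first `depth` blocks -- a depth-pruned submodel.

    Nested submodels (depth = 2, 4, 8, ...) share weights, so
    switching depth at runtime needs no separate model instances.
    """
    assert 1 <= depth <= FULL_DEPTH
    for block in blocks[:depth]:
        x = block(x)
    return x

def pick_depth(latency_budget_ms, cost_per_block_ms=2.0):
    # Hypothetical runtime controller: choose the deepest submodel
    # that fits a per-request latency budget.
    return max(1, min(FULL_DEPTH, int(latency_budget_ms // cost_per_block_ms)))
```

Under this sketch, a tight latency budget selects a shallow prefix and a loose one selects the full stack, which is the dynamic-scalability behavior the abstract describes.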
| Original language | English |
|---|---|
| Title of host publication | 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9783982674100 |
| State | Published - 2025 |
| Event | 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Lyon, France Duration: 31 Mar 2025 → 2 Apr 2025 |
Publication series
| Name | Proceedings - Design, Automation and Test in Europe, DATE |
|---|---|
| ISSN (Print) | 1530-1591 |
Conference
| Conference | 2025 Design, Automation and Test in Europe Conference, DATE 2025 |
|---|---|
| Country/Territory | France |
| City | Lyon |
| Period | 31/03/25 → 2/04/25 |
Bibliographical note
Publisher Copyright: © 2025 EDAA.
Keywords
- Depth-Pruning
- Large Language Model
- Real-Time Management
Title
Late Breaking Results: Dynamically Scalable Pruning for Transformer-Based Large Language Models