A curated, up-to-date list of awesome papers on on-device Large Language Models (LLMs) / Small Language Models (SLMs) research.
When the AGI era arrives, on-device LLMs / SLMs will likely play an increasingly critical role. They offer unique advantages in privacy, responsiveness, and user-centric customization that cloud-based models may not match. However, LLMs are typically large and resource-intensive, making them difficult to deploy on devices with limited compute and memory. Techniques such as model pruning, quantization, and distillation can yield efficient small language models (SLMs) that retain performance while remaining lightweight enough for mobile devices, enabling a wider range of applications and improving accessibility.
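To make the quantization idea above concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization in plain Python. It is illustrative only; the function names are my own, and production on-device stacks use far more elaborate schemes (per-channel scales, activation-aware calibration as in AWQ, sub-byte formats as in the 1.58-bit work listed below).

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # avoid divide-by-zero for all-zero tensors
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [x * scale for x in q]

w = [0.12, -0.5, 0.33, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# each recovered weight is within one quantization step of the original
assert all(abs(a - b) <= s for a, b in zip(w, w_hat))
```

Storing 8-bit codes plus one float scale cuts weight memory roughly 4x versus float32, which is the basic trade-off all the quantization papers below refine.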
If you find interesting work or projects, please reach out via issues or email: withhaotian [at] gmail [dot] com.
This list focuses only on on-device LLM / SLM research. If you are interested in edge AI computing and systems, please refer to awesome-edge-AI-papers.
This project is licensed under the GPL-3.0 license - see the LICENSE file for details.
- [arXiv'24] On-Device Language Models: A Comprehensive Review - [PDF] [Code]
- [arXiv'24] A Survey of Small Language Models - [PDF]
- [arXiv'24] Small Language Models: Survey, Measurements, and Insights - [PDF] [Code] [Demo]
- [arXiv'24] A Survey of Resource-efficient LLM and Multimodal Foundation Models - [PDF] [Code]
- [arXiv'24] A Survey on Model Compression for Large Language Models - [PDF]
- [arXiv'24] OpenELM: An Efficient Language Model Family with Open Training and Inference Framework - [PDF] [Code] [HuggingFace]
- [arXiv'24] Fox-1 Technical Report - [PDF] [HuggingFace]
- [arXiv'24] TinyLlama: An Open-source Small Language Model - [PDF] [Code]
- [arXiv'24] MobileVLM V2: Faster and Stronger Baseline for Vision Language Model - [PDF] [Code]
- [arXiv'24] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits - [PDF]
- [arXiv'24] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone - [PDF]
- [arXiv'24] MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT - [PDF] [Code]
- [arXiv'24] vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving - [PDF]
- [arXiv'24] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration - [PDF] [Code]
- [arXiv'24] Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation - [PDF]
- [arXiv'24] MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases - [PDF]
- [EdgeFM'24] Large Language Models on Mobile Devices: Measurements, Analysis, and Insights - [PDF]
- [arXiv'24] Toward Scalable Generative AI via Mixture of Experts in Mobile Edge Networks - [PDF]