Awesome On-Device LLM papers

A curated and up-to-date paper list of awesome on-device Large Language Models (LLMs) / Small Language Models (SLMs) research.

When the AGI era arrives, on-device LLMs / SLMs will likely play an increasingly critical role. They offer unique advantages in privacy, responsiveness, and user-centric customization that cloud-based models may not be able to match. However, LLMs are often large and resource-intensive, making them difficult to deploy on devices with limited compute and memory. Research on techniques such as model pruning, quantization, and distillation can yield more efficient models, such as SLMs, that retain performance while remaining lightweight enough for mobile devices, enabling a wider range of applications and improving accessibility.
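To make one of these compression techniques concrete, the sketch below shows naive symmetric per-tensor int8 post-training quantization in NumPy. It is illustrative only: the function names and the hypothetical 4096 x 4096 weight matrix are assumptions for the example, and real on-device pipelines (e.g., the activation-aware AWQ paper listed under Model Compression) use considerably more sophisticated schemes.

```python
# Minimal sketch of symmetric per-tensor int8 post-training quantization.
# Illustrative only; not taken from any specific paper in this list.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 plus one float scale factor."""
    # Largest-magnitude weight maps to 127 (assumes a nonzero tensor).
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor for computation."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4096, 4096).astype(np.float32)  # hypothetical weight matrix
    q, scale = quantize_int8(w)
    w_hat = dequantize_int8(q, scale)
    # int8 storage is 4x smaller than float32, at a small reconstruction error.
    print(f"fp32 size: {w.nbytes / 2**20:.1f} MiB, int8 size: {q.nbytes / 2**20:.1f} MiB")
    print(f"mean abs error: {np.abs(w - w_hat).mean():.5f}")
```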

If you find interesting work or projects, please reach out via an issue or email withhaotian [at] gmail [dot] com.

This list focuses only on on-device LLM / SLM research. If you are interested in edge AI computing and systems, please refer to awesome-edge-AI-papers.

License

This project is licensed under the GPL-3.0 license - see the LICENSE file for details.

Overview

Surveys

  • [arXiv'24] On-Device Language Models: A Comprehensive Review - [PDF] [Code]
  • [arXiv'24] A Survey of Small Language Models - [PDF]
  • [arXiv'24] Small Language Models: Survey, Measurements, and Insights - [PDF] [Code] [Demo]
  • [arXiv'24] A Survey of Resource-efficient LLM and Multimodal Foundation Models - [PDF] [Code]
  • [arXiv'24] A Survey on Model Compression for Large Language Models - [PDF]

Models / Architectures Design

  • [arXiv'24] OpenELM: An Efficient Language Model Family with Open Training and Inference Framework - [PDF] [Code] [HuggingFace]
  • [arXiv'24] Fox-1 Technical Report - [PDF] [HuggingFace]
  • [arXiv'24] TinyLlama: An Open-Source Small Language Model - [PDF] [Code]
  • [arXiv'24] MobileVLM V2: Faster and Stronger Baseline for Vision Language Model - [PDF] [Code]
  • [arXiv'24] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits - [PDF]
  • [arXiv'24] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone - [PDF]
  • [arXiv'24] MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT - [PDF] [Code]

Algorithms

  • [arXiv'24] vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving - [PDF]

Model Compression

  • [arXiv'24] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration - [PDF] [Code]
  • [arXiv'24] Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation - [PDF]

Systems

  • [arXiv'23] AutoDroid: LLM-powered Task Automation in Android - [PDF] [Code]

Benchmarks

  • [arXiv'24] MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases - [PDF]
  • [EdgeFM'24] Large Language Models on Mobile Devices: Measurements, Analysis, and Insights - [PDF]

Applications

  • [arXiv'24] Toward Scalable Generative AI via Mixture of Experts in Mobile Edge Networks - [PDF]
