Efficient Large Language Model Deployment: A Survey and Empirical Study

Authors

Xiaonan Nie (Peking University)
Xixuan Zhang (Microsoft Research Asia)
Shuo Wang (Microsoft Research Asia)
Xuanyu Zhu (Peking University)

Abstract

This survey examines approaches for deploying large language models efficiently, focusing on reducing computational cost and energy consumption.

It evaluates deployment strategies including model compression, quantization, and hardware acceleration, and provides empirical evidence of their effectiveness.
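
To make one of the listed strategies concrete, below is a minimal sketch of symmetric per-tensor int8 post-training quantization in NumPy. This is a generic illustration of the technique, not the paper's implementation; the 4096×4096 weight matrix is an arbitrary stand-in for a single transformer layer.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map floats onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

# Hypothetical weight matrix standing in for one transformer layer.
w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)

# Quantization trades a small reconstruction error for 4x less storage.
error = np.abs(w - dequantize(q, scale)).mean()
print(f"int8: {q.nbytes / 1e6:.1f} MB vs fp32: {w.nbytes / 1e6:.1f} MB")
print(f"mean absolute reconstruction error: {error:.5f}")
```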

The authors systematically compare deployment methods and measure their impact on model quality, latency, and energy usage.
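
Comparisons of this kind depend on careful latency measurement. The sketch below shows one plausible benchmarking harness; `generate_fn` is a hypothetical stand-in for any model's inference call, not an API from the paper, and the warmup/run counts are arbitrary choices.

```python
import time
import statistics

def benchmark_latency(generate_fn, prompt: str, warmup: int = 3, runs: int = 20) -> dict:
    """Measure per-request latency of a text-generation callable.

    Warmup requests are discarded so the timings reflect steady-state
    behavior (caches populated, kernels compiled) rather than cold starts.
    """
    for _ in range(warmup):
        generate_fn(prompt)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate_fn(prompt)
        timings.append(time.perf_counter() - start)
    return {
        "mean_ms": statistics.fmean(timings) * 1e3,
        "p50_ms": statistics.median(timings) * 1e3,
        # With n=20, quantiles() yields 19 cut points; index 18 is p95.
        "p95_ms": statistics.quantiles(timings, n=20)[18] * 1e3,
    }
```

Reporting median and tail percentiles rather than a single mean is what lets deployment methods be compared fairly, since quantized or compressed models often shift the latency distribution rather than just its average.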
