Green LLM: Studying Key Factors Affecting Energy Consumption of Code Assistants

In recent years, Large Language Models (LLMs) have significantly improved in generating high-quality code, enabling their integration into developers’ Integrated Development Environments (IDEs) as code assistants. These assistants, such as GitHub Copilot, deliver real-time code suggestions and can greatly enhance developer productivity. However, their environmental impact, in particular their energy consumption, remains a key concern. This paper investigates the energy consumption of LLM-based code assistants by simulating developer interactions with GitHub Copilot and analyzing various configuration factors. We collected a dataset of development traces from 20 developers and conducted extensive software project development simulations to measure energy usage under different scenarios. Our findings reveal that the energy consumption and performance of code assistants are influenced by factors such as the number of concurrent developers, model size, quantization method, and the use of streaming. Notably, a substantial portion of the generation requests made to GitHub Copilot is either canceled or rejected by developers, indicating a potential area for reducing wasted computation. Based on these findings, we share actionable insights into optimizing configurations for different use cases, demonstrating that careful adjustments can yield significant energy savings.
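To make the wasted-computation point concrete, here is a minimal toy simulation, not the paper's methodology: the cancel/reject rates, the per-request energy value, and the assumption that a canceled request consumes half its energy on average are all illustrative placeholders, not measurements from the study.

```python
import random

def simulate_wasted_energy(n_requests, cancel_rate, reject_rate,
                           energy_per_request_j=1.0, seed=0):
    """Estimate the fraction of generation energy spent on suggestions
    that developers cancel mid-generation or reject after seeing them.

    All rates and the per-request energy are hypothetical inputs used
    only to illustrate the accounting, not values from the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    wasted = 0.0
    for _ in range(n_requests):
        total += energy_per_request_j
        r = rng.random()
        if r < cancel_rate:
            # Canceled mid-generation: assume (illustratively) that on
            # average half of the tokens were produced before the cancel.
            wasted += 0.5 * energy_per_request_j
        elif r < cancel_rate + reject_rate:
            # Fully generated, then rejected: the whole request is wasted.
            wasted += energy_per_request_j
    return wasted / total

# With 30% of requests canceled and 30% rejected, roughly 45% of the
# generation energy is spent on output the developer never uses.
print(simulate_wasted_energy(100_000, cancel_rate=0.3, reject_rate=0.3))
```

Even this crude model shows why cutting cancellations and rejections (e.g., via better-timed or higher-quality suggestions) is a promising lever for energy savings.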

Mission: Impossible Language Models

Chomsky and others have claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn; however, there is little published experimental evidence to support this claim. This study develops a set of synthetic impossible languages of differing complexity, each designed by systematically altering English data with unnatural word orders and grammar rules. These languages lie on an impossibility continuum: at one end are languages that are inherently impossible, such as random and irreversible shuffles of English words, and at the other are languages widely considered impossible in linguistics, particularly those with rules based on counting word positions. A wide range of evaluations assesses the capacity of GPT-2 small models to learn these uncontroversially impossible languages, performed at various stages throughout training to compare the learning process for each language. The core finding is that GPT-2 struggles to learn impossible languages compared with English as a control, challenging Chomsky’s claim. The authors hope their approach opens a productive line of inquiry in which different LLM architectures are tested on a variety of impossible languages, as tools for cognitive and typological investigations.
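The two ends of the continuum can be sketched in a few lines. This is a loose illustration of the idea, not the paper's actual data pipeline: it assumes simple whitespace tokenization, and the function names and marker token are invented for the example.

```python
import random

def shuffle_language(sentence, seed=None):
    """Shuffle a sentence's words. With no fixed seed, the mapping from
    English to the shuffled string is irreversible -- a toy version of
    the 'inherently impossible' end of the continuum."""
    tokens = sentence.split()
    random.Random(seed).shuffle(tokens)
    return " ".join(tokens)

def count_based_language(sentence, k=3, marker="<M>"):
    """Insert a marker after every k-th word -- a toy stand-in for
    grammar rules based on counting word positions, which linguists
    consider impossible in natural language."""
    tokens = sentence.split()
    out = []
    for i, tok in enumerate(tokens, 1):
        out.append(tok)
        if i % k == 0:
            out.append(marker)
    return " ".join(out)

print(count_based_language("the cat sat on the mat quietly"))
# -> the cat sat <M> on the mat <M> quietly
```

Applying such transformations uniformly to an English corpus yields training data for each synthetic language, so the same model architecture can be trained and compared across the continuum.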

On the Biology of a Large Language Model

Large language models display impressive capabilities. However, for the most part, the mechanisms by which they do so are unknown. The black-box nature of models is increasingly unsatisfactory as they advance in intelligence and are deployed in a growing number of applications. Our goal is to reverse engineer how these models work on the inside, so we may better understand them and assess their fitness for purpose. The challenges we face in understanding language models resemble those faced by biologists. Living organisms are complex systems which have been sculpted by billions of years of evolution. While the basic principles of evolution are straightforward, the biological mechanisms it produces are spectacularly intricate. Likewise, while language models are generated by simple, human-designed training algorithms, the mechanisms born of these algorithms appear to be quite complex.