Emergent abilities of Large Language Models
Exploring unexpected behaviour in transformer-based deep learning models
Dear all, in this post you will see:
Emergent abilities of LLMs: are they real or a mirage?
Some jobs and research opportunities from my network
Are the emergent abilities of LLMs real or an illusion?
Besides their countless possible applications, LLMs are the latest toy model available for exploring many interesting properties, including certain emergent behaviours. Researchers are studying whether these behaviours are real, generated by the complexity of the system, or whether they are an illusion. Let us explore this interesting topic.
Emergent phenomena: a general definition
In the fascinating realms of physics, biology, and dynamical systems, emergent phenomena captivate our interest. These intriguing behaviours, arising from the complex interactions of a system's individual components, are not explicitly dictated by the fundamental laws governing their respective domains. Instead, they materialize at certain scales, astonishing us with their seemingly unexpected presence. At the heart of these phenomena lies the profound assertion, "More is different", famously coined by Nobel laureate physicist Philip Anderson.
Among the most widely used definitions is:
An ability is emergent if it is not present in smaller systems but is present in larger ones.
or even:
Emergence is when quantitative changes in a system result in qualitative changes in behaviour.
To be concrete, I will give you two interesting examples of emergent phenomena.
Ants' collective intelligence: ants, as individuals, possess relatively simple cognitive abilities and limited information. However, when they come together and interact within a colony, they exhibit complex and highly efficient cooperative behaviours that lead to the emergence of sophisticated intelligence, without any central control or explicit instructions.
Emergence of spacetime: some contemporary theories in physics, such as Holography, propose that spacetime may not be a fundamental concept but rather an emergent phenomenon arising from the intricate web of quantum entanglement. I will definitely look into this in a future post.
Moving on, economics can be viewed as a complex phenomenon that emerges from psychology, which in turn emerges from biology, which emerges from chemistry. The latter is an emergent theory rooted in physics.
To summarize, in emergent phenomena seemingly simple components give rise to astonishingly complex and adaptive systems, where the whole is greater than the sum of its parts. In other words, what makes emergent phenomena intriguing is their:
sharpness: transitioning seemingly instantaneously from not present to present;
unpredictability: transitioning at seemingly unforeseeable model scales.
Emergence in LLMs
The current paradigm for improving the quantitative abilities of AI models, in particular transformer-based LLMs, is scaling up their dimensions (more parameters, more data, greater computational power). This scaling paradigm is often criticised, but on the other hand it leads to an increase in qualitative abilities.
As models grow in size, new and unexpected emergent behaviours are suddenly unlocked. Among these abilities are mathematical reasoning, riddle solving, and even sparks of spatial orientation.
Let's explore some interesting examples together.
Few-shot and Zero-shot learning
Few-shot learning is one of the simplest emergent abilities, and among the first discovered in LLMs. It refers to the ability of an LLM to learn and perform a new task using only a few examples. The model is able to understand the task to be solved without being explicitly told what it is.
Zero-shot learning takes the idea of few-shot learning a step further. In zero-shot learning, an LLM is capable of performing a task for which it has received no explicit training examples. The model can exploit the relationships between the tasks it has been trained on to generalize its knowledge and apply it to new ones.
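As an illustration, here is a minimal sketch of what few-shot and zero-shot prompts look like in practice. The `complete()` function is a hypothetical stand-in for any LLM completion API; the translation examples are of the kind popularized by the GPT-3 paper.

```python
# Hypothetical stand-in for an LLM completion call; plug in any client you like.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire this up to your favourite LLM API")

# Few-shot: a handful of solved examples lets the model infer the task
# (English -> French translation) without ever being told what the task is.
few_shot_prompt = """\
sea otter -> loutre de mer
peppermint -> menthe poivrée
cheese ->"""

# Zero-shot: no examples at all, just the task stated in natural language.
zero_shot_prompt = "Translate the word 'cheese' into French:"

# answer_few_shot = complete(few_shot_prompt)
# answer_zero_shot = complete(zero_shot_prompt)
```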
Emergent maths and spatial abilities
Without being explicitly trained on the following problem, or on many other mathematical problems, here we can observe how GPT-4 is able to understand and solve the task.
Some LLMs also exhibit geometric shape intuition, as shown in this work. Below, instead, a spatial ability is shown. I strongly suggest you copy this prompt and test GPT-4's result.
Suppose I have an 8x8 grid. The columns are labelled 1-8 from left to right, and the rows are A-H from top to bottom. All cells are empty except for cell B-2 where Alice is, and cell F-5 where Bob is. What is the exact series of cells Alice can move through to get to Bob as quickly as possible? Assume Alice can only move down-left-right to adjacent cells, and not diagonally.
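If you would rather verify GPT-4's answer than trust it, a short breadth-first search over the grid reproduces the optimal route. This is just a checking script added here for convenience; the cell encoding and function names are mine.

```python
from collections import deque

# Rows A-H -> 0-7, columns 1-8 -> 0-7; Alice at B-2 = (1, 1), Bob at F-5 = (5, 4).
MOVES = [(1, 0), (0, -1), (0, 1)]  # down, left, right only; no up, no diagonals

def shortest_path(start, goal, size=8):
    """Breadth-first search: the first time we reach `goal`, the path is optimal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        r, c = path[-1]
        if (r, c) == goal:
            return path
        for dr, dc in MOVES:
            nxt = (r + dr, c + dc)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def label(cell):
    return f"{chr(ord('A') + cell[0])}-{cell[1] + 1}"

print(" -> ".join(label(c) for c in shortest_path((1, 1), (5, 4))))
# One of several 7-move optima, with this move order:
# B-2 -> C-2 -> D-2 -> E-2 -> F-2 -> F-3 -> F-4 -> F-5
```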
Chain-of-thought & Zero-shot Chain-of-thought
Chain-of-Thought (CoT), introduced by Wei et al., is one of the most interesting emergent skills of LLMs. It involves providing the LLM with examples whose solutions spell out a logical chain of connected thoughts, guiding the model's response step by step. This technique increases the likelihood of the LLM giving a correct answer compared to the case where the LLM simply provides its solution to the problem in one go.
Zero-shot CoT is even more interesting, as we can achieve CoT-like behaviour and results by simply replacing the worked examples with a hint like "Let's think step by step" appended to the end of the prompt.
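To make the difference concrete, here is a sketch of the two prompting styles; the worked example is the classic tennis-ball problem from the CoT literature, and either string can be fed to the LLM of your choice.

```python
# Chain-of-thought: the exemplar includes the intermediate reasoning steps.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 and bought 6 more.
How many apples do they have?
A:"""

# Zero-shot CoT: no exemplar, just the hint appended after the question.
zero_shot_cot_prompt = """\
Q: The cafeteria had 23 apples. They used 20 and bought 6 more.
How many apples do they have?
A: Let's think step by step."""
```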
These are emergent abilities because they demonstrate that LLMs can follow logical sequences, maintaining context and coherence throughout a conversation.
From Self-attention to Self-refinement
Self-attention is the secret of the Transformer architecture introduced in the seminal article "Attention Is All You Need". Part of the recent LLM revolution started thanks to this revolutionary idea. It is curious to see how many of the emergent skills stem from a very simple idea made scalable.
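As a reminder of just how simple that idea is, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. Shapes and weight names are illustrative; real Transformers add multiple heads, masking, and learned layers around this core.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings;
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # each token: weighted mix of values

# Toy example: 4 tokens, embedding size 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```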
Recently, A. Madaan et al. explored the possibility that LLMs can improve their initial outputs through iterative feedback and refinement. This is called Self-refinement. The framework generates an output using an LLM, then lets the same model provide multi-aspect feedback on its own output; finally, the same model refines its previously generated output given its own feedback. This process is similar to how humans refine their text through self-feedback. Self-refinement does not require supervised training or reinforcement learning, and works with a single LLM.
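A minimal sketch of that generate-feedback-refine loop might look as follows. Here `llm` is a hypothetical prompt-to-text callable and the stopping rule is a toy simplification; the actual framework uses task-specific prompts for each phase.

```python
def self_refine(llm, task: str, max_iters: int = 3) -> str:
    """Iterative self-refinement: one model generates, critiques, and revises.

    `llm` is a hypothetical callable mapping a prompt string to a completion.
    """
    output = llm(task)
    for _ in range(max_iters):
        feedback = llm(
            f"Task: {task}\nAnswer: {output}\n"
            "Give concrete, actionable feedback on this answer. "
            "If it needs no changes, reply exactly: STOP"
        )
        if feedback.strip() == "STOP":  # toy stopping criterion
            break
        output = llm(
            f"Task: {task}\nAnswer: {output}\nFeedback: {feedback}\n"
            "Rewrite the answer, applying the feedback."
        )
    return output
```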
Emergent abilities: fact or mirage?
Despite the intriguing possibility that LLMs may give rise to hidden abilities, it is legitimate to ask whether such emergent properties are genuine or an illusion. Some researchers (check here and here) highlight the possibility that LLMs' emergent properties might arise from:
simple multi-step reasoning (see below for more details and check here);
the specific metric chosen to measure them (more here).
We now briefly go through the first point. Suppose LLMs get smoothly better at elementary reasoning as they increase in scale. Consider a model M with probability p of performing a single reasoning step successfully, and a multi-step problem that requires a chain of k successful reasoning steps to solve. If the steps are independent and all of them must succeed, the probability of solving the whole task is p^k. This curve is extremely steep near p = 1: a steady improvement in the elementary step probability p produces a rapid, seemingly abrupt increase in the probability of task success.
The steepness of the curve implies that abilities involving multi-step reasoning may not be truly emergent, but partly the consequence of an elementary ability that improves smoothly with scale.
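A few numbers make the point. The snippet below just tabulates p**k, the success probability of a task that needs k independent successful steps:

```python
# Smooth gains in single-step accuracy p turn into sharp gains on a
# k-step task whose overall success probability is p**k.
for k in (1, 5, 10, 20):
    row = "  ".join(f"p={p:.2f}: {p**k:.3f}" for p in (0.80, 0.90, 0.95, 0.99))
    print(f"k={k:>2} | {row}")
```

With k = 10, nudging p from 0.90 to 0.99 moves task-level success from roughly 0.35 to 0.90: a smooth change in the elementary skill looks like a sudden jump in the composite one.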
To identify proper emergent abilities in LLMs, one should look for phase transitions, where below a certain threshold of scale, model performance is near-random, and beyond that threshold, performance is well above random. This distinguishes emergent abilities from abilities that smoothly improve with scale.
Although it is not yet clear whether these LLM skills are truly emergent or not, I think these abilities will open many research and application opportunities. Emergence in LLMs is already a field of research, and the Sociology of AI, the study of the interaction of many agents and their social behaviour, will soon be one too.
Jobs and research opportunities
I am now going to share some opportunities from my network:
Data Science in Healthcare: Junior/Senior Bioinformatician for Computational Neurogenomics @ Human Technopole (Milan);
Earth Observation: Researcher on Earth Observation from Remote Sensing @ CMCC;
AI & Software Developer, Ginestra @ Applied Materials.
Do you know of any interesting opportunities you would like me to share? Please, write to me!
You can find me on LinkedIn or Twitter, where I share Science & Tech news. If you wish to book a call, you can find me here.