Evolution of the Rational Swarm Model

The following traces the evolution of the rational swarm model. It is intended both to give credit to the students and colleagues who have participated in this line of work over the years and to serve an educational goal: it shows how a long-term research direction evolves over many years, through several theses and the resulting publications.

For an up-to-date perspective and model, see the following paper, published in Philosophical Transactions of the Royal Society A. It best summarizes the approach and results up to 2025:

Gal A. Kaminka, Swarms can be rational. In Philosophical Transactions of the Royal Society A (2025)

The Effectiveness Index (EI), 2007–2010

The first version of the reward function was called the Effectiveness Index (EI). It was presented in:

Gal A. Kaminka, Dan Erusalimchik, and Sarit Kraus. Adaptive Multi-Robot Coordination: A Game-Theoretic Perspective, in Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2010.

EI is the ratio between the time and resources a robot spends on coordination and the total time and resources it spends overall. The robot measures its own time and resource usage, so no external information is required. When a collision occurs, the robot computes this ratio and uses it as a reward for its previous selection. It then selects a new action using an epsilon-greedy scheme.
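The update described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the class name, the method names ("repel", "wait", "noise"), the learning rate, and the single scalar cost are hypothetical, and the sketch assumes that a lower EI is better, so it negates the ratio before using it as the reward. It shows only the stateless case.

```python
import random


def effectiveness_index(coordination_cost, total_cost):
    """Ratio of the cost spent on coordination to the total cost spent.

    Both costs are measured by the robot itself over the interval since
    its previous collision (a single scalar cost is an assumption here).
    """
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return coordination_cost / total_cost


class EpsilonGreedySelector:
    """Stateless epsilon-greedy learner over coordination methods (hypothetical)."""

    def __init__(self, methods, epsilon=0.1, learning_rate=0.5, rng=None):
        self.methods = list(methods)
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = rng or random.Random()
        # One value estimate per coordination method, initialised to zero.
        self.values = {m: 0.0 for m in self.methods}

    def update(self, method, coordination_cost, total_cost):
        """Called on a collision: reward the previously selected method."""
        # Assumption: lower EI is better, so the reward is the negated ratio.
        reward = -effectiveness_index(coordination_cost, total_cost)
        old = self.values[method]
        self.values[method] = old + self.learning_rate * (reward - old)
        return reward

    def select(self):
        """Explore with probability epsilon, otherwise pick the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.methods)
        return max(self.methods, key=lambda m: self.values[m])


# Usage: the costs below are made-up numbers, for illustration only.
selector = EpsilonGreedySelector(["repel", "wait", "noise"], epsilon=0.1)
selector.update("repel", coordination_cost=4.0, total_cost=10.0)
selector.update("wait", coordination_cost=1.0, total_cost=10.0)
next_method = selector.select()
```

Because the reward depends only on quantities the robot measures about itself, the learner needs no communication with, or model of, the other robots.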

Dan Erusalimchik’s thesis presented the EI reward in detail and examined its use in several different environments and in several reinforcement learning settings (stateless, stateful, varying learning rates). The most demanding of these used Sony AIBO robots.

The Aligned Effectiveness Index, 2016–2020

The difficulty with the EI reward was that each robot selfishly adapts to select the collision-avoidance methods that reduce its own overhead, and this can increase the overhead of others. Yinon Douchan’s thesis presented a generalized version of the EI reward, in which the individual rewards are approximately aligned with the collective reward. The thesis also showed that the swarm task can be modeled as a repeated game with an unknown horizon, and presented a continuous-time reinforcement-learning algorithm that is able to converge to a local-maximum Nash equilibrium, even though no robot knows anything about the tasks, goals, or actions of the other robots.

The main results from the thesis were published in two papers:

Yinon Douchan, Ran Wolf, and Gal A. Kaminka. Swarms can be rational, in Proceedings of the International Joint Conference on Autonomous Agents and Multi-Agent Systems, 2019.

Gal A. Kaminka and Yinon Douchan, Heterogeneous foraging swarms can be better

This general version of the EI function, together with the algorithm, was demonstrated in two environments: first in a foraging-like task (repeated search) carried out by Krembot robots, and second in the material-handling simulator Alphabet Soup, built by Kiva Systems (which became the foundation for Amazon Robotics after it was acquired by Amazon).