Evolution of the Rational Swarm Model
The Effectiveness Index (EI), 2010
The first version of the reward function was called the Effectiveness Index (EI). It was presented in Gal A. Kaminka, Dan Erusalimchik, and Sarit Kraus, Adaptive Multi-Robot Coordination: A Game-Theoretic Perspective, in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2010. EI is the ratio between the time and resources spent on coordination and the total time and resources spent by the robot overall. Each robot measures its own time and resource usage, so no external information is required. When a collision occurs, the robot computes this ratio and uses it as the reward for its previous selection. It then selects a new action using an epsilon-greedy scheme. We used a very high learning rate, which in practice meant that the robots adapted quickly to their settings but never really converged to a specific selected action.
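The loop described above (measure, compute the ratio, reward the previous selection, pick again) can be sketched as follows. This is a minimal illustration, not the code from the paper. It assumes that time and resources are already combined into a single cost number, that a lower ratio is better (so the reward is the negated ratio), and that values are updated with a simple exponential moving average. The names `effectiveness_index`, `EpsilonGreedySelector`, `alpha`, and `epsilon` are hypothetical.

```python
import random


def effectiveness_index(coord_cost, total_cost):
    """Ratio of the cost spent on coordination to the total cost spent.

    Both quantities are measured by the robot itself. Assumption: time and
    resources have already been combined into one non-negative number.
    """
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return coord_cost / total_cost


class EpsilonGreedySelector:
    """Hypothetical per-robot learner over collision-avoidance methods."""

    def __init__(self, methods, alpha=0.9, epsilon=0.1, rng=None):
        self.methods = list(methods)
        self.alpha = alpha      # high learning rate: recent rewards dominate
        self.epsilon = epsilon  # probability of exploring a random method
        self.rng = rng or random.Random()
        self.value = {m: 0.0 for m in self.methods}

    def update(self, method, coord_cost, total_cost):
        """Reward the previously selected method once a collision occurs."""
        # Assumption of this sketch: less coordination overhead is better,
        # so the reward is the negated ratio.
        reward = -effectiveness_index(coord_cost, total_cost)
        self.value[method] += self.alpha * (reward - self.value[method])
        return reward

    def select(self):
        """Epsilon-greedy choice of the next collision-avoidance method."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.methods)
        return max(self.methods, key=self.value.get)
```

In use, a robot would call `update(previous_method, coord_cost, total_cost)` at each collision and then `select()` to choose the method for the next interval. With `alpha` close to 1 the stored value of a method is almost entirely its most recent reward, which is consistent with the fast adaptation and lack of convergence noted above.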
The EI reward was used in several different environments; the most demanding one involved Sony AIBO robots.
The Aligned (General) Effectiveness Index, 2019
The difficulty with the EI reward as introduced in 2010 was that while each robot selfishly adapts to select the collision-avoidance methods that reduce its own overhead, this can increase the overhead of others. A later paper, Yinon Douchan, Ran Wolf, and Gal A. Kaminka, Swarms Can be Rational, in Proceedings of the International Joint Conference on Autonomous Agents and Multi-Agent Systems, 2019, therefore presented a generalized version of the EI reward in which the individual and collective rewards are aligned, up to an approximation. The paper also showed that the swarm task can be modeled as a repeated game with an unknown horizon, and presented a continuous-time reinforcement-learning algorithm that is able to converge to a local-maximum Nash equilibrium, even though no robot knows anything about the tasks, goals, or actions of the other robots.
This general version of the EI function and the accompanying algorithm were demonstrated in two environments: first, in a foraging-like task (repeated search) carried out by Krembot robots, and second, in the material-handling simulator Alphabet Soup, built by Kiva Systems (which became the foundation for Amazon Robotics when it was acquired by Amazon).