Gal A. Kaminka: Publications


Swarming Bandits: A Rational and Practical Model of Swarm Robotic Tasks

Eden R. Hartman. Swarming Bandits: A Rational and Practical Model of Swarm Robotic Tasks. Master's Thesis, Bar Ilan University, 2022.

Download

[PDF] 5.6 MB

Abstract

A swarm is a multi-agent system in which robots base their decisions only on local interactions with the other robots and the environment. Local interactions limit the robots' abilities, allowing them to perceive and act only with respect to a subset of the other robots, and preventing them from coordinating explicitly with all members of the system. Despite these challenging constraints, swarms are often observed in real-world phenomena and have inspired technology for many robotics applications. A key open challenge in swarm research is to provide guarantees on the global behavior of the swarm, given the robots' individual decision rules and local interactions. The reverse is also an open challenge: given a required guaranteed global behavior, how should the individual behave and make decisions?

This thesis proposes a new game-theoretic model for swarms that ties local decision-making to theoretical guarantees of stability and global rewards. Using simple reinforcement learning with a reward that is computed locally by each robot, the model makes guarantees about the emerging global results. Specifically, we show that the utility of the swarm is maximized as robots maximize the time they spend on their task. This allows each individual robot to evaluate the efficacy of a collision-avoidance action based on the time it frees up for its own swarm task execution. We use a multi-arm bandit framework to allow each individual agent to learn which collision-avoidance actions are best. We then show how to shape the reward used in the learning process so that it takes into account the marginal contribution of the robot to the swarm. While the marginal contribution is not directly accessible to the robot, it can be approximated effectively from its own experience.

We evaluate the model empirically in a popular physics-based 3D robotics simulation, in which a cooperative swarm is engaged in foraging, a canonical swarm task. We compare the results to those achieved by the state of the art and show superior results.
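The thesis's actual reward shaping and guarantees are in the full text; as a rough illustration of the multi-arm bandit idea described above, here is a minimal epsilon-greedy sketch. The action names and the simulated "time freed" rewards are hypothetical placeholders, not taken from the thesis.

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy multi-armed bandit.

    Each 'arm' is a candidate collision-avoidance action; the reward is
    taken to be the task time the action frees up for the robot
    (a simplified stand-in for the thesis's shaped reward).
    """

    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}  # running mean reward

    def select(self):
        # Explore with probability epsilon; otherwise exploit the best mean.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental update of the mean reward for the chosen arm.
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n


# Hypothetical usage: three illustrative avoidance actions, each with a
# made-up expected "time freed" reward, sampled with Gaussian noise.
random.seed(0)
true_means = {"veer-left": 0.3, "veer-right": 0.5, "stop-and-wait": 0.2}
bandit = EpsilonGreedyBandit(true_means.keys(), epsilon=0.1)
for _ in range(5000):
    a = bandit.select()
    reward = random.gauss(true_means[a], 0.1)
    bandit.update(a, reward)
best = max(bandit.values, key=bandit.values.get)
```

After enough trials, `bandit.values` approximates each action's expected reward, and the robot settles on the action that frees the most task time.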

Additional Information

BibTeX

@mastersthesis{eden-msc,
  author = {Eden R. Hartman},
  title = {{Swarming Bandits: A Rational and Practical Model of Swarm Robotic Tasks}},
  school = {{B}ar {I}lan {U}niversity},
  year = {2022},
  OPTnote = {Available at \url{http://www.cs.biu.ac.il/~galk/Publications/b2hd-eden-msc.html}},
  abstract = {A \textit{swarm} is a multi-agent system in which robots base their decisions only on \textit{local} interactions with the other robots and the environment.
Local interactions limit the robots' abilities, allowing them to perceive and act only with respect to a \textbf{subset} of the other robots, and preventing them from coordinating explicitly with all members of the system.
Despite these challenging constraints, swarms are often observed in real-world phenomena and have inspired technology for many robotics applications.
A key open challenge in swarm research is to provide guarantees on the global behavior of the swarm, given the robots' individual decision rules and local interactions.
The reverse is also an open challenge: given a required guaranteed global behavior, how should the individual behave and make decisions?
This thesis proposes a new game-theoretic model for swarms that ties local decision-making to theoretical guarantees of stability and global rewards.
Using simple reinforcement learning with a reward that is computed locally by each robot, the model makes guarantees about the emerging global results.
Specifically, we show that the utility of the swarm is maximized as robots maximize the time they spend on their task. This allows each individual robot to evaluate the efficacy of a collision-avoidance action based on the time it frees up for its own swarm task execution. We use a \textit{multi-arm bandit framework} to allow each individual agent to learn which collision-avoidance actions are best.
We then show how to shape the reward used in the learning process so that it takes into account the marginal contribution of the robot to the swarm. While the marginal contribution is not directly accessible to the robot, it can be approximated effectively from its own experience.
We evaluate the model empirically in a popular physics-based 3D robotics simulation, in which a cooperative swarm is engaged in \emph{foraging}, a canonical swarm task. We compare the results to those achieved by the state of the art and show superior results.
  },
}

Generated by bib2html.pl (written by Patrick Riley ) on Fri Aug 30, 2024 17:29:51