@COMMENT This file was generated by bib2html.pl version 0.94
@COMMENT written by Patrick Riley
@COMMENT This file came from Gal A. Kaminka's publication pages at
@COMMENT http://www.cs.biu.ac.il/~galk/publications/

@mastersthesis{eden-msc,
  author = {Eden R. Hartman},
  title = {Swarming Bandits: A Rational and Practical Model of Swarm Robotic Tasks},
  school = {{B}ar {I}lan {U}niversity},
  year = {2022},
  OPTkey = {},
  OPTtype = {},
  OPTaddress = {},
  OPTmonth = {},
  OPTnote = {Available at \url{http://www.cs.biu.ac.il/~galk/Publications/b2hd-eden-msc.html}},
  OPTannote = {},
  wwwnote = {},
  abstract = {A \textit{swarm} is a multi-agent system in which robots base their decisions only on
    \textit{local} interactions with the other robots and the environment. Local interactions limit
    the robots' abilities, allowing them to perceive and act only with respect to a \textbf{subset}
    of the other robots, and preventing them from coordinating explicitly with all members of the
    system. Despite these challenging constraints, swarms are often observed in real-world
    phenomena and have inspired technology for many robotics applications. A key open challenge in
    swarm research is to provide guarantees on the global behavior of the swarm, given the robots'
    individual decision rules and local interactions. The reverse is also an open challenge: given
    a required guaranteed global behavior, how should each individual robot behave and make
    decisions? This thesis proposes a new game-theoretic model for swarms. It ties local
    decision-making to theoretical guarantees of stability and global reward. Using simple
    reinforcement learning with a reward that is computed locally by each robot, the model provides
    guarantees about the emerging global results. Specifically, we show that the utility of the
    swarm is maximized as robots maximize the time they spend on their task. This allows each
    individual robot to evaluate the efficacy of a collision-avoidance action based on the time it
    frees up for its own swarm task execution. We use a \textit{multi-armed bandit framework} to
    allow each individual agent to learn which collision-avoidance actions are best. We then show
    how to shape the reward used in the learning process so that it takes into account the marginal
    contribution of the robot to the swarm. While the marginal contribution is not directly
    accessible to the robot, it can be approximated effectively from the robot's own experience. We
    evaluate the model empirically, using a popular physics-based 3D robotics simulation in which a
    cooperative swarm is engaged in \emph{foraging}, a canonical swarm task. We compare the results
    to those achieved by the state of the art and show superior results.},
}