Videos & Image Gallery
Experimental Environments
We have applied the rational-swarms multi-agent reinforcement learning method in many different environments, both with physical robots and in simulation. Across these experiments, we repeatedly find that different robots learn to respond differently to collisions and to inter-agent conflicts that require coordination: the swarm becomes heterogeneous in its decision-making.
Alphabet Soup material-handling simulator[^1]
The material-handling simulator Alphabet Soup[^2] was developed by Kiva Systems (which was acquired by Amazon and became the basis for Amazon Robotics). Orders (words) are continuously placed in a queue. The shelves (purple circles) contain letters. Robots (orange rectangles) are assigned by a task-allocation algorithm to fetch letters and bring them to the order-completion stations (green circles, right side). Occasionally, new letters must be brought in from the supply stations (blue circles, left side). The task is to maximize the number of completed words.
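As an illustration only, the allocation loop described above might look like the following greedy sketch. All names and the nearest-idle-robot rule here are our assumptions; the actual Alphabet Soup allocator is not prescribed by the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Task:
    letter: str
    shelf_xy: tuple[float, float]  # location of the shelf holding this letter

@dataclass
class Robot:
    rid: int
    xy: tuple[float, float]
    busy: bool = False

def assign_tasks(tasks: list[Task], robots: list[Robot]) -> dict[int, Task]:
    """Greedily assign each pending letter-fetch task to the nearest idle robot.

    Illustrative stand-in for the simulator's centralized task allocator.
    """
    assignments: dict[int, Task] = {}
    for task in tasks:
        idle = [r for r in robots if not r.busy]
        if not idle:
            break  # no free robots; remaining tasks stay queued
        nearest = min(idle, key=lambda r: math.dist(r.xy, task.shelf_xy))
        nearest.busy = True
        assignments[nearest.rid] = task
    return assignments
```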
While the assignment of robots to letters is determined centrally, each robot plans and manages its own path. Robots may therefore run into each other, and must handle collisions and navigate around obstacles.
The experiments described in the paper show that by using the generalized (aligned; see the paper) EI reward and a continuous-time reinforcement learning algorithm, different robots learn to use different collision-avoidance methods. The result is a heterogeneous swarm (different robots respond differently to the same state) that completes significantly more words in a given time than a swarm using the default methods.
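As a minimal, bandit-style sketch of the idea (a simplification of the papers' continuous-time formulation; the method names, the EI form used here, and all parameters are our assumptions): each robot keeps a value estimate per collision-avoidance method, and is rewarded with the negated fraction of time it spent resolving the last conflict.

```python
import random
from collections import defaultdict

# Hypothetical collision-avoidance methods a robot can choose among.
METHODS = ["stop_and_wait", "random_backoff", "replan_path"]

class EIMethodSelector:
    """Per-robot, epsilon-greedy selection of a collision-avoidance method.

    The reward is a simplified Effectiveness Index (EI): the fraction of
    time spent on coordination (resolving the conflict) out of the total
    time, negated so that cheaper coordination scores higher. A sketch of
    the idea, not the papers' exact algorithm.
    """

    def __init__(self, alpha: float = 0.1, epsilon: float = 0.1):
        self.q = defaultdict(float)  # method -> estimated value
        self.alpha = alpha           # learning rate
        self.epsilon = epsilon       # exploration rate

    def choose(self) -> str:
        if random.random() < self.epsilon:
            return random.choice(METHODS)
        return max(METHODS, key=lambda m: self.q[m])

    def update(self, method: str, conflict_time: float, task_time: float) -> None:
        ei = conflict_time / max(conflict_time + task_time, 1e-9)
        reward = -ei  # less time spent coordinating -> higher reward
        self.q[method] += self.alpha * (reward - self.q[method])
```

Because each robot runs its own selector and encounters a different history of conflicts, the learned preferences can diverge across robots, which is one way the heterogeneity described above can emerge.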
Repeated search (foraging variant) with Krembot robots[^1]
In this task, Krembot robots carry out a repeated-search task, a variant of foraging. Due to limitations of the robots, we could not get them to push pucks (the collected items) reliably. We therefore fixed the pucks in place (wooden circles), and the robots had to repeatedly find them and then find their nest (the lit area in the bottom-left corner). The robots do not know where the pucks are and cannot localize, so they must search from scratch every time. They also do not know where the nest is, and search for it by looking for brightly lit areas (the nest is lit in green, though this is difficult to see in the movie).
The robots themselves are color-coded. While seeking pucks, a robot lights up red. When it determines that it has found a puck, it changes its light to blue and starts searching for the nest. When it reaches the nest, it switches back to red and returns to searching for pucks.
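This behavior cycle maps naturally onto a small state machine. The sketch below mirrors the color coding; the robot interface (set_led, puck_detected, and so on) and the light threshold are hypothetical, not the Krembot API.

```python
from enum import Enum, auto

NEST_LIGHT_THRESHOLD = 0.8  # hypothetical normalized brightness marking the nest

class Mode(Enum):
    SEEK_PUCK = auto()  # LED red: searching for a puck
    SEEK_NEST = auto()  # LED blue: puck found, searching for the lit nest

class KrembotSearcher:
    """State machine for the repeated-search task described above."""

    def __init__(self, robot):
        self.robot = robot  # hypothetical robot interface
        self.mode = Mode.SEEK_PUCK
        self.robot.set_led("red")

    def step(self) -> None:
        if self.mode is Mode.SEEK_PUCK:
            if self.robot.puck_detected():
                self.mode = Mode.SEEK_NEST
                self.robot.set_led("blue")
            else:
                self.robot.random_walk_step()  # search: no localization available
        else:  # Mode.SEEK_NEST
            if self.robot.light_level() > NEST_LIGHT_THRESHOLD:
                self.mode = Mode.SEEK_PUCK  # reached the nest: start over
                self.robot.set_led("red")
            else:
                self.robot.move_toward_light()  # head toward brighter areas
```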
Here again, applying the generalized aligned EI reward leads to a clear and significant improvement in the number of items found, compared to any homogeneous swarm using a single method.
Foraging with Sony AIBO robots[^3]
[^1]: Yinon Douchan, Ran Wolf, and Gal A. Kaminka. Swarms Can Be Rational, in Proceedings of the International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2019.

[^2]: Christopher Hazard, Peter Wurman, and Raffaello D'Andrea. Alphabet Soup: A Testbed for Studying Resource Allocation in Multi-vehicle Systems, 2010.

[^3]: Gal A. Kaminka, Dan Erusalimchik, and Sarit Kraus. Adaptive Multi-Robot Coordination: A Game-Theoretic Perspective, in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2010.