Gal A. Kaminka: Publications

Sorted by DateClassified by Publication TypeClassified by TopicGrouped by Student (current)Grouped by Former Students

Diagnosing Coordination Faults in Multi-Agent Systems

Meir Kalech. Diagnosing Coordination Faults in Multi-Agent Systems. Ph.D. Thesis, Bar Ilan University, 2007.

Download

[PDF]899.4kB  

Abstract

With increasing deployment of distributed applications in complex, dynamic settings, there is an increasing need to also be able to respond to failures that occur in their coordinated operation, in order to facilitate recovery and reestablish collaboration. We refer to this type of diagnosis as social diagnosis, since it focuses on finding causes for failures to maintain designer--specified relationships between sub-systems. Naive implementations of social diagnosis processes can require significant computation and communications overhead, which prohibit them from being effective as the number of sub-systems is scaled up, or the number of failures to diagnose increases. We thus seek to examine in depth the communication and computation overhead of diagnosis. We choose to use a model-based diagnosis (MBD) approach that relies on a model of the diagnosed system, which simulates the behavior of the system given the operational context (typically, the system's inputs). The resulting simulated behavior (typically, outputs) are compared to the actual behavior to detect discrepancies indicating failures. The model can then be used to pinpoint possible failing components within the system. MBD is increasingly being applied in distributed and multi-agent systems. While successfully addressing key challenges, MBD has been difficult to apply to diagnosing coordination failures. This is because many such failures take place at the boundaries between the agent and their environment, including other agents. For instance, a robot may send a message that another robot, due to an intermittent radio failure, did not receive. As a result, the two agents come to disagree on an action to be taken. A model of both the agents as well as the coordination between the agents is defined in advance, which enables to use a general reliable and robust diagnosis method which we believe is applicable to many multi-agent systems. In the first stage of the thesis we present a formalized model of social diagnosis in terms of model-based diagnosis approach. The multi-agent systems of interest to us are composed of several agents, which (by design) are to satisfy certain concurrence and mutual--exclusion coordination constraints. A fault in the coordination of a multi-agent system may be the result of a fault in one of the components or other agent components. Given a team model and an observation of the agents' components, the goal of the diagnosis is to determine a minimal set of abnormal components of agents whose selection may explain the inconsistency of the system. Based on this formalism, we take a first step towards distributed model-based diagnosis of coordination (inter--agent) failures. Modeling the coordination as a constraint graph brings to bear solution methods from distributed constraint satisfaction literature, as solutions to the constraint graph form the basis for diagnoses. We evaluated the use of these algorithms in comprehensive experiments with a team of physical and simulated Sony Aibo robots. We examined the computational requirements of the algorithms (i.e., their run--time and bandwidth usage), and the correctness of the diagnoses produced. We find that in general, a trade--off exists between computational costs and the ability to produce correct diagnosis. In the second stage of the thesis we present a design space of diagnosis algorithms, distinguishing several phases in the social diagnosis process, and providing alternative algorithms for each phase. To this aim we focus on the diagnosis of disagreement between situated agents, since the control process of such agents and faults is relatively simple to model, and we can therefore focus on the core communications and computational requirements of the diagnosis. We distinguish two phases of diagnosis: (i) selection of the diagnosing agents; and (ii) diagnosis of the global team state (by the selected agents). We provide alternative algorithms for these phases, and empirically evaluate the communications and run--time. The results show that centralizing the disambiguation process is a key factor in improving communications, but is not a determining factor in run--time. On the other hand, explicit reasoning about the different sub-systems is a key factor in determining run--time. Based on this conclusion we address two principles to achieve the reduction of the communication and the computation in large-scale teams. First, instead of sending all the information, send only the information that is relevant to the diagnosis. Second, diagnosing a limited number of agents that represent all others. These principles yield a novel diagnosis method which significantly reduces the runtime, while keeping communications overhead to a minimum.

Additional Information

BibTeX

@PhdThesis{kalech-phd, 
  author = 	 {Meir Kalech}, 
  title = 	 {Diagnosing Coordination Faults in Multi-Agent Systems}, 
  school = 	 {{B}ar {I}lan {U}niversity}, 
  year = 	 {2007}, 
  OPTkey = 	 {}, 
  OPTtype = 	 {}, 
  OPTaddress = 	 {}, 
  OPTmonth = 	 {}, 
  OPTnote = 	 {}, 
  abstract = { 
With increasing deployment of distributed applications in complex, 
dynamic settings, there is an increasing need to also be able to 
respond to failures that occur in their coordinated operation, in 
order to facilitate recovery and reestablish collaboration. We refer 
to this type of diagnosis as social diagnosis, since it focuses on 
finding causes for failures to maintain designer--specified 
relationships between sub-systems. Naive implementations of social 
diagnosis processes can require significant computation and 
communications overhead, which prohibit them from being effective as 
the number of sub-systems is scaled up, or the number of failures 
to diagnose increases. We thus seek to examine in depth the 
communication and computation overhead of diagnosis. 
We choose to use a model-based diagnosis (MBD) approach that 
relies on a model of the diagnosed system, which simulates the 
behavior of the system given the operational context (typically, the 
system's inputs). The resulting simulated behavior (typically, 
outputs) are compared to the actual behavior to detect discrepancies 
indicating failures. The model can then be used to pinpoint possible 
failing components within the system. 
MBD is increasingly being applied in distributed and multi-agent 
systems. While successfully addressing key challenges, MBD has been 
difficult to apply to diagnosing coordination failures. This is 
because many such failures take place at the boundaries between the 
agent and their environment, including other agents. For instance, a 
robot may send a message that another robot, due to an intermittent 
radio failure, did not receive. As a result, the two agents come to 
disagree on an action to be taken. 
A model of both the agents as well as the coordination between the 
agents is defined in advance, which enables to use a general 
reliable and robust diagnosis method which we believe is applicable 
to many multi-agent systems. 
In the first stage of the thesis we present a formalized model of 
social diagnosis in terms of model-based diagnosis approach. The 
multi-agent systems of interest to us are composed of several 
agents, which (by design) are to satisfy certain concurrence and 
mutual--exclusion coordination constraints. 
A fault in the coordination of a multi-agent system may be the 
result of a fault in one of the components or other agent 
components. Given a team model and an observation of the agents' 
components, the goal of the diagnosis is to determine a minimal set 
of abnormal components of agents whose selection may explain the 
inconsistency of the system. 
Based on this formalism, we take a first step towards distributed model-based diagnosis of coordination (inter--agent) failures. Modeling the coordination as a constraint 
graph brings to bear solution methods from distributed constraint 
satisfaction literature, as solutions to the constraint graph form 
the basis for diagnoses. 
We evaluated the use of these algorithms in comprehensive 
experiments with a team of physical and simulated Sony Aibo robots. 
We examined the computational requirements of the algorithms (i.e., 
their run--time and bandwidth usage), and the correctness of the 
diagnoses produced. We find that in general, a trade--off exists 
between computational costs and the ability to produce correct 
diagnosis. 
In the second stage of the thesis we present a design space of 
diagnosis algorithms, distinguishing several phases in the social 
diagnosis process, and providing alternative algorithms for each 
phase. To this aim we focus on the diagnosis of disagreement between 
situated agents, since the control process of such agents and faults 
is relatively simple to model, and we can therefore focus on the 
core communications and computational requirements of the diagnosis. 
We distinguish two phases of diagnosis: (i) selection of the 
diagnosing agents; and (ii) diagnosis of the global team state (by 
the selected agents). We provide alternative algorithms for these 
phases, and empirically evaluate the communications and run--time. 
The results show that centralizing the disambiguation process is a 
key factor in improving communications, but is not a determining 
factor in run--time. On the other hand, explicit reasoning about the 
different sub-systems is a key factor in determining run--time. 
Based on this conclusion we address two principles to achieve the 
reduction of the communication and the computation in large-scale 
teams. First, instead of sending all the information, send only the 
information that is relevant to the diagnosis. Second, diagnosing a 
limited number of agents that represent all others. These principles 
yield a novel diagnosis method which significantly reduces the 
runtime, while keeping communications overhead to a minimum.}, 
  wwwnote = {}, 
  OPTannote = 	 {} 
} 

Generated by bib2html.pl (written by Patrick Riley ) on Thu Feb 22, 2024 11:36:58