• Sorted by Date • Classified by Publication Type • Classified by Topic • Grouped by Student (current) • Grouped by Former Students •
Meir Kalech. Diagnosing Coordination Faults in Multi-Agent Systems.
Ph.D. Thesis, Bar Ilan University, 2007.
With increasing deployment of distributed applications in complex, dynamic settings, there is an increasing need to also be able to respond to failures that occur in their coordinated operation, in order to facilitate recovery and reestablish collaboration. We refer to this type of diagnosis as social diagnosis, since it focuses on finding causes for failures to maintain designer--specified relationships between sub-systems. Naive implementations of social diagnosis processes can require significant computation and communications overhead, which prohibit them from being effective as the number of sub-systems is scaled up, or the number of failures to diagnose increases. We thus seek to examine in depth the communication and computation overhead of diagnosis. We choose to use a model-based diagnosis (MBD) approach that relies on a model of the diagnosed system, which simulates the behavior of the system given the operational context (typically, the system's inputs). The resulting simulated behavior (typically, outputs) are compared to the actual behavior to detect discrepancies indicating failures. The model can then be used to pinpoint possible failing components within the system. MBD is increasingly being applied in distributed and multi-agent systems. While successfully addressing key challenges, MBD has been difficult to apply to diagnosing coordination failures. This is because many such failures take place at the boundaries between the agent and their environment, including other agents. For instance, a robot may send a message that another robot, due to an intermittent radio failure, did not receive. As a result, the two agents come to disagree on an action to be taken. A model of both the agents as well as the coordination between the agents is defined in advance, which enables to use a general reliable and robust diagnosis method which we believe is applicable to many multi-agent systems. In the first stage of the thesis we present a formalized model of social diagnosis in terms of model-based diagnosis approach. The multi-agent systems of interest to us are composed of several agents, which (by design) are to satisfy certain concurrence and mutual--exclusion coordination constraints. A fault in the coordination of a multi-agent system may be the result of a fault in one of the components or other agent components. Given a team model and an observation of the agents' components, the goal of the diagnosis is to determine a minimal set of abnormal components of agents whose selection may explain the inconsistency of the system. Based on this formalism, we take a first step towards distributed model-based diagnosis of coordination (inter--agent) failures. Modeling the coordination as a constraint graph brings to bear solution methods from distributed constraint satisfaction literature, as solutions to the constraint graph form the basis for diagnoses. We evaluated the use of these algorithms in comprehensive experiments with a team of physical and simulated Sony Aibo robots. We examined the computational requirements of the algorithms (i.e., their run--time and bandwidth usage), and the correctness of the diagnoses produced. We find that in general, a trade--off exists between computational costs and the ability to produce correct diagnosis. In the second stage of the thesis we present a design space of diagnosis algorithms, distinguishing several phases in the social diagnosis process, and providing alternative algorithms for each phase. To this aim we focus on the diagnosis of disagreement between situated agents, since the control process of such agents and faults is relatively simple to model, and we can therefore focus on the core communications and computational requirements of the diagnosis. We distinguish two phases of diagnosis: (i) selection of the diagnosing agents; and (ii) diagnosis of the global team state (by the selected agents). We provide alternative algorithms for these phases, and empirically evaluate the communications and run--time. The results show that centralizing the disambiguation process is a key factor in improving communications, but is not a determining factor in run--time. On the other hand, explicit reasoning about the different sub-systems is a key factor in determining run--time. Based on this conclusion we address two principles to achieve the reduction of the communication and the computation in large-scale teams. First, instead of sending all the information, send only the information that is relevant to the diagnosis. Second, diagnosing a limited number of agents that represent all others. These principles yield a novel diagnosis method which significantly reduces the runtime, while keeping communications overhead to a minimum.
@PhdThesis{kalech-phd, author = {Meir Kalech}, title = {Diagnosing Coordination Faults in Multi-Agent Systems}, school = {{B}ar {I}lan {U}niversity}, year = {2007}, OPTkey = {}, OPTtype = {}, OPTaddress = {}, OPTmonth = {}, OPTnote = {}, abstract = { With increasing deployment of distributed applications in complex, dynamic settings, there is an increasing need to also be able to respond to failures that occur in their coordinated operation, in order to facilitate recovery and reestablish collaboration. We refer to this type of diagnosis as social diagnosis, since it focuses on finding causes for failures to maintain designer--specified relationships between sub-systems. Naive implementations of social diagnosis processes can require significant computation and communications overhead, which prohibit them from being effective as the number of sub-systems is scaled up, or the number of failures to diagnose increases. We thus seek to examine in depth the communication and computation overhead of diagnosis. We choose to use a model-based diagnosis (MBD) approach that relies on a model of the diagnosed system, which simulates the behavior of the system given the operational context (typically, the system's inputs). The resulting simulated behavior (typically, outputs) are compared to the actual behavior to detect discrepancies indicating failures. The model can then be used to pinpoint possible failing components within the system. MBD is increasingly being applied in distributed and multi-agent systems. While successfully addressing key challenges, MBD has been difficult to apply to diagnosing coordination failures. This is because many such failures take place at the boundaries between the agent and their environment, including other agents. For instance, a robot may send a message that another robot, due to an intermittent radio failure, did not receive. As a result, the two agents come to disagree on an action to be taken. A model of both the agents as well as the coordination between the agents is defined in advance, which enables to use a general reliable and robust diagnosis method which we believe is applicable to many multi-agent systems. In the first stage of the thesis we present a formalized model of social diagnosis in terms of model-based diagnosis approach. The multi-agent systems of interest to us are composed of several agents, which (by design) are to satisfy certain concurrence and mutual--exclusion coordination constraints. A fault in the coordination of a multi-agent system may be the result of a fault in one of the components or other agent components. Given a team model and an observation of the agents' components, the goal of the diagnosis is to determine a minimal set of abnormal components of agents whose selection may explain the inconsistency of the system. Based on this formalism, we take a first step towards distributed model-based diagnosis of coordination (inter--agent) failures. Modeling the coordination as a constraint graph brings to bear solution methods from distributed constraint satisfaction literature, as solutions to the constraint graph form the basis for diagnoses. We evaluated the use of these algorithms in comprehensive experiments with a team of physical and simulated Sony Aibo robots. We examined the computational requirements of the algorithms (i.e., their run--time and bandwidth usage), and the correctness of the diagnoses produced. We find that in general, a trade--off exists between computational costs and the ability to produce correct diagnosis. In the second stage of the thesis we present a design space of diagnosis algorithms, distinguishing several phases in the social diagnosis process, and providing alternative algorithms for each phase. To this aim we focus on the diagnosis of disagreement between situated agents, since the control process of such agents and faults is relatively simple to model, and we can therefore focus on the core communications and computational requirements of the diagnosis. We distinguish two phases of diagnosis: (i) selection of the diagnosing agents; and (ii) diagnosis of the global team state (by the selected agents). We provide alternative algorithms for these phases, and empirically evaluate the communications and run--time. The results show that centralizing the disambiguation process is a key factor in improving communications, but is not a determining factor in run--time. On the other hand, explicit reasoning about the different sub-systems is a key factor in determining run--time. Based on this conclusion we address two principles to achieve the reduction of the communication and the computation in large-scale teams. First, instead of sending all the information, send only the information that is relevant to the diagnosis. Second, diagnosing a limited number of agents that represent all others. These principles yield a novel diagnosis method which significantly reduces the runtime, while keeping communications overhead to a minimum.}, wwwnote = {}, OPTannote = {} }
Generated by bib2html.pl (written by Patrick Riley ) on Fri Aug 30, 2024 17:29:51