Gal A. Kaminka: Publications

• Sorted by Date • Classified by Publication Type • Classified by Topic • Grouped by Student (current) • Grouped by Former Students •

Diagnosing Coordination Faults in Multi-Agent Systems

Meir Kalech. Diagnosing Coordination Faults in Multi-Agent Systems. Ph.D. Thesis, Bar Ilan University, 2007.

Download

[PDF]899.4kB

Abstract

With increasing deployment of distributed applications in complex, dynamic settings, there is an increasing need to also be able to respond to failures that occur in their coordinated operation, in order to facilitate recovery and reestablish collaboration. We refer to this type of diagnosis as social diagnosis, since it focuses on finding causes for failures to maintain designer--specified relationships between sub-systems. Naive implementations of social diagnosis processes can require significant computation and communications overhead, which prohibit them from being effective as the number of sub-systems is scaled up, or the number of failures to diagnose increases. We thus seek to examine in depth the communication and computation overhead of diagnosis. We choose to use a model-based diagnosis (MBD) approach that relies on a model of the diagnosed system, which simulates the behavior of the system given the operational context (typically, the system's inputs). The resulting simulated behavior (typically, outputs) are compared to the actual behavior to detect discrepancies indicating failures. The model can then be used to pinpoint possible failing components within the system. MBD is increasingly being applied in distributed and multi-agent systems. While successfully addressing key challenges, MBD has been difficult to apply to diagnosing coordination failures. This is because many such failures take place at the boundaries between the agent and their environment, including other agents. For instance, a robot may send a message that another robot, due to an intermittent radio failure, did not receive. As a result, the two agents come to disagree on an action to be taken. A model of both the agents as well as the coordination between the agents is defined in advance, which enables to use a general reliable and robust diagnosis method which we believe is applicable to many multi-agent systems. In the first stage of the thesis we present a formalized model of social diagnosis in terms of model-based diagnosis approach. The multi-agent systems of interest to us are composed of several agents, which (by design) are to satisfy certain concurrence and mutual--exclusion coordination constraints. A fault in the coordination of a multi-agent system may be the result of a fault in one of the components or other agent components. Given a team model and an observation of the agents' components, the goal of the diagnosis is to determine a minimal set of abnormal components of agents whose selection may explain the inconsistency of the system. Based on this formalism, we take a first step towards distributed model-based diagnosis of coordination (inter--agent) failures. Modeling the coordination as a constraint graph brings to bear solution methods from distributed constraint satisfaction literature, as solutions to the constraint graph form the basis for diagnoses. We evaluated the use of these algorithms in comprehensive experiments with a team of physical and simulated Sony Aibo robots. We examined the computational requirements of the algorithms (i.e., their run--time and bandwidth usage), and the correctness of the diagnoses produced. We find that in general, a trade--off exists between computational costs and the ability to produce correct diagnosis. In the second stage of the thesis we present a design space of diagnosis algorithms, distinguishing several phases in the social diagnosis process, and providing alternative algorithms for each phase. To this aim we focus on the diagnosis of disagreement between situated agents, since the control process of such agents and faults is relatively simple to model, and we can therefore focus on the core communications and computational requirements of the diagnosis. We distinguish two phases of diagnosis: (i) selection of the diagnosing agents; and (ii) diagnosis of the global team state (by the selected agents). We provide alternative algorithms for these phases, and empirically evaluate the communications and run--time. The results show that centralizing the disambiguation process is a key factor in improving communications, but is not a determining factor in run--time. On the other hand, explicit reasoning about the different sub-systems is a key factor in determining run--time. Based on this conclusion we address two principles to achieve the reduction of the communication and the computation in large-scale teams. First, instead of sending all the information, send only the information that is relevant to the diagnosis. Second, diagnosing a limited number of agents that represent all others. These principles yield a novel diagnosis method which significantly reduces the runtime, while keeping communications overhead to a minimum.

Additional Information

BibTeX

@PhdThesis{kalech-phd,
author = {Meir Kalech},
title = {Diagnosing Coordination Faults in Multi-Agent Systems},
school = {{B}ar {I}lan {U}niversity},
year = {2007},
OPTkey = {},
OPTtype = {},
OPTaddress = {},
OPTmonth = {},
OPTnote = {},
abstract = {
With increasing deployment of distributed applications in complex,
dynamic settings, there is an increasing need to also be able to
respond to failures that occur in their coordinated operation, in
order to facilitate recovery and reestablish collaboration. We refer
to this type of diagnosis as social diagnosis, since it focuses on
finding causes for failures to maintain designer--specified
relationships between sub-systems. Naive implementations of social
diagnosis processes can require significant computation and
communications overhead, which prohibit them from being effective as
the number of sub-systems is scaled up, or the number of failures
to diagnose increases. We thus seek to examine in depth the
communication and computation overhead of diagnosis.
We choose to use a model-based diagnosis (MBD) approach that
relies on a model of the diagnosed system, which simulates the
behavior of the system given the operational context (typically, the
system's inputs). The resulting simulated behavior (typically,
outputs) are compared to the actual behavior to detect discrepancies
indicating failures. The model can then be used to pinpoint possible
failing components within the system.
MBD is increasingly being applied in distributed and multi-agent
systems. While successfully addressing key challenges, MBD has been
difficult to apply to diagnosing coordination failures. This is
because many such failures take place at the boundaries between the
agent and their environment, including other agents. For instance, a
robot may send a message that another robot, due to an intermittent
radio failure, did not receive. As a result, the two agents come to
disagree on an action to be taken.
A model of both the agents as well as the coordination between the
agents is defined in advance, which enables to use a general
reliable and robust diagnosis method which we believe is applicable
to many multi-agent systems.
In the first stage of the thesis we present a formalized model of
social diagnosis in terms of model-based diagnosis approach. The
multi-agent systems of interest to us are composed of several
agents, which (by design) are to satisfy certain concurrence and
mutual--exclusion coordination constraints.
A fault in the coordination of a multi-agent system may be the
result of a fault in one of the components or other agent
components. Given a team model and an observation of the agents'
components, the goal of the diagnosis is to determine a minimal set
of abnormal components of agents whose selection may explain the
inconsistency of the system.
Based on this formalism, we take a first step towards distributed model-based diagnosis of coordination (inter--agent) failures. Modeling the coordination as a constraint
graph brings to bear solution methods from distributed constraint
satisfaction literature, as solutions to the constraint graph form
the basis for diagnoses.
We evaluated the use of these algorithms in comprehensive
experiments with a team of physical and simulated Sony Aibo robots.
We examined the computational requirements of the algorithms (i.e.,
their run--time and bandwidth usage), and the correctness of the
diagnoses produced. We find that in general, a trade--off exists
between computational costs and the ability to produce correct
diagnosis.
In the second stage of the thesis we present a design space of
diagnosis algorithms, distinguishing several phases in the social
diagnosis process, and providing alternative algorithms for each
phase. To this aim we focus on the diagnosis of disagreement between
situated agents, since the control process of such agents and faults
is relatively simple to model, and we can therefore focus on the
core communications and computational requirements of the diagnosis.
We distinguish two phases of diagnosis: (i) selection of the
diagnosing agents; and (ii) diagnosis of the global team state (by
the selected agents). We provide alternative algorithms for these
phases, and empirically evaluate the communications and run--time.
The results show that centralizing the disambiguation process is a
key factor in improving communications, but is not a determining
factor in run--time. On the other hand, explicit reasoning about the
different sub-systems is a key factor in determining run--time.
Based on this conclusion we address two principles to achieve the
reduction of the communication and the computation in large-scale
teams. First, instead of sending all the information, send only the
information that is relevant to the diagnosis. Second, diagnosing a
limited number of agents that represent all others. These principles
yield a novel diagnosis method which significantly reduces the
runtime, while keeping communications overhead to a minimum.},
wwwnote = {},
OPTannote = {}
}

Generated by bib2html.pl (written by Patrick Riley ) on Mon Feb 03, 2025 16:33:37