A Formal Approach for Failure Detection in Large-Scale Distributed Systems using Abstract State Machines
28th International Conference on Database and Expert Systems Applications
Large-scale distributed systems have been widely adopted in various domains due to their ability to compose services and resources tailored to user requirements. Such systems are characterized by high complexity and heterogeneity. Maintaining a high level of availability and normal execution of the components requires precise monitoring and robust adaptation. Monitors capture relevant metrics and transform them into meaningful knowledge, which is then used to justify adaptation actions. This paper proposes an Abstract State Machine model for defining monitoring processes that address failures and unavailability of the system nodes. The specification is simulated and validated with the aid of the ASMETA toolset. The solution is complemented by a small ontology reflecting the structure of the system. We emphasize the role of formal models in meeting the proposed requirements.