Automatic Anomaly Localization in Distributed System

Zhou, Tao (Spring 2022)

Permanent URL: https://etd.library.emory.edu/concern/etds/np193b42g?locale=pt-BR

Published

Abstract

The complexity of a distributed system running large-scale applications makes it challenging to diagnose anomalous behaviors such as faults and high latency. Although many solutions have been proposed to analyze system anomalies, each has its limitations. Acknowledging the difficulty of identifying the root causes of anomalies, we avoid the common approach of attempting to identify root causes through statistics or inference. In this thesis, we designed and implemented a structure that automatically detects and localizes anomalies in a distributed system, with the help of retrospective sampling, our self-defined attributes, and a modified Association Algorithm.

Our project performs real-time, per-triggered-trace analysis and produces a prioritized list. Each entry is a set containing information about the possible locations of root causes, and a higher rank corresponds to a stronger association with the anomaly. When anomalous symptoms are detected in the system, the operator is expected to locate the root cause quickly with the help of the prioritized list.
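The idea of a prioritized list ranked by association strength can be illustrated with a minimal sketch. It is not the thesis's modified Association Algorithm, which this page does not specify. It assumes the standard lift measure from association rule mining as the notion of "interest", and the attribute strings (such as `svc=cart`) and the function name `rank_attribute_sets` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rank_attribute_sets(traces, max_size=2):
    """Rank attribute sets by lift with respect to anomalous traces.

    traces: list of (attributes, is_anomalous) pairs, where attributes is a
    set of strings. The attribute format is a hypothetical stand-in for the
    thesis's self-defined attributes.
    Returns a list of (attribute_tuple, lift), strongest association first.
    """
    total = len(traces)
    anomalous = sum(1 for _, bad in traces if bad)
    if total == 0 or anomalous == 0:
        return []

    seen = Counter()      # number of traces containing each attribute set
    seen_bad = Counter()  # number of anomalous traces containing it
    for attrs, bad in traces:
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(attrs), size):
                seen[subset] += 1
                if bad:
                    seen_bad[subset] += 1

    base_rate = anomalous / total  # P(anomaly)
    scored = []
    for subset, count in seen.items():
        confidence = seen_bad[subset] / count  # P(anomaly | attribute set)
        scored.append((subset, confidence / base_rate))  # lift

    # Highest lift first; ties go to smaller, then alphabetically earlier sets.
    scored.sort(key=lambda item: (-item[1], len(item[0]), item[0]))
    return scored


traces = [
    ({"svc=cart", "host=h1"}, True),
    ({"svc=cart", "host=h2"}, True),
    ({"svc=auth", "host=h1"}, False),
    ({"svc=auth", "host=h2"}, False),
]
for attribute_set, lift in rank_attribute_sets(traces):
    print(attribute_set, round(lift, 2))
```

In this toy input every anomalous trace passes through the hypothetical `svc=cart` attribute, so that attribute set ranks first, while the host attributes appear equally often in normal and anomalous traces and score a neutral lift of 1.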

Table of Contents

1 Introduction

2 Background

2.1 Trace Collection

2.2 Trace Analysis

2.2.1 Machine Learning

2.2.2 Trace Tree & Provenance

2.2.3 Zeno

2.3 Our Goal

2.3.1 Anomaly Localization

2.3.2 Trace Filtering

2.3.3 Trace Aggregation

2.3.4 Portability

2.3.5 Responsiveness

2.3.6 Visualization

3 Motivation: Hindsight

3.1 Head-Based Sampling

3.2 Retrospective Sampling

3.2.1 Basic Architecture

3.2.2 Data Generation

3.2.3 Data Storage

3.2.4 Trigger Mechanism

3.2.5 Trace Collection

3.2.6 Limitations

4 Design

4.1 Attribute

4.2 Attributes Enable Trace Aggregation

4.3 Motivation: Association Rule Mining

4.4 Interest of Association

4.5 Interest of Attribute Sets

4.6 Modified Interest of Attribute Sets

5 Implementation

5.1 Attribute Generation

5.2 Attribute Database of Agent

5.3 Database Functions

5.4 Attribute Query by Central Collector

5.5 Attribute Database of Central Collector

5.6 Implementing the Interest of Association

6 Evaluations

6.1 Sample Output

6.2 Experimentation

6.3 Theoretical Analysis of Efficiency

7 Limitations & Future Directions

7.1 Compatibility with Hindsight

7.2 Existence of Multiple Anomalies

7.3 Compatibility with Other Algorithms

8 Contributions & Takeaways

8.1 Beginning: Design of Attributes

8.2 Change of Mind: Anomaly Localization

8.3 Algorithm: From Bayesian to Association

8.4 Implementation: Hindsight and More

8.5 Takeaways

9 Conclusion

Bibliography

About this Honors Thesis

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Emory College
Department	Computer Science
Degree	B.S.
Submission	Honors Thesis
Language	English
Research Field	Computer Science
Keyword	Distributed System Failure Analysis Distributed Tracing System Failure
Committee Chair / Thesis Advisor	Ymir Vigfusson, Emory University
Committee Members	Avani Wildani, Emory University; Nosayba El-Sayed, Emory University

Last modified

Primary PDF

Title	Date Uploaded
Automatic Anomaly Localization in Distributed System	2022-05-04 13:49:09 -0400
