Automatic Anomaly Localization in Distributed System

Zhou, Tao (Spring 2022)


The complexity of a distributed system running large-scale applications makes it very difficult to diagnose anomalous behaviors such as faults and high latency. Although many solutions have been proposed for analyzing system anomalies, each has its limitations. Acknowledging the difficulty of identifying the root causes of anomalies, we avoid the common approach of attempting to pinpoint root causes directly through statistics or inference. In this thesis, we design and implement a framework that automatically detects and localizes anomalies in a distributed system with the help of retrospective sampling, our self-defined attributes, and a modified association algorithm.

Our project analyzes each triggered trace in real time and produces a prioritized list. Every entry is a set that contains information about the possible locations of root causes, and a higher rank corresponds to a stronger association with the anomaly. When anomalous symptoms are detected in the system, the operator can use the prioritized list to locate the root cause quickly.
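To illustrate what such a prioritized list might look like, the following is a minimal sketch in Python. It is not the thesis's implementation: the abstract does not define the modified interest measure, so the sketch substitutes lift, a standard measure from association rule mining. The function name rank_attribute_sets, the attribute strings such as "svc=cart", and the max_size parameter are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def rank_attribute_sets(traces, max_size=2):
    """Rank attribute sets by their association with anomalous traces.

    traces: list of (attributes, anomalous) pairs, where attributes is a
    set of strings and anomalous is a bool.
    Assumption: association is scored with standard lift,
    P(anomalous | set) / P(anomalous), not the thesis's modified measure.
    """
    total = len(traces)
    anomalous_total = sum(1 for _, bad in traces if bad)
    if total == 0 or anomalous_total == 0:
        return []

    support = Counter()  # number of traces containing each attribute set
    joint = Counter()    # number of anomalous traces containing it
    for attrs, bad in traces:
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(attrs), size):
                support[subset] += 1
                if bad:
                    joint[subset] += 1

    base_rate = anomalous_total / total
    ranked = []
    for subset, count in joint.items():
        confidence = count / support[subset]
        ranked.append((subset, confidence / base_rate))

    # Highest lift first; on ties, prefer smaller (more general) sets.
    ranked.sort(key=lambda item: (-item[1], len(item[0]), item[0]))
    return ranked


# Hypothetical traces: every request through the cart service is anomalous.
demo_traces = [
    ({"svc=cart", "host=h1"}, True),
    ({"svc=cart", "host=h2"}, True),
    ({"svc=auth", "host=h1"}, False),
    ({"svc=auth", "host=h2"}, False),
]
ranked = rank_attribute_sets(demo_traces)
for attribute_set, lift in ranked:
    print(attribute_set, round(lift, 2))
```

In this toy input the set containing only "svc=cart" ranks first, because it appears in every anomalous trace and in no normal one, while the host attributes score no better than the base rate. An operator reading the list from the top would therefore start with the cart service.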

Table of Contents

1 Introduction

2 Background

2.1 Trace Collection

2.2 Trace Analysis

2.2.1 Machine Learning

2.2.2 Trace Tree & Provenance

2.2.3 Zeno

2.3 Our Goal

2.3.1 Anomaly Localization

2.3.2 Trace Filtering

2.3.3 Trace Aggregation

2.3.4 Portability

2.3.5 Responsiveness

2.3.6 Visualization

3 Motivation: Hindsight

3.1 Head-Based Sampling

3.2 Retrospective Sampling

3.2.1 Basic Architecture

3.2.2 Data Generation

3.2.3 Data Storage

3.2.4 Trigger Mechanism

3.2.5 Trace Collection

3.2.6 Limitations

4 Design

4.1 Attribute

4.2 Attributes Enable Trace Aggregation

4.3 Motivation: Association Rule Mining

4.4 Interest of Association

4.5 Interest of Attribute Sets

4.6 Modified Interest of Attribute Sets

5 Implementation

5.1 Attribute Generation

5.2 Attribute Database of Agent

5.3 Database Functions

5.4 Attribute Query by Central Collector

5.5 Attribute Database of Central Collector

5.6 Implementing the Interest of Association

6 Evaluations

6.1 Sample Output

6.2 Experimentation

6.3 Theoretical Analysis of Efficiency

7 Limitations & Future Directions

7.1 Compatibility with Hindsight

7.2 Existence of Multiple Anomalies

7.3 Compatibility with Other Algorithms

8 Contributions & Takeaways

8.1 Beginning: Design of Attributes

8.2 Change of Mind: Anomaly Localization

8.3 Algorithm: From Bayesian to Association

8.4 Implementation: Hindsight and More

8.5 Takeaways

9 Conclusion


About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English