The Whole Nine-nine Yard: Observability For The Observers
Seshagiri, Vishwanath (Spring 2025)
Abstract
The rapid adoption of microservice architectures in modern cloud computing has
exposed critical limitations in traditional performance measurement methodologies and
observability tools. This work demonstrates how the separation of development and
operational responsibilities in modern enterprises renders conventional observability
tools inadequate for modern software engineering workflows. We present a three-pronged
approach to bridge this gap. First, we systematically characterize industrial
microservice deployments, revealing critical disparities in design choices relative to
academic testbeds. Second, we develop NL2QL, a novel natural language interface and
curated dataset that enables precise log query generation through fine-tuned language
models, achieving up to 75% improvement in accuracy. Finally, we develop Sauron, a
semantic search engine that leverages vector embeddings and retrieval-augmented
generation to overcome terminology inconsistencies in log analysis, demonstrating a
46.7-116.7% improvement in search relevance metrics.
This work establishes that next-generation observability tools must account for both
technical complexity and human factors in distributed systems. By combining
domain-specific language model fine-tuning with semantic search architectures, we show
how to democratize access to observability data so that it can serve purposes beyond
traditional logging use cases.
The thesis contributes practical frameworks for performance analysis in microservice
environments and provides empirical evidence that closing the academia-industry
divide requires tooling adaptations mirroring real-world organizational structures and
developer workflows.
Table of Contents
1 Introduction
1.1 Introduction
1.1.1 Bridging the gap in Microservice testbeds
1.1.2 Chatting with Logs
1.1.3 SAURON: Full-fledged semantic search
2 Bridging the gap in Microservice testbeds
2.1 Introduction
2.2 Motivation
2.2.1 Microservice Testbeds
2.2.2 Testbeds' Design Choices
2.3 Methodology
2.3.1 Recruiting Participants
2.3.2 Creating Interview Questions
2.3.3 Interviews & Data Analysis
2.3.4 Systematization & Mismatches
2.4 Results
2.4.1 Grounding questions
2.4.2 Communication
2.4.3 Topology
2.4.4 Service Reuse
2.4.5 Evolvability
2.4.6 Performance & Correctness
2.4.7 Security
2.5 Analysis
2.5.1 Recommendations and Analysis
2.5.2 Communication
2.5.3 Topology
2.5.4 Service Reuse
2.5.5 Evolvability
2.5.6 Performance Analysis Support
2.5.7 Security
3 Chatting with Logs
3.1 Introduction
3.2 NL Interface for Log Search
3.2.1 Challenge: Querying Logs is Difficult
3.2.2 Background: LogQL
3.2.3 Our Vision: LLM assisted query generation
3.3 LOGQL-LM
3.3.1 Dataset
3.3.2 Finetuning LLMs
3.3.3 Metrics
3.3.4 Demonstration
3.4 Evaluation
3.4.1 Performance of finetuned models
3.4.2 Effect of number of finetuning samples
3.4.3 Transferability of the finetuned models
3.4.4 Code Quality Analysis
3.5 Discussion
3.5.1 Threats to Validity
4 Sauron: Semantic Search Engine
4.1 Introduction
4.2 SAURON
4.2.1 System
4.2.2 Indexing step
4.2.3 Querying Step
4.3 Evaluation
4.3.1 Embedding Model Performance
4.3.2 End to End Log Search
4.4 Discussion
5 Conclusion
5.1 Conclusion
Bibliography
About this Dissertation
| School | |
|---|---|
| Department | |
| Degree | |
| Submission | |
| Language | |
| Research Field | |
| Keyword | |
| Committee Chair / Thesis Advisor | |
| Committee Members | |
Primary PDF
| Thumbnail | Title | Date Uploaded | Actions |
|---|---|---|---|
| | The Whole Nine-nine Yard: Observability For The Observers | 2025-04-29 14:45:29 -0400 | |