Optimizing Block Storage Servers Using Multi-tier Caches Public

Bhandari, Pranav (Summer 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/fq977w23p?locale=pt-BR

Published

Abstract

Storage systems persist large volumes of data and provide fast data access to applications that are ubiquitous in our society, such as banking, social networks, machine learning, video streaming, and ride hailing. The cheap, high-capacity, low-bandwidth backing store provides persistence, whereas the expensive, low-capacity, high-bandwidth cache delivers performance. Multiple storage nodes, each with a backing store and a cache, provide the required capacity and performance. Large storage clusters with expensive hardware can be costly, both financially and ecologically. To reduce the size, and consequently the cost, of storage systems, we need to squeeze more performance from a single server. One approach is to add a flash cache to support the DRAM cache, which increases the potential throughput of a storage server. This can reduce the number of storage servers required to meet the performance requirement. However, it is challenging to determine when using a flash cache is beneficial. This dissertation compares the performance of storage servers with and without multi-tier caches using diverse servers and workloads, and develops techniques to determine when to use a multi-tier cache. First, we evaluate the potential performance and cost benefit of multi-tier caches using simulation and analysis. We developed an algorithm, Cydonia, to determine cost-effective tier sizes given a workload and the storage devices on the server. Next, we use trace replay to validate the performance improvement from multi-tier caches and demonstrate the importance of request rate, along with miss ratio, in determining performance. Using the large corpus of data collected through trace replay, we train decision tree models that accurately predict whether a multi-tier cache will improve performance for a given server. We follow this with BlkSample, a technique for generating accurate block trace samples that reduces the overhead of multi-tier cache analysis and replay.

1 Introduction

2 Background

3 Turning the Storage Hierarchy On Its Head: The Strange World of Heterogeneous Tiered Caches

4 Large Scale Study of MT Caching Using Trace Replay

5 BlkSample: Sampling for Block Storage Traces

6 Conclusion and Future Directions

About this Dissertation

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Laney Graduate School
Department	Computer Science and Informatics
Degree	Ph.D.
Submission	Dissertation
Language	English
Research Field	Computer Science
Keyword	Cache Storage
Committee Chair / Thesis Advisor	Avani Wildani, Emory University
Committee Members	Vasily Tarasov, IBM; Michelangelo Grigni, Emory University; Ymir Vigfusson, Emory University

Last modified

Primary PDF

Thumbnail	Title	Date Uploaded	Actions
	Optimizing Block Storage Servers Using Multi-tier Caches	2024-08-08 11:29:12 -0400	Download
