Optimizing Block Storage Servers Using Multi-tier Caches

Bhandari, Pranav (Summer 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/fq977w23p?locale=pt-BR
Published

Abstract

Storage systems persist large volumes of data and provide fast data access to applications that are ubiquitous in our society, such as banking, social networks, machine learning, video streaming, and ride hailing. The cheap, high-capacity, low-bandwidth backing store provides persistence, whereas the expensive, low-capacity, high-bandwidth cache delivers performance. Multiple storage nodes, each with a backing store and a cache, provide the required capacity and performance. Large storage clusters with expensive hardware can be costly, both financially and ecologically. To reduce the size, and consequently the cost, of storage systems, we need to squeeze more performance from a single server. One approach is to add a flash cache to support the DRAM cache, which increases the potential throughput of a storage server and can reduce the number of storage servers required to meet the performance requirement. However, it is challenging to determine when a flash cache is beneficial. This dissertation compares the performance of storage servers with and without multi-tier caches across diverse servers and workloads, and develops techniques to determine when to use a multi-tier cache. First, we evaluate the potential performance and cost benefits of multi-tier caches using simulation and analysis. We develop an algorithm, Cydonia, to determine cost-effective tier sizes given a workload and the storage devices on the server. Next, we use trace replay to validate the performance improvement from multi-tier caches and demonstrate the importance of request rate, along with miss ratio, in determining performance. Using the large corpus of data collected through trace replay, we train decision tree models that accurately predict whether a multi-tier cache will improve performance on a given server. Finally, we present BlkSample, a technique for generating accurate block trace samples that reduces the overhead of multi-tier cache analysis and replay.

Table of Contents

1 Introduction

2 Background

3 Turning the Storage Hierarchy On Its Head: The Strange World of Heterogeneous Tiered Caches 

4 Large Scale Study of MT Caching Using Trace Replay

5 BlkSample: Sampling for Block Storage Traces

6 Conclusion and Future Directions

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified
