yuanjingsong/PaperNoteCollection

Storage System Paper List

This repo collects papers related to storage systems, including data deduplication (dedup), erasure coding (EC), general distributed storage systems (DSS), and other related topics (e.g., network security). It is updated from time to time.

A. Data Deduplication

Summary

  1. 99 Deduplication Problems----HotStorage'16 (link) (summary)
  2. A Comprehensive Study of the Past, Present, and Future on Data Deduplication----Proceedings of the IEEE'16 (link)
  3. A Survey of Secure Data Deduplication Schemes for Cloud Storage Systems----ACM Computing Surveys'17 (link)
  4. A Survey of Classification of Storage Deduplication Systems----ACM Computing Surveys'14 (link)
  5. Understanding Data Deduplication Ratios----SNIA'08 (link)

Workload Analysis

  1. Characteristics of Backup Workloads in Production Systems----FAST'12 (link)
  2. A Study of Practical Deduplication----FAST'11 (link) summary
  3. A Long-Term User-Centric Analysis of Deduplication Patterns----MSST'16 (link)
  4. Capacity Forecasting in a Backup Storage Environment----LISA'11 (link) summary
  5. Generating Realistic Datasets for Deduplication Analysis ---- ATC'12 (link)

Deduplication System Design

  1. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System----FAST'08 (link) summary
  2. dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD)----MSST'10 (link) summary
  3. Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup----MASCOTS'09 (link) summary
  4. Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality----FAST'09 (link) summary
  5. Building a High-performance Deduplication System----USENIX ATC'11 (link)
  6. Primary Data Deduplication - Large Scale Study and System Design----USENIX ATC'12
  7. Storage Efficiency Opportunities and Analysis for Video Repositories----HotStorage'15
  8. Venti: A New Approach to Archival Storage----FAST'02 (link)
  9. ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory----USENIX ATC'10 (link)
  10. Data Domain Cloud Tier: Backup here, Backup there, Deduplicated Everywhere!----USENIX ATC'19 (link)
  11. SmartDedup: Optimizing Deduplication for Resource-constrained Devices----USENIX ATC'19 (link)
  12. Probabilistic Deduplication for Cluster-Based Storage Systems----SoCC'12 (link)
  13. Can't We All Get Along? Redesigning Protection Storage for Modern Workloads----USENIX ATC'18 (link) summary
  14. Deduplication in SSDs: Model and quantitative analysis----MSST'12 (link)
  15. SiLo: A Similarity-Locality based Near-Exact Deduplication Scheme with Low RAM Overhead and High Throughput----ATC'11 (link)
  16. Design Tradeoffs for Data Deduplication Performance in Backup Workloads ----FAST'15 (link)
  17. The Dilemma between Deduplication and Locality: Can Both be Achieved? ---- FAST'21 (link)

Restore Performance

  1. RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups----APSys'13 (link) summary
  2. ALACC: Accelerating Restore Performance of Data Deduplication Systems Using Adaptive Look-Ahead Window Assisted Chunk Caching----FAST'18 (link) summary
  3. Reducing Impact of Data Fragmentation Caused by In-line Deduplication----SYSTOR'12 (link)
  4. Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets----MASCOTS'12
  5. Accelerating Restore and Garbage Collection in Deduplication-based Backup System via Exploiting Historical Information----USENIX ATC'14 (link)
  6. Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance----FAST'19
  7. Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication----FAST'13 (link) summary

Secure Deduplication

  1. Convergent Dispersal: Toward Storage-Efficient Security in a Cloud-of-Clouds----HotStorage'14 (link) summary
  2. CDStore: Toward Reliable, Secure, and Cost-Efficient Cloud Storage via Convergent Dispersal----USENIX ATC'15 (link) summary
  3. Information Leakage in Encrypted Deduplication via Frequency Analysis----DSN'17
  4. DupLESS: Server-Aided Encryption for Deduplicated Storage----USENIX Security'13 (link) summary
  5. Side Channels in Cloud Services, the Case of Deduplication in Cloud Storage----S&P'10 (link) summary
  6. Side Channels in Deduplication: Trade-offs between Leakage and Efficiency----AsiaCCS'17
  7. On Information Leakage in Deduplication Storage Systems----CCS Workshop'16 summary
  8. SecDep: A User-Aware Efficient Fine-Grained Secure Deduplication Scheme with Multi-Level Key Management----MSST'15 (link)
  9. Message-Locked Encryption and Secure Deduplication----EuroCrypt'13
  10. Proofs of Ownership in Remote Storage System----CCS'11
  11. Tapping the Potential: Secure Chunk-based Deduplication of Encrypted Data for Cloud Backup----CNS'18 summary
  12. A Bandwidth-Efficient Middleware for Encrypted Deduplication----DSC'18 summary
  13. Bloom Filter Based Privacy Preserving Deduplication System----Springer International Conference on Security & Privacy'19 (link) summary
  14. Enhanced Secure Thresholded Data Deduplication Scheme for Cloud Storage----TDSC'16 (link) summary
  15. Transparent Data Deduplication in the Cloud----CCS'15 (link) summary
  16. Secure Deduplication of Encrypted Data without Additional Independent Servers----CCS'15 (link) summary
  17. Fast and Secure Laptop Backups with Encrypted Deduplication----LISA'10 (link)
  18. Weak Leakage-Resilient Client-side Deduplication of Encrypted Data in Cloud Storage----ASIA CCS'13 (link)
  19. Lamassu: Storage-Efficient Host-Side Encryption----USENIX ATC'15 (link)
  20. Mitigating Traffic-based Side Channel Attacks in Bandwidth-efficient Cloud Storage----IPDPS'18 (link) summary
  21. RARE: Defeating Side Channels based on Data-Deduplication in Cloud Storage----INFOCOM'18 (link)
  22. PerfectDedup: Secure Data Deduplication----Data Privacy Management, and Security Assurance'15 (link)

Metadata Management

  1. Metadedup: Deduplicating Metadata in Encrypted Deduplication via Indirection----MSST'19 (link)
  2. Rekeying for Encrypted Deduplication Storage----DSN'16 (link) summary
  3. File Recipe Compression in Data Deduplication Systems----FAST'13 (link) summary
  4. Metadata Considered Harmful ... to Deduplication----HotStorage'15 (link) summary
  5. GoSeed: Generating an Optimal Seeding Plan for Deduplicated Storage----FAST'19 (link) (Note: this paper is not actually about metadata management; it is about balancing data when transferring data between volumes.)

Indexing & Caching

  1. LIPA: A Learning-based Indexing and Prefetching Approach for Data Deduplication----MSST'19 (link) summary
  2. Lazy Exact Deduplication----MSST'16
  3. MAD2: A Scalable High-throughput Exact Deduplication Approach for Network Backup Services----MSST'10
  4. Block Locality Caching for Data Deduplication----SYSTOR'13 (link)
  5. HANDS: A Heuristically Arranged Non-Backup In-line Deduplication System----ICDE'13 (link)
  6. Austere Flash Caching with Deduplication and Compression----ATC'20 (link)

Deduplication Estimation

  1. Estimating Unseen Deduplication - from Theory to Practice----FAST'16 (link) summary
  2. Estimation of Deduplication Ratios in Large Data Sets----MSST'12 (link) summary
  3. Sketching Volume Capacities in Deduplicated Storage----FAST'19 (link) summary
  4. Estimating Duplication by Content-based Sampling----USENIX ATC'13 summary
  5. Content-aware Load Balancing for Distributed Backup----LISA'11 (link)
  6. Rangoli: Space Management in Deduplication Environments----SYSTOR'13 (link)

Post-Deduplication: Data Compression

  1. Finesse: Fine-Grained Feature Locality based Fast Resemblance Detection for Post-Deduplication Delta Compression----FAST'19 (link) summary
  2. The Design of a Similarity Based Deduplication System----SYSTOR'09
  3. Ddelta: A deduplication-inspired fast delta compression approach ---- Performance Evaluation (link)
  4. Exploring the Potential of Fast Delta Encoding: Marching to a Higher Compression Ratio ---- TOPDS'2020 (link)
  5. Odess: Speeding up Resemblance Detection for Redundancy Elimination by Fast Content-Defined Sampling----ICDE'21 (link)

Memory && Block-Layer Deduplication

  1. UKSM: Swift Memory Deduplication via Hierarchical and Adaptive Memory Region Distilling----FAST'18 summary
  2. Using Hints to Improve Inline Block-Layer Deduplication----FAST'16 summary
  3. XLM: More Effective Memory Deduplication Scanners through Cross-Layer Hints----USENIX ATC'13

Data Chunking

  1. SS-CDC: A Two-stage Parallel Content-Defined Chunking for Deduplicating Backup Storage----SYSTOR'19 summary
  2. Frequency Based Chunking for Data De-Duplication----MASCOTS'10 summary
  3. Bimodal Content Defined Chunking for Backup Streams----FAST'10
  4. Ddelta: A Deduplication-inspired Fast Delta Compression Approach----IFIP Performance'14
  5. P-dedupe: Exploiting Parallelism in Data Deduplication System----NAS'12
  6. MUCH: Multi-threaded Content-Based File Chunking----TC'15
  7. Multi-Level Comparison of Data Deduplication in a Backup Scenario----SYSTOR'09
  8. A Framework for Analyzing and Improving Content-Based Chunking Algorithms----HP Technical Report'05
  9. FastCDC - The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems----IEEE TPDS'20 (link)

Deduplication Reliability

  1. A Simulation Analysis of Redundancy and Reliability in Primary Storage Deduplication----TC'18 (link) summary
  2. A Simulation Analysis of Reliability in Primary Storage Deduplication----IISWC'16

Cache Deduplication

  1. CDAC: Content-Driven Deduplication-Aware Storage Cache----MSST'19 (link)
  2. PLC-cache: Endurable SSD cache for deduplication-based primary storage----MSST'14 (link)
  3. Nitro: A Capacity-Optimized SSD Cache for Primary Storage----USENIX ATC'14 (link)

Benchmark

  1. SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks----FAST'15 (link)

Garbage Collection

  1. Memory Efficient Sanitization of a Deduplicated Storage System----FAST'13 (link)
  2. The Logic of Physical Garbage Collection in Deduplicating Storage----FAST'17 (link) (summary)
  3. A Scalable Deduplication and Garbage Collection Engine for Incremental Backup----SYSTOR'13 (link)
  4. Concurrent Deletion in a Distributed Content-Addressable Storage System with Global Deduplication----FAST'13 (link)

B. Erasure Coding

Erasure Coding Basics

  1. Network Coding for Distributed Storage Systems----TIT'09
  2. A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage----FAST'09
  3. Erasure Coding for Cloud Storage Systems: A Survey----By Jun Li in 2013

Improve Data Recovery

  1. CORE: Augmenting Regenerating-Coding-Based Recovery for Single and Concurrent Failures in Distributed Storage Systems----MSST'13
  2. Degraded-First Scheduling for MapReduce in Erasure-Coded Storage Clusters----DSN'14
  3. Repair Pipelining for Erasure-Coded Storage----USENIX ATC'17
  4. A Tale of Two Erasure Codes in HDFS----FAST'15
  5. On the Speedup of Single-Disk Failure Recovery in XOR-Coded Storage Systems: Theory and Practice----MSST'12
  6. Rethinking Erasure Codes for Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads----FAST'12
  7. Lazy Means Smart: Reducing Repair Bandwidth Costs in Erasure-coded Distributed Storage----SYSTOR'14
  8. Enabling Efficient and Reliable Transition from Replication to Erasure Coding for Clustered File System----DSN'15
  9. Reconsidering Single Failure Recovery in Clustered File Systems----DSN'16 summary
  10. RAFI: Risk-Aware Failure Identification to Improve the RAS in Erasure-coded Data Center----USENIX ATC'18
  11. Partial-Parallel-Repair (PPR): A Distributed Technique for Repairing Erasure Coded Storage----EuroSys'16

EC Update Issue

  1. Cross-Rack-Aware Updates in Erasure-Coded Data Centers----ICPP'18
  2. PARIX: Speculative Partial Writes in Erasure-Coded Systems----USENIX ATC'17

EC Framework

  1. OpenEC: Toward Unified and Configurable Erasure Coding Management in Distributed Storage Systems----FAST'19

New EC code

  1. CodePlugin: Plugging Deduplication into Erasure Coding for Cloud Storage----HotCloud'15
  2. Double Regenerating Codes for Hierarchical Data Centers----ISIT'16
  3. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems----ToS'13
  4. Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage and Network-bandwidth----FAST'15
  5. Opening the Chrysalis: On the Real Repair Performance of MSR Codes----FAST'16
  6. NCCloud: A Network-Coding-Based Storage System in a Cloud-of-Clouds----FAST'12
  7. Erasure Coding in Windows Azure Storage----USENIX ATC'12
  8. XORing Elephants: Novel Erasure Codes for Big Data----VLDB'13
  9. Clay Codes: Moulding MDS Codes to Yield an MSR Code----FAST'18
  10. Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments----DSN'18 summary
  11. On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes----USENIX ATC'18
  12. Parallelism-Aware Locally Repairable Code for Distributed Storage Systems----ICDCS'18
  13. Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems----HotStorage'15
  14. Pipelined regeneration with Regenerating Codes for Distributed Storage Systems----NetCod'11
  15. Cooperative Pipelined Regeneration in Distributed Storage Systems----INFOCOM'14
  16. Zebra: Demand-aware Erasure Coding for Distributed Storage Systems----IWQoS'16
  17. On Data Parallelism of Erasure Coding in Distributed Storage Systems----ICDCS'17

EC System

  1. Giza: Erasure Coding Objects across Global Data Centers----USENIX ATC'17
  2. EC-Store: Bridging the Gap Between Storage and Latency in Distributed Erasure Coded Systems----ICDCS'18
  3. Latency Reduction and Load Balancing in Coded Storage Systems----SoCC'17

C. Security

Survey

  1. A Survey on Systems Security Metrics----ACM Computing Surveys'16

Secret Sharing

  1. How to Best Share a Big Secret----SYSTOR'18
  2. AONT-RS: Blending Security and Performance in Dispersed Storage Systems----FAST'11
  3. Secure Deletion for a Versioning File System----FAST'05

Data Encryption

  1. Differentially Private Access Patterns for Searchable Symmetric Encryption----INFOCOM'18 summary
  2. Frequency-Hiding Order-Preserving Encryption----CCS'15
  3. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response----CCS'14
  4. Privacy at Scale: Local Differential Privacy in Practice----SIGMOD'18
  5. Frequency-smoothing Encryption: Preventing Snapshot Attacks on Deterministically Encrypted Data----IACR'17 summary
  6. Efficient Homophonic Coding----TIT'99
  7. A Note on the Optimality of Frequency Analysis vs. lp-Optimization----IACR'15
  8. Inference Attacks on Property-Preserving Encrypted Databases----CCS'15
  9. How Far Can We Go Beyond Linear Cryptanalysis?----ASIACRYPT'04
  10. CryptDB: Protecting Confidentiality with Encrypted Query Processing----SOSP'11 (link)
  11. Secure Deletion for a Versioning File System----FAST'05
  12. Dark Clouds on the Horizon: Using Cloud Storage as Attack Vector and Online Slack Space----USENIX Security'11 (link)

Differential Privacy

  1. Differential Privacy----ICALP'06 (link)
  2. Calibrating Noise to Sensitivity in Private Data Analysis----TCC'06 (link)

SGX

  1. NEXUS: Practical and Secure Access Control on Untrusted Storage Platforms using Client-side SGX----DSN'19 (link)

D. Others

Multi-Cloud System

  1. Kurma: Secure Geo-Distributed Multi-Cloud Storage Gateways----SYSTOR'19 summary
  2. SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services----SOSP'13 summary
  3. CosTLO: Cost-Effective Redundancy for Lower Latency Variance on Cloud Storage Service----NSDI'15
  4. A Day Late and a Dollar Short: The Case for Research on Cloud Billing Systems----HotCloud'14

New PAXOS

  1. In Search of an Understandable Consensus Algorithm----USENIX ATC'14

Distributed File System

  1. Ceph: A Scalable, High-Performance Distributed File System----OSDI'06
  2. The Hadoop Distributed File System----MSST'10 (link) summary
  3. RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters----PDSW'07
  4. CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data----SC'06

Hash

  1. Compare-by-Hash: A Reasoned Analysis----USENIX ATC'06 (link) summary
  2. An Analysis of Compare-by-Hash----HotOS'03 (link)

Streaming Process

  1. A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring----IPDPS'10
