GFS

Goals (shared storage)

  • capacity
      • e.g., 1,000 servers, 300 TB
  • performance
  • fault tolerance
  • MapReduce (the motivating workload)

Approach

  • filesystem-like API
      • proprietary client library (read/write/append)
      • not POSIX
  • single master (holds all metadata; sketched below)
      • (filename, offset) -> chunk handle + chunkserver locations
  • chunk size: 64 MB. why not smaller, e.g., 4 KB (HDD block) or 4 MB (SSD)?
  • 3-way replication
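
A minimal sketch of the master's metadata and the (filename, offset) -> chunk translation, written in Go with hypothetical names (GFS itself is proprietary; only the mapping and the 64 MB constant come from the paper):

    package main

    import "fmt"

    const chunkSize = 64 << 20 // 64 MB, fixed

    type chunkHandle uint64

    type chunkInfo struct {
        replicas []string // chunkservers holding a copy (3-way replication)
    }

    // the master keeps all of this in memory
    type master struct {
        files  map[string][]chunkHandle  // filename -> ordered chunk handles
        chunks map[chunkHandle]chunkInfo // handle -> replica locations
    }

    // lookup turns (filename, byte offset) into a chunk handle plus the
    // chunkservers to contact; the chunk index is just offset / 64 MB.
    func (m *master) lookup(name string, offset int64) (chunkHandle, []string, bool) {
        handles, ok := m.files[name]
        idx := int(offset / chunkSize)
        if !ok || idx >= len(handles) {
            return 0, nil, false
        }
        return handles[idx], m.chunks[handles[idx]].replicas, true
    }

    func main() {
        m := &master{
            files: map[string][]chunkHandle{"/logs/web.log": {1, 2}},
            chunks: map[chunkHandle]chunkInfo{
                1: {replicas: []string{"cs1", "cs2", "cs3"}},
                2: {replicas: []string{"cs2", "cs4", "cs5"}},
            },
        }
        h, reps, _ := m.lookup("/logs/web.log", 100<<20) // 100 MB -> chunk index 1
        fmt.Println(h, reps)                             // 2 [cs2 cs4 cs5]
    }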

How GFS works

  • client asks the master for the locations of the chunks it needs
  • client caches those locations locally
  • client then reads/writes data directly with the chunkservers, keeping the master off the data path (see the sketch below)
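
A sketch of that read path in Go; the RPC stubs and all names are made up, the point being one small metadata RPC per chunk (then cached) and a direct client-to-chunkserver data transfer:

    package main

    import "fmt"

    const chunkSize = 64 << 20 // 64 MB

    type location struct {
        handle   uint64
        replicas []string
    }

    // the client caches (file, chunk index) -> location, so the master
    // is asked at most once per chunk and never touches the data itself
    type client struct {
        cache map[string]location
    }

    func (c *client) read(file string, offset int64) []byte {
        k := fmt.Sprintf("%s#%d", file, offset/chunkSize)
        loc, ok := c.cache[k]
        if !ok {
            loc = askMaster(file, offset/chunkSize) // small metadata RPC
            c.cache[k] = loc
        }
        // fetch the bytes directly from any replica of the chunk
        return readChunkserver(loc.replicas[0], loc.handle, offset%chunkSize)
    }

    // stubs standing in for the real RPCs
    func askMaster(file string, idx int64) location {
        return location{handle: 42, replicas: []string{"cs1", "cs2", "cs3"}}
    }

    func readChunkserver(addr string, h uint64, off int64) []byte {
        return []byte(fmt.Sprintf("<chunk %d bytes at %d from %s>", h, off, addr))
    }

    func main() {
        c := &client{cache: map[string]location{}}
        fmt.Println(string(c.read("/logs/web.log", 70<<20))) // asks master
        fmt.Println(string(c.read("/logs/web.log", 71<<20))) // cache hit
    }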

Performance

  • why are a single master and 64 MB chunks sufficient? (back-of-envelope below)
  • workload: large files, sequential reads/writes
  • not a good design for
      • many small files (per-file metadata would swamp the single master; workaround: aggregate them into large files)
      • random accesses (large chunks and client-side location caching only pay off for sequential streaming)
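
Back-of-envelope for the single master, using the 300 TB goal above and assuming roughly 64 B of metadata per chunk (in line with the paper's "less than 64 bytes per 64 MB chunk"):

    300 TB / 64 MB per chunk ≈ 5 million chunks
    5 million chunks × ~64 B ≈ 300 MB of metadata
    -> everything fits in one machine's RAM; with locations cached at
       clients and data flowing directly to chunkservers, the master
       handles only small, infrequent metadata RPCs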

Consistency

  • correctness: the observed outcome matches expectation (as if a single server)
  • what makes it hard
      • concurrency
      • failures
  • tradeoff
      • weak consistency: easier to implement, harder to use
      • strong consistency: harder to implement, easier to use

Case 1 (strawman: inconsistent under concurrency)

S1: C1  C2
S2: C2  C1

  • with no agreed order, S1 applies C1's write before C2's while S2 applies them in the opposite order: the replicas diverge

Case 2 (consistent under concurrency)

S1(P): C1 C2 C1-id C2-id
S2   : C2 C1             S1-id-C1 S1-id-C2

  • S1 is the primary (P): it assigns each write a serial number (C1-id, C2-id), and S2 applies the writes in the primary's serial order, so all replicas converge (see the sketch below)
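
A Go sketch contrasting Case 1 and Case 2, under the rule (as in GFS) that the primary assigns serial numbers and every replica applies mutations in serial order; replica state is simplified to an append-only string:

    package main

    import (
        "fmt"
        "sort"
    )

    // a mutation as forwarded by the primary: data plus the serial
    // number the primary assigned (names are illustrative)
    type mutation struct {
        serial int
        data   string
    }

    // Case 1 (no primary): each replica applies writes in its own
    // arrival order, so two replicas can end up different.
    func applyArrivalOrder(arrivals []string) string {
        state := ""
        for _, w := range arrivals {
            state += w
        }
        return state
    }

    // Case 2 (primary order): replicas may receive mutations in any
    // order but apply them sorted by the primary's serial number.
    func applySerialOrder(received []mutation) string {
        sort.Slice(received, func(i, j int) bool {
            return received[i].serial < received[j].serial
        })
        state := ""
        for _, m := range received {
            state += m.data
        }
        return state
    }

    func main() {
        // Case 1: S1 sees C1 then C2, S2 sees C2 then C1 -> divergence
        fmt.Println(applyArrivalOrder([]string{"C1", "C2"})) // C1C2
        fmt.Println(applyArrivalOrder([]string{"C2", "C1"})) // C2C1

        // Case 2: arrival orders still differ, outcomes do not
        s1 := []mutation{{1, "C1"}, {2, "C2"}}
        s2 := []mutation{{2, "C2"}, {1, "C1"}}
        fmt.Println(applySerialOrder(s1), applySerialOrder(s2)) // C1C2 C1C2
    }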

Case 3 (consistent but undefined)

  • a write that crosses chunk boundaries breaks into many smaller writes, each ordered independently (see the split sketch below)
  • concurrent clients' fragments can interleave: every replica ends up with the same bytes (consistent), but those bytes are a mix of both writes (undefined)
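
A sketch of the split itself, assuming (as in GFS) that a write is cut at fixed 64 MB chunk boundaries; each resulting sub-write is a separate mutation that gets its own place in the serial order:

    package main

    import "fmt"

    const chunkSize int64 = 64 << 20 // 64 MB

    type subWrite struct {
        chunkIndex int64
        offset     int64 // offset within the chunk
        data       []byte
    }

    // split cuts one application-level write into per-chunk sub-writes;
    // two clients' sub-writes can interleave in the serial order, so
    // replicas end up identical (consistent) but holding a mix of the
    // two writes (undefined).
    func split(off int64, data []byte) []subWrite {
        var subs []subWrite
        for len(data) > 0 {
            within := off % chunkSize
            n := chunkSize - within // room left in this chunk
            if int64(len(data)) < n {
                n = int64(len(data))
            }
            subs = append(subs, subWrite{off / chunkSize, within, data[:n]})
            off, data = off+n, data[n:]
        }
        return subs
    }

    func main() {
        // a 100 MB write starting at offset 30 MB spans three chunks
        for _, s := range split(30<<20, make([]byte, 100<<20)) {
            fmt.Printf("chunk %d: offset %d MB, length %d MB\n",
                s.chunkIndex, s.offset>>20, int64(len(s.data))>>20)
        }
        // chunk 0: offset 30 MB, length 34 MB
        // chunk 1: offset 0 MB, length 64 MB
        // chunk 2: offset 0 MB, length 2 MB
    }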

Case 4 (inconsistent under failures)

  • a follower (secondary) fails: the write lands on some replicas but not others, so replicas differ
  • the primary fails: the master must appoint a new one
  • two primaries? both would hand out serial numbers, and replicas could apply conflicting orders
  • leases avoid two simultaneous primaries (see the sketch below)
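
A sketch of the lease rule in Go (names hypothetical; the 60 s figure is the paper's initial lease timeout): a replica acts as primary only while its master-granted lease is unexpired, so by the time the master grants a new lease, the old primary has already stopped assigning serial numbers:

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // a lease granted by the master: one replica is primary until expiry
    type lease struct {
        expiry time.Time
    }

    type primary struct {
        lease  lease
        serial int
    }

    var errLeaseExpired = errors.New("lease expired: step down as primary")

    // assignSerial orders a mutation only while the lease is held;
    // after expiry the master may have promoted another replica, and
    // acting here would create two primaries issuing conflicting serials
    func (p *primary) assignSerial(now time.Time) (int, error) {
        if now.After(p.lease.expiry) {
            return 0, errLeaseExpired
        }
        p.serial++
        return p.serial, nil
    }

    func main() {
        p := &primary{lease: lease{expiry: time.Now().Add(60 * time.Second)}}
        if s, err := p.assignSerial(time.Now()); err == nil {
            fmt.Println("ordered mutation, serial", s) // lease held: OK
        }
        late := time.Now().Add(2 * time.Minute) // pretend the lease lapsed
        if _, err := p.assignSerial(late); err != nil {
            fmt.Println(err) // refuses: no split brain
        }
    }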

Case 5 (another consistency anomaly)

S1(P): C1 C2 C2-id C1-id C3-read
S2   : C2 C1                     S1-id-C2 C3-read S1-id-C1 C3-read

  • reads may go to any replica: C3's read at S1 sees both writes, but its next read at S2 sees only C2's (S1-id-C1 has not been applied there yet), so the reader appears to go back in time until S2 catches up