P2P and DHT
Peer-to-peer
- Practical distributed systems in early 2000s
- eDonkey2000 in 2000, BitTorrent in 2001, eMule in 2002
- BitTorrent
- the tracker server keeps track of who has the file
- a client contacts the tracker to find the file owners, then contacts those owners directly
- after finishing the download, the client reports itself to the tracker as an owner
- centralized bottleneck: some servers are more important than others.
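The tracker's role described above can be sketched in a few lines. This is a toy in-memory model, not the real BitTorrent tracker protocol; the class name `ToyTracker`, its methods, the file name, and the addresses are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class ToyTracker:
    """Hypothetical in-memory tracker: maps a file id to the peers that hold it."""

    def __init__(self):
        self.owners = defaultdict(set)  # file_id -> set of (ip, port)

    def lookup(self, file_id):
        """A client asks who has the file."""
        return sorted(self.owners.get(file_id, ()))

    def report(self, file_id, peer):
        """A client that finished downloading registers itself as an owner."""
        self.owners[file_id].add(peer)


tracker = ToyTracker()
tracker.report("file.iso", ("10.0.0.1", 6881))   # the initial owner
sources = tracker.lookup("file.iso")             # a new client finds the owners
tracker.report("file.iso", ("10.0.0.2", 6881))   # ...and later becomes one itself
```

Every lookup and every report goes through the single `tracker` object, which is the centralized bottleneck noted above.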
Consistent hashing
- deterministic hashing function
- uniform distribution
- choose an m-bit output, with m large enough that the collision rate is low
- each key is hashed to m bits
- each server (ip, port) is also hashed to m bits
- identifiers form a ring: 2^m-1 is followed by 0
- a key is stored on its successor: the first server whose hash is equal to or follows the key's hash on the ring
- if every node has the membership info, then it can route the request to the right node with one hop
- on membership change
- if a node joins, it takes over some keys from its immediate successor
- if a node leaves, it gives all its keys to its immediate successor.
- what about failures?
- use replicas: for a key whose hash is k, use the servers responsible for k + F, k + 2F as extra replicas (F is a constant)
- if every node has the full membership, this is all that is needed
- Dynamo is implemented like this
- reasonable for a moderate sized system with closed membership
- what about a million nodes?
- what about more open network?
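The ring, successor rule, membership changes, and k + F replicas above can be sketched as follows, assuming every node holds the full membership. The names `Ring`, `ring_hash`, and `M`, the choice of SHA-1, and the server addresses are assumptions made for this example, not part of the notes.

```python
import bisect
import hashlib

M = 32  # identifier length in bits (arbitrary choice for this sketch)


def ring_hash(value: str) -> int:
    """Deterministic hash of a string onto the ring [0, 2^M)."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % 2 ** M


class Ring:
    """Hypothetical full-membership ring: a sorted list of (position, server)."""

    def __init__(self, servers):
        self.nodes = sorted((ring_hash(s), s) for s in servers)

    def successor(self, point: int) -> str:
        """First server at or after `point`, wrapping past 2^M - 1 to 0."""
        i = bisect.bisect_left(self.nodes, (point, ""))
        return self.nodes[i % len(self.nodes)][1]

    def lookup(self, key: str) -> str:
        """One-hop routing: hash the key and pick its successor."""
        return self.successor(ring_hash(key))

    def replicas(self, key: str, F: int, extra: int = 2):
        """Primary plus the servers for k + F, k + 2F, ... (may repeat on small rings)."""
        k = ring_hash(key)
        return [self.successor((k + i * F) % 2 ** M) for i in range(extra + 1)]

    def join(self, server: str):
        bisect.insort(self.nodes, (ring_hash(server), server))

    def leave(self, server: str):
        self.nodes.remove((ring_hash(server), server))


ring = Ring([f"10.0.0.{i}:8000" for i in range(1, 6)])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.join("10.0.0.9:8000")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

Only the keys in `moved` change owner, and all of them land on the new node; every other key stays where it was, which is the property that makes membership changes cheap.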
Chord, a scalable lookup service
- no node stores the full membership
- the minimum: a node only stores its predecessor and successor
- with N nodes, a lookup takes O(N) hops
- Idea: store multiple “fingers” on the ring, so that a lookup works like a binary search
- how can we distribute the fingers?
- store m fingers at power-of-2 distances: the i-th finger of node n starts at (n + 2^(i-1)) mod 2^m
- e.g., m=3 with nodes 0, 1, 3: every node stores three fingers, starting at
- node 0: 1, 2, 4
- node 1: 2, 3, 5
- node 3: 4, 5, 7
- every finger stores its interval (the range from its start to the next finger's start) and the successor node of its start
- search rule for k
- if k falls between the current node and its immediate successor, that successor is the match: return it
- else, find the finger interval that includes k, forward the search to that finger's successor, and repeat
- each forward at least halves the remaining search range, hence search time is O(log N)
- notation: n is the node doing the search, f is its i-th finger node (where the search is forwarded), p is the successor node being searched for
- distance between n and f >= 2^(i-1)
- distance between f and p <= 2^(i-1)
- so f to p is at most half of n to p
- membership change (join), base version
- init finger table
- delegate to existing nodes
- quicker to ask a neighbor
- update fingers of other nodes
- transfer keys
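- dynamic stabilization
- a new node joins: it notifies its immediate successor
- each existing node periodically recalibrates: it asks its immediate successor for that successor's predecessor, and adopts that node as its new successor if it lies between the two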
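The finger tables and search rule from this section can be sketched for the m=3 example with nodes 0, 1, and 3. This is a minimal single-process sketch under two assumptions: `successor` uses a global view of the ring only to build the tables (a real node would learn its fingers by asking other nodes), and forwarding is written in the "closest preceding finger" form, meaning the farthest finger node that still lies before the key. All function names are hypothetical.

```python
M = 3              # identifier bits, as in the example
NODES = [0, 1, 3]  # node identifiers on the ring of size 2^M


def successor(ident):
    """First node at or after `ident` on the ring (global view, table building only)."""
    ident %= 2 ** M
    for n in sorted(NODES):
        if n >= ident:
            return n
    return min(NODES)  # wrapped past the largest node


def finger_table(n):
    """(start, node) pairs for fingers i = 1..M, with start = (n + 2^(i-1)) mod 2^M."""
    starts = [(n + 2 ** (i - 1)) % 2 ** M for i in range(1, M + 1)]
    return [(s, successor(s)) for s in starts]


def between(x, a, b):
    """True if x lies strictly between a and b clockwise (a == b: whole ring except a)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def lookup(n, key, hops=0):
    """Search for the node responsible for `key`, starting at node n.

    Returns (responsible node, number of forwards).
    """
    table = finger_table(n)
    succ = table[0][1]  # the first finger is the immediate successor
    if key == succ or between(key, n, succ):
        return succ, hops
    # Forward to the farthest finger node that still precedes the key.
    for _, f in reversed(table):
        if between(f, n, key):
            return lookup(f, key, hops + 1)
    raise RuntimeError("no finger precedes the key: inconsistent tables")


for n in NODES:
    print(n, finger_table(n))
print(lookup(3, 1))  # key 1, search started at node 3
```

For node 0 the table comes out as starts 1, 2, 4 with successor nodes 1, 3, 0, matching the example. Searching for key 1 from node 3 forwards once to node 0, whose immediate successor 1 is the match.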