BitTorrent
By Bram Cohen
- Suman Karumuri
Introduction
• A P2P file distribution protocol.
• Designed by Bram Cohen circa July 2001.
• BT's mechanisms achieve Pareto efficiency while offloading bandwidth cost to downloaders.
Popularity of BT
• About 18-35% of internet traffic is BT traffic.
• Why?
– Ease of use through HTTP.
– Reduces hosting costs.
– Useful for legitimate purposes too.
– Fairness.
Usage Patterns
Why BitTorrent?
• Previous work
– Is not robust enough when large numbers of peers and files are involved.
– Does not account for selfish users.
– Does not account for high churn rates.
– Is not fair.
BT Overview
First contact
• Obtain a torrent
– A publisher (thepiratebay.com)
– Email
– Quantum teleportation.
• A torrent contains
– File name
– File length
– Hashing information
– Tracker information
Finding each other
• Trackers help peers find each other.
• Simple custom protocol on top of HTTP.
• Client sends the file hash and its connection info.
• Tracker returns a random list of peers currently downloading the file. (Anonymity issue!)
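A minimal sketch of the request a client might send to a tracker over HTTP. The tracker URL and the helper name build_announce_url are hypothetical; the query parameter names follow the commonly documented announce convention and should be checked against the protocol specification before use.

```python
from urllib.parse import urlencode


def build_announce_url(tracker_url, info_hash, peer_id, port, left):
    """Build the HTTP GET URL for a tracker request (illustrative only).

    info_hash: 20-byte SHA-1 digest identifying the torrent ("file hash").
    peer_id, port: this client's connection info.
    left: number of bytes the client still has to download.
    """
    params = {
        "info_hash": info_hash,  # raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,
    }
    return tracker_url + "?" + urlencode(params)


# Hypothetical tracker and placeholder identifiers.
url = build_announce_url(
    "http://tracker.example/announce",
    info_hash=b"\x00" * 20,
    peer_id=b"-XX0001-000000000000",
    port=6881,
    left=1024,
)
```

The tracker's reply (the random peer list) is not modeled here.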
Rock n Roll
• Connect to peers.
• Ask what pieces of the file they have.
• Download required pieces of the file.
• Advertise your pieces to your peers.
• Upload downloaded pieces.
Seeder
• Downloader with a complete copy of the file.
• Only uploads the file.
Overview animation
Files
• A file is broken into pieces of 0.25-0.5 MB each before it is distributed.
• File integrity: a SHA-1 hash of each piece is stored inside the torrent.
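A small sketch of the piece and hash scheme on the Files slide: split the data into fixed-size pieces, store one SHA-1 digest per piece, and check each downloaded piece against it. The 256 KB piece size is an assumption taken from the low end of the range above, and the function names are hypothetical.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # assumed piece size (0.25 MB)


def piece_hashes(data, piece_size=PIECE_SIZE):
    """Return one SHA-1 digest per piece; the last piece may be shorter."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def verify_piece(index, piece, hashes):
    """True if a downloaded piece matches the digest stored for its index."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

A downloader would discard and re-request any piece for which verify_piece returns False.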
TCP friendly
• While transferring files, many requests are pending.
• Pipelining
– Break each piece into 16 KB sub-pieces.
– Send one sub-piece after the other.
– Only have 5 requests in the queue at once.
– Saturates most connections.
Piece selection
• Strict priority
– One piece after another.
• Rarest First
– Improves availability of the file.
– Improves download performance.
– Handles churn well.
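Rarest First can be sketched as counting how many connected peers hold each piece we still need and picking one with the lowest count. The data layout (a dict mapping each peer to the set of pieces it has) and the random tie-break are assumptions made for illustration.

```python
import random
from collections import Counter


def rarest_first(needed, peer_pieces, rng=random):
    """Pick a needed piece held by the fewest peers.

    needed: set of piece indices we do not have yet.
    peer_pieces: dict mapping peer id -> set of piece indices it holds.
    Returns None when no connected peer has any needed piece.
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces & needed)
    if not counts:
        return None
    fewest = min(counts.values())
    candidates = [piece for piece, n in counts.items() if n == fewest]
    return rng.choice(candidates)  # assumed tie-break: pick at random
```

For example, with peers holding {0, 1, 2}, {0, 1} and {0}, piece 2 is held by only one peer, so it is requested first.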
Piece selection (contd)
• Random First Piece
– Gives a new peer a chance to start uploading quickly.
– Rarest first slows things down.
• Endgame mode
– Last pieces are obtained from faster peers, to complete downloads quicker.
– Again addresses churn.
Choking Algorithms
• Peers reciprocate by uploading to peers which upload to them.
• Unutilized connections are uploaded to on a trial basis to see if better rates can be obtained.
• Achieves Pareto efficiency.
Choking Algo.
• Unchoke a fixed number of peers (default 4).
• Every 10 seconds:
– Compute the download rate over a 20 second window for all active peers.
– Keep the peers with the highest download bandwidth. Choke the others.
Optimistic unchoking
• There may be peers with better download rates.
• Every 30 seconds:
– If the previous optimistically unchoked peer is better:
• Make it an active peer.
– Unchoke another peer.
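One round of the choking decision could look like the sketch below: rank peers by measured download rate, keep the best ones unchoked, and keep one extra peer unchoked as the optimistic trial. Whether the optimistic peer takes one of the 4 slots or is extra is not stated on the slides; this sketch assumes it takes one of them. The function name and input format are hypothetical.

```python
def choose_unchoked(download_rates, optimistic_peer=None, slots=4):
    """Return the set of peers to unchoke in this round.

    download_rates: dict mapping peer id -> download rate from that peer,
        measured over the recent window (e.g. bytes per second).
    optimistic_peer: peer currently being tried regardless of its rate.
    slots: total number of unchoked peers (default 4).
    """
    # Assumption: the optimistic unchoke uses one of the available slots.
    regular = slots - 1 if optimistic_peer is not None else slots
    ranked = sorted(
        (peer for peer in download_rates if peer != optimistic_peer),
        key=download_rates.get,
        reverse=True,
    )
    unchoked = set(ranked[:regular])
    if optimistic_peer is not None:
        unchoked.add(optimistic_peer)
    return unchoked
```

A client would call this every 10 seconds with fresh rates and rotate optimistic_peer every 30 seconds; all other peers are choked.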
Anti-snubbing
• If no data has been downloaded from a peer in the last 60 seconds:
– Assume snubbed.
– Stop uploading to that peer.
– Use optimistic unchoke to find better peers.
Upload only
• Once the download has finished:
– Upload to peers with good upload rates.
– Upload to peers that no one else is uploading to.
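The anti-snubbing rule reduces to a timestamp check per peer. In this sketch the bookkeeping (a dict of last-received timestamps in seconds) and the function name are assumptions.

```python
SNUB_TIMEOUT = 60.0  # seconds without data before a peer counts as snubbing us


def snubbed_peers(last_received, now, timeout=SNUB_TIMEOUT):
    """Return the peers that have sent us no data for longer than timeout.

    last_received: dict mapping peer id -> time (seconds) data last arrived.
    now: current time on the same clock.
    """
    return {peer for peer, t in last_received.items() if now - t > timeout}
```

Peers in the returned set would stop receiving uploads, leaving their slots to optimistic unchokes.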
Splitstream/Bullet comparison
• Ad-hoc design to address real-world issues.
• BT is not designed for streaming (so no fancy erasure codes).
• No tree structure.
• BT takes advantage of upload capacity at peers.
• Handles churn well.
• Less overhead to transmit info about which peers have what.
• BT achieves Pareto efficiency.
BT Extensions
Super seeders
• Used when there is only 1 seed.
• Seeder claims no pieces at the outset.
• Uploads a new piece only after it finds that the previously uploaded pieces have been uploaded at least once.
Anonymity
• Trackerless operation and encryption for anonymity.
• Hide content from network shapers.
• Proxying through the Tor anonymity network.
No Trackers
• Use a DHT like Kademlia.
• Allows a client to use torrents that do not have a working BT tracker.
Private Trackers
• Too many selfish users.
• Private trackers provide
– High quality peers (uses statistics).
– Fairness.
– Better download rates.
BT Streaming
• BiToS
– Divide the file into 3 sets:
• Received pieces
• High Priority (next frames + rare)
• Remaining pieces
– Choose a set with probability p. In the set, get rarest first.
– p can be dynamically computed.
• Using number of missed frames.
• Bandwidth of client.
• Size of High Priority set.
Questions?
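The BiToS selection step under BT Streaming can be sketched as a weighted choice between the two sets of missing pieces, followed by rarest first inside the chosen set. This sketch assumes p is the probability of choosing the High Priority set, that every listed piece is available from at least one peer, and that ties go to the lower piece index; the names are hypothetical.

```python
import random


def bitos_pick(high_priority, remaining, availability, p, rng=random):
    """Pick the next piece to request.

    high_priority, remaining: sets of piece indices not yet received.
    availability: dict mapping piece index -> number of peers holding it.
    p: probability of choosing from the High Priority set (assumed meaning).
    """
    if high_priority and (not remaining or rng.random() < p):
        pool = high_priority
    else:
        pool = remaining
    if not pool:
        return None
    # Rarest first within the chosen set; lower index breaks ties (assumed).
    return min(pool, key=lambda piece: (availability.get(piece, 0), piece))
```

Raising p favors pieces needed for imminent playback; lowering it behaves more like plain rarest first over the rest of the file.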