slides ©yoursunny.com 2013, CreativeCommons BY-NC 3.0

Network Redundancy Elimination
JUNXIAO SHI 2013-11-05

Neil T. Spring and David Wetherall. 2000. A protocol-independent technique for eliminating redundant network traffic. SIGCOMM Comput. Commun. Rev. 30, 4 (August 2000), 87-95. DOI=10.1145/347057.347408 http://doi.acm.org/10.1145/347057.347408
Problem
Back in 2000, home Internet was slow
◦ Modem data rate: 33.6 Kbps or 56 Kbps
◦ Round-trip latency: >100 ms
◦ 2 minutes to load a webpage
Today, the Internet still isn’t always fast
Satellite link (e.g. Iridium)
◦ high latency
◦ 2.4 kbps
◦ $1.35 per minute

2G cellular data (e.g. H2O Wireless)
◦ high latency
◦ low bandwidth
◦ $0.30 per MB
Web contents are redundant

Screenshots of http://quotes.wsj.com/index/CN/SHCOMP taken during a trading day: the quote changes, but the rest of the page stays the same.
Web contents are often uncached
Web authors don’t want you to cache
their contents, because:
◦ Contents are dynamic. Stock price may
change at any time. News articles are
posted throughout the day.
◦ Contents are personalized. Your Facebook
homepage is different from anyone else’s.
◦ Access count must be accurate. Advertising
revenue is calculated per thousand
impressions.

[Figure: response headers of http://www.dailyfinance.com/]
To the naïve user -
Design
Architecture
[Diagram: sender → bandwidth-constrained channel → receiver, with a cache at each end]
◦ Sender: convert repeated strings into tokens
◦ Receiver: reconstruct original packet
◦ Works at the network layer, protocol-independent
◦ Contents of both caches must be consistent
The Cache
Cache: holds most recent packets
◦ admission policy: admit all
◦ replacement policy: FIFO

Indexed by representative fingerprints of the packets it holds
◦ map each fingerprint to the most recent packet in which it appears
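
The cache described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name PacketCache, the capacity counted in packets (rather than bytes), and the (packet id, offset) index entries are all hypothetical choices, not details from the paper.

```python
from collections import OrderedDict


class PacketCache:
    """FIFO packet store plus an index from fingerprint to newest packet."""

    def __init__(self, capacity):
        self.capacity = capacity      # max packets held (assumed unit)
        self.packets = OrderedDict()  # packet id -> payload, oldest first
        self.index = {}               # fingerprint -> (packet id, offset)
        self.next_id = 0

    def add(self, payload, fingerprints):
        """Admit every packet; evict the oldest one when the cache is full.

        `fingerprints` is an iterable of (fingerprint, offset) pairs.
        """
        if len(self.packets) >= self.capacity:
            old_id, _ = self.packets.popitem(last=False)  # FIFO eviction
            # Drop index entries that still point at the evicted packet.
            self.index = {fp: loc for fp, loc in self.index.items()
                          if loc[0] != old_id}
        pid = self.next_id
        self.next_id += 1
        self.packets[pid] = payload
        for fp, off in fingerprints:
            self.index[fp] = (pid, off)  # the newest occurrence wins
        return pid

    def lookup(self, fp):
        """Return (payload, offset) for a fingerprint, or None if unknown."""
        loc = self.index.get(fp)
        if loc is None:
            return None
        pid, off = loc
        return self.packets[pid], off
```

Rebuilding the index on every eviction is simple but slow; a real implementation would more likely leave stale entries in place and validate them on lookup.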
Representative fingerprints
◦ window size: β
◦ select one in 2^γ fingerprints
◦ fingerprint space: M

1. Calculate rolling Rabin fingerprints for sequences of β bytes, mod M.
2. Select fingerprints whose binary representation ends with γ zeros as representative fingerprints.
Rabin fingerprints are not cryptographically secure. The algorithm should not assume they are collision-free.
Rabin fingerprints are used here for finding similar documents, not for chunking.
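
The two steps above can be sketched as a polynomial rolling hash in the style of Rabin fingerprints. This is a minimal sketch, not the paper's code: the function name, the base P, and the default values are assumptions chosen for illustration.

```python
def representative_fingerprints(data, beta=64, gamma=5, M=2**60, P=1048583):
    """Return [(fingerprint, offset)] for every beta-byte window of `data`
    (a bytes object) whose fingerprint has its low `gamma` bits set to zero.

    P is an assumed multiplier; any fixed value shared by both ends works.
    """
    if len(data) < beta:
        return []
    top = pow(P, beta, M)      # weight of the byte that leaves the window
    mask = (1 << gamma) - 1    # selects the low gamma bits

    fp = 0
    for b in data[:beta]:      # fingerprint of the first window (Horner)
        fp = (fp * P + b) % M

    selected = []
    if fp & mask == 0:
        selected.append((fp, 0))
    for i in range(beta, len(data)):
        # Slide by one byte: shift the window, add data[i], drop data[i - beta].
        fp = (fp * P + data[i] - data[i - beta] * top) % M
        if fp & mask == 0:
            selected.append((fp, i - beta + 1))
    return selected
```

Because the selection depends only on the window contents, the same β-byte string yields the same representative fingerprint wherever it appears, which is what lets the index find repeats across packets.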
Sender process
1. Generate representative fingerprints for the packet.
2. Look up the fingerprints in the cache index.
3. For each hit, verify that it is not a collision.
4. Expand the match to the left and to the right, byte by byte.
5. Convert matched regions into tokens.
6. Send the encoded, smaller packet.
7. Add the original packet to the cache, evicting the oldest packet if necessary.

Token format
• the fingerprint
• # bytes expanded to the left
• # bytes expanded to the right
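
The collision check and the byte-by-byte expansion can be sketched as follows. The function name and argument layout are hypothetical; the sketch assumes the index lookup has already produced the offset of the matching β-byte window in both the new packet and the cached one.

```python
def expand_match(new, new_off, old, old_off, beta):
    """Return (left, right): how many extra bytes match around a beta-byte
    window, or None if the two windows differ (a fingerprint collision)."""
    if new[new_off:new_off + beta] != old[old_off:old_off + beta]:
        return None

    # Grow to the left while both packets have bytes and they agree.
    left = 0
    while (new_off - left > 0 and old_off - left > 0
           and new[new_off - left - 1] == old[old_off - left - 1]):
        left += 1

    # Grow to the right in the same way.
    right = 0
    new_end, old_end = new_off + beta, old_off + beta
    while (new_end + right < len(new) and old_end + right < len(old)
           and new[new_end + right] == old[old_end + right]):
        right += 1
    return left, right


old = b"xxHELLO WORLD!yy"
new = b"abHELLO WORLD!cd"
print(expand_match(new, 6, old, 6, beta=4))  # → (4, 4)
```

The token for this match would carry the fingerprint of the window plus the two counts, so the 12 bytes "HELLO WORLD!" are replaced by a single token.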
Receiver process
1. Look up tokens in the cache index.
2. Reconstruct the original packet.
3. Generate representative fingerprints for the reconstructed packet.
4. Add the packet to the cache, evicting the oldest packet if necessary.
5. Deliver the original packet.
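
Reconstruction on the receiver side can be sketched like this. The encoded-packet representation (a list mixing literal bytes with (fingerprint, left, right) tuples) and the shape of the index are assumptions made for the example; the actual wire format is not given in the slides.

```python
def decode(encoded, index, beta):
    """Rebuild a packet from literals (bytes) and tokens (fp, left, right).

    `index` maps fingerprint -> (cached payload, window offset).
    Raises KeyError on an unknown fingerprint, which signals that the
    sender and receiver caches have diverged.
    """
    out = bytearray()
    for part in encoded:
        if isinstance(part, bytes):
            out += part                      # literal bytes, copied as-is
        else:
            fp, left, right = part
            payload, off = index[fp]         # KeyError if not cached
            out += payload[off - left: off + beta + right]
    return bytes(out)


index = {42: (b"xxHELLO WORLD!yy", 6)}       # fingerprint 42 is made up
print(decode([b"ab", (42, 4, 4), b"cd"], index, beta=4))
# → b'abHELLO WORLD!cd'
```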
Cache consistency
Contents of the sender cache and the receiver cache must be consistent.
Why might the caches be inconsistent?
◦ The network channel isn’t reliable. A packet that entered the sender cache but was lost on the channel will not be present in the receiver cache.

How to detect cache inconsistency?
◦ Fingerprints! If there’s no collision, receiving an unrecognized fingerprint indicates that the caches are inconsistent.

What happens if the caches are inconsistent?
◦ The receiver cannot reconstruct the original packet.
Implementation
Trace analyzer
The algorithm is implemented as a user-level process to analyze a trace.
Parameters
Fingerprint space: M = 2^60
◦ collision almost impossible

Penalty for each matching region: 12 octets
◦ to represent the space needed for the token

Window size β and fingerprint selection rate (one in 2^γ)
◦ large β: better “quality” of matches, fewer potential bytes saved
◦ small β: worse “quality” of matches (shorter matches in more recent packets)
◦ small γ: more likely to find a match, larger index (= less memory for cached packets)
◦ large γ: less likely to find a match, less memory usage
◦ chosen values: γ=5, β=64
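
To get a feel for these parameters, the arithmetic can be worked through in a short sketch. The 1500-byte packet size is an assumption (a common Ethernet MTU), and both helper names are hypothetical.

```python
def expected_representatives(packet_len, beta=64, gamma=5):
    """Expected number of representative fingerprints in one packet,
    assuming fingerprints are uniformly distributed."""
    windows = max(packet_len - beta + 1, 0)  # one fingerprint per window
    return windows / 2**gamma                # one in 2^gamma is selected


def bytes_saved(match_len, token_penalty=12):
    """Net saving when a matched region is replaced by one token."""
    return match_len - token_penalty


print(expected_representatives(1500))  # → 44.90625
print(bytes_saved(64))                 # → 52
```

So with γ=5 and β=64, a full-size packet contributes about 45 index entries, and even the shortest possible match (one 64-byte window with no expansion) still saves 52 bytes after the 12-octet token penalty.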
Performance
45Mbps on a PC with Pentium Ⅲ-550 and 1GB memory
This work is designed for slow links.
Follow-up work
Later work on redundancy elimination:
◦ universal redundancy elimination
◦ SmartRE: coordinated network-wide redundancy elimination
◦ EndRE: end-system redundancy elimination
Traffic Analysis
How much redundancy is there?
Amount of redundancy
◦ Internet => corporate: 30% redundant
◦ corporate => Internet: 50% redundant
◦ with just 1 MB of memory for cache+index: at least 10% redundant
Redundancy by protocol
[Chart: redundant traffic and traffic amount (%) per protocol: HTTP, RTSP, Napster, Lotus, HTTPS, FTP-data, NNTP, DNS, ASF, AOL, SMTP, POP, Telnet, Other]
◦ HTTP, Telnet, POP, ASF have a high percentage of repeated strings.
◦ HTTPS, FTP-data, Napster, RTSP, NNTP have a low percentage of repeated strings.
◦ The redundancy elimination algorithm is protocol-independent, so we can save bytes on non-Web traffic.
Comparison with HTTP caching
[Chart: traffic (%) for Squid, gzip, Squid+gzip, RE, Squid+RE]
Redundancy elimination works better than HTTP caching and compression.