SlideShare a Scribd company logo
1 of 18
Download to read offline
Rolling and Extendable Hashes
December 18, 2013

John Graham-Cumming

www.cloudflare.com!
CRC

www.cloudflare.com!
Cyclic Redundancy Check
•  Widely used as hash function
•  e.g. common to hash keys uses a CRC for load balancing
•  memcached, nginx, ...
•  crc(key) % #servers

•  Not designed as a hash function!


•  Choice of polynomial can change collisions in real world
•  CRCs have no internal state and they are fast
•  Shifts, XOR, small table lookup

www.cloudflare.com!
Extending a CRC
Compute the hash (a CRC) by extending the string to the
right

 h( )
S

h( )
S

h(
)
S

h(
)
S

Typical CRC allows this because when adding a bit/byte/
word the CRC is calculated from the previous CRC and the
added data.

www.cloudflare.com!
CRC64
•  Typical implementation (t is lookup table)
BIT64 crc = 0;!
!
while ( *s ) {!
BYTE c = *s++;!
crc = (crc >> 8) ^ t[(crc & 0xff) ^ c];!
}!

•  Some common polynomials:
64
4
3
•  CRC64-ISO (0x800000000000000D)
 x +x +x +x +1
•  CRC64-ECMA (0xA17870F5D4F51B49)

x 64 +x 63+ x 61 + x 59 + x 56 + x 55 + x 52 + x 49 + x 48 + x 47 + x 46 + x 44 +
x 41 + x 37 + x 36 + x 34 + x 32 + x 31 + x 28 + x 26 + x 23 + x 22 + x19 + x16 +
x13 + x12 + x10 + x 8 + x 7 + x 5 + x 3 + x1
www.cloudflare.com!
One use: spam filter tokens
•  Common to see in open source spam filters
From: Prince of Nigeria!
Subject: I beg pardon for interrupting your fine day!
!
[...]!
!

•  Generates tokens:

from:prince from:nigeria subject:pardon subject:interrupting!
subject:fine subject:day!

•  And from the message body
body:html:font:red body:html:image:size:0!

•  Repeated fixed parts (from:, subject:, html:image:size:)

can be CRCed once


www.cloudflare.com!
RABIN/KARP

www.cloudflare.com!
Rabin/Karp String Searching
“Efficient randomized pattern-matching algorithms”
Rabin/Karp, IBM Journal of Research and Development
March 1987


h(
)
P


h(
)
S


Compute and compare len(S)-len(P)+2 hashes
When hash matches verify that substrings actually match

www.cloudflare.com!
Two Expensive Operations
•  The h() function itself
•  Need a very fast hash algorithm
•  Average complexity is O(len(S))

•  Substring comparison on hash matches
•  Need minimal hash collisions
•  Worst case complexity is O(len(S) * len(P))
•  Trivial to make work with multiple patterns
•  Just compute their hashes
•  Use Bloom filter (or other hash table) to lookup patterns for each
hash calculated
•  Average complexity is O(len(S) + #patterns)

www.cloudflare.com!
Rabin/Karp Rolling Hash
•  Basic on arithmetic in the ring Zp where p is prime
•  Treats a substring s0 .. sn as a number in a chosen base

b.

n

h(s0 …sn ) = ∑ si b

n−i

mod p

i=0
•  This hash can be updated when sliding across a string

www.cloudflare.com!
Rabin/Karp Efficient Update
h(s1 …sn+1 )
n+1

= ∑ si b n+1−i mod p
i=1
n+1

= b∑ si b n−i mod p
i=1

Previous hash

# n
&
n−i
n
= % b∑ si b − s0 b + sn+1 ( mod p
$ i=0
'
= ( bh(s0 …sn−1 ) − s0 b n + sn+1 ) mod p

(

)

= b ( h(s0 …sn−1 ) − s0 b n−1 ) + sn+1 mod p
www.cloudflare.com!

Precomputed
Rabin/Karp base and modulus values
•  Use a prime base
•  When operating on bytes a suitable b is 257

•  Choose a prime modulus closest to the desired hash

output size
•  e.g. for 32-bit hash value use 4294967291

•  Various tricks can speed up the calculation
•  Don’t calculate bn-1, calculate bn-1 mod p instead
•  Choose a p such that bp fits in a convenient bit size
•  Memoize x (bn-1 mod p) for all x byte values
•  Choose p is a power of two (instead of a prime) can use & instead
of %
•  Result: update requires one *, one +, one –, one &
www.cloudflare.com!
Bentley/McIlroy Compression
"Data Compression Using Long Common Strings”
Jon Bentley, Douglas McIlroy
Proceedings of the IEEE Data Compression Conference,
1999

 “Dictionary” to compress against (could be self)
Broken into blocks, each block hashed
h0

h(

h1

h2

)

h3

h4

h5

S

Slide Rabin/Karp hash across string to compress
Look for matches in dictionary
Extend matches backwards block size bytes, forwards indefinitely
O(len(S))
www.cloudflare.com!
Never gonna give you up
We're no strangers to love!
You know the rules and so do I!
A full commitment's what I'm thinking of!
You wouldn't get this from any other guy!
I just wanna tell you how I'm feeling!
Gotta make you understand!
Never gonna give you up<204,13>let you down<204,13>run around<45,5>desert you<20!
4,13><184,9>cry<204,13>say goodbye<204,13>tell a lie<45,5>hu<285,7>!
We've<30,5>n each<129,7>for so long!
Your heart's been aching but!
You're too shy to say it!
Inside we both<30,6>wha<422,9>going on!
We<30,10>game<45,5>we're<210,7>play it!
And if you ask me<161,17>Don't tell m<220,5>'re too blind to see!
<340,13><217,161><229,12><634,161>(Ooh,<633,12>)<967,24>)<340,13>give, n<230,11>!
give!
(G<635,10><1004,57><377,11><389,160><139,239><229,12><634,333>!

•  Open source implementation from CloudFlare in Go
•  Bentley/McIlroy then gzip very effective

www.cloudflare.com!
RSYNC PROTOCOL

www.cloudflare.com!
RSYNC hashing
"Efficient Algorithms for Sorting and Synchronization”
Andrew Tridgell
Doctoral Thesis, February 1999
File on remote machine
Broken into blocks, each block hashed twice (rolling hash plus MD5)
Hash sent to local machine
h0

h(

h1

h2

)

h3

h4

h5

S

Slide rolling hash across source file looking for match
Check MD5 hash to eliminate rolling hash collisions
Send bytes form S to remote host, or token saying which hash matched
www.cloudflare.com!
The RSYNC rolling hash
•  Block size is b, offset into file is k, the file bytes are s, M

is an arbitrary modulus
# b−1
&
r1 (k, b) = % ∑ ai+k ( mod M
$ i=0
'
# b−1
&
r2 (k, b) = % ∑ (b − i)ai+k ( mod M
$ i=0
'
r(k, b) = r1 (k, b) + Mr2 (k, b)
•  In practice, M = 216

www.cloudflare.com!
Hash updates are fast
r1 (k +1, b) = ( r1 (k, b) − ak + ak+b ) mod M
r2 (k +1, b) = ( r2 (k, b) − bak + r1 (k +1, b)) mod M

(* proof is left as an exercise for the reader)
www.cloudflare.com!

More Related Content

What's hot

ktruss-short
ktruss-shortktruss-short
ktruss-short
Jia Wang
 
Pygrib documentation
Pygrib documentationPygrib documentation
Pygrib documentation
Arulalan T
 

What's hot (19)

Building a DSL with GraalVM (VoxxedDays Luxembourg)
Building a DSL with GraalVM (VoxxedDays Luxembourg)Building a DSL with GraalVM (VoxxedDays Luxembourg)
Building a DSL with GraalVM (VoxxedDays Luxembourg)
 
Druinsky_SIAMCSE15
Druinsky_SIAMCSE15Druinsky_SIAMCSE15
Druinsky_SIAMCSE15
 
MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The Cloud
MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The CloudMongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The Cloud
MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The Cloud
 
Concurrency in Go by Denys Goldiner.pdf
Concurrency in Go by Denys Goldiner.pdfConcurrency in Go by Denys Goldiner.pdf
Concurrency in Go by Denys Goldiner.pdf
 
ktruss-short
ktruss-shortktruss-short
ktruss-short
 
Pygrib documentation
Pygrib documentationPygrib documentation
Pygrib documentation
 
Building a DSL with GraalVM (CodeOne)
Building a DSL with GraalVM (CodeOne)Building a DSL with GraalVM (CodeOne)
Building a DSL with GraalVM (CodeOne)
 
Using PyPy instead of Python for speed
Using PyPy instead of Python for speedUsing PyPy instead of Python for speed
Using PyPy instead of Python for speed
 
Python Coding Examples for Drive Time Analysis
Python Coding Examples for Drive Time AnalysisPython Coding Examples for Drive Time Analysis
Python Coding Examples for Drive Time Analysis
 
Parallel Computing in R
Parallel Computing in RParallel Computing in R
Parallel Computing in R
 
Raspberry pi a la cfml
Raspberry pi a la cfmlRaspberry pi a la cfml
Raspberry pi a la cfml
 
時系列データと確率的プログラミング tfp.sts
時系列データと確率的プログラミング tfp.sts時系列データと確率的プログラミング tfp.sts
時系列データと確率的プログラミング tfp.sts
 
深層学習とベイズ統計
深層学習とベイズ統計深層学習とベイズ統計
深層学習とベイズ統計
 
How to extract information from text with Semgrex
How to extract information from text with SemgrexHow to extract information from text with Semgrex
How to extract information from text with Semgrex
 
ベイジアンディープニューラルネット
ベイジアンディープニューラルネットベイジアンディープニューラルネット
ベイジアンディープニューラルネット
 
Cryptography : From Demaratus to RSA
Cryptography : From Demaratus to RSACryptography : From Demaratus to RSA
Cryptography : From Demaratus to RSA
 
静的型つき組版処理システムSATySFi @第61回プログラミング・シンポジウム
静的型つき組版処理システムSATySFi @第61回プログラミング・シンポジウム静的型つき組版処理システムSATySFi @第61回プログラミング・シンポジウム
静的型つき組版処理システムSATySFi @第61回プログラミング・シンポジウム
 
PyHEP 2019: Python 3.8
PyHEP 2019: Python 3.8PyHEP 2019: Python 3.8
PyHEP 2019: Python 3.8
 
Presentation template - jupyter2slides
Presentation template - jupyter2slidesPresentation template - jupyter2slides
Presentation template - jupyter2slides
 

Viewers also liked

Spring 3 smithe
Spring 3 smitheSpring 3 smithe
Spring 3 smithe
Simpony
 
Fire safetyinseniorapartments
Fire safetyinseniorapartmentsFire safetyinseniorapartments
Fire safetyinseniorapartments
marjellotti
 
Why Outsource Business Issues
Why Outsource Business IssuesWhy Outsource Business Issues
Why Outsource Business Issues
brooklynclasby
 
Winter 1 vega
Winter 1 vegaWinter 1 vega
Winter 1 vega
Simpony
 
Lua London Meetup 2013
Lua London Meetup 2013Lua London Meetup 2013
Lua London Meetup 2013
Cloudflare
 
Pregnancy four
Pregnancy fourPregnancy four
Pregnancy four
Simpony
 
Spring 2 Smithe
Spring 2 SmitheSpring 2 Smithe
Spring 2 Smithe
Simpony
 
rowell resperatory system
rowell resperatory systemrowell resperatory system
rowell resperatory system
rowellcabate
 
Spring 3 goodie
Spring 3 goodieSpring 3 goodie
Spring 3 goodie
Simpony
 
Spring 3 vega
Spring 3 vegaSpring 3 vega
Spring 3 vega
Simpony
 
Fall 2 smithe
Fall 2 smitheFall 2 smithe
Fall 2 smithe
Simpony
 

Viewers also liked (20)

Dualacy chap 2 fall
Dualacy chap 2 fallDualacy chap 2 fall
Dualacy chap 2 fall
 
Spring 3 smithe
Spring 3 smitheSpring 3 smithe
Spring 3 smithe
 
Summer 1
Summer 1Summer 1
Summer 1
 
CMC Teacher Education SIG Presentation; Mitchell
CMC Teacher Education SIG Presentation; MitchellCMC Teacher Education SIG Presentation; Mitchell
CMC Teacher Education SIG Presentation; Mitchell
 
Fire safetyinseniorapartments
Fire safetyinseniorapartmentsFire safetyinseniorapartments
Fire safetyinseniorapartments
 
Why Outsource Business Issues
Why Outsource Business IssuesWhy Outsource Business Issues
Why Outsource Business Issues
 
Présentation Olive d&rsquo;Or by SIAL
Présentation Olive d&rsquo;Or by SIALPrésentation Olive d&rsquo;Or by SIAL
Présentation Olive d&rsquo;Or by SIAL
 
CMC Teacher Education SIG Presentation; Guichon
CMC Teacher Education SIG Presentation; GuichonCMC Teacher Education SIG Presentation; Guichon
CMC Teacher Education SIG Presentation; Guichon
 
CMC Teacher Education SIG Presentation; Antoniadou
CMC Teacher Education SIG Presentation; AntoniadouCMC Teacher Education SIG Presentation; Antoniadou
CMC Teacher Education SIG Presentation; Antoniadou
 
Winter 1 vega
Winter 1 vegaWinter 1 vega
Winter 1 vega
 
Lua London Meetup 2013
Lua London Meetup 2013Lua London Meetup 2013
Lua London Meetup 2013
 
Bio Lab Paper
Bio Lab PaperBio Lab Paper
Bio Lab Paper
 
Pregnancy four
Pregnancy fourPregnancy four
Pregnancy four
 
Dualacy chap 1 fresh ss
Dualacy chap 1 fresh ssDualacy chap 1 fresh ss
Dualacy chap 1 fresh ss
 
Spring 2 Smithe
Spring 2 SmitheSpring 2 Smithe
Spring 2 Smithe
 
rowell resperatory system
rowell resperatory systemrowell resperatory system
rowell resperatory system
 
Spring 3 goodie
Spring 3 goodieSpring 3 goodie
Spring 3 goodie
 
Spring 3 vega
Spring 3 vegaSpring 3 vega
Spring 3 vega
 
Fall 2 smithe
Fall 2 smitheFall 2 smithe
Fall 2 smithe
 
CMC Teacher Education SIG Presentation; Lomicka
CMC Teacher Education SIG Presentation; LomickaCMC Teacher Education SIG Presentation; Lomicka
CMC Teacher Education SIG Presentation; Lomicka
 

Similar to Cloud flare jgc bigo meetup rolling hashes

An Introduction to Go
An Introduction to GoAn Introduction to Go
An Introduction to Go
Cloudflare
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Boris Yen
 

Similar to Cloud flare jgc bigo meetup rolling hashes (20)

An Introduction to Go
An Introduction to GoAn Introduction to Go
An Introduction to Go
 
Code and memory optimization tricks
Code and memory optimization tricksCode and memory optimization tricks
Code and memory optimization tricks
 
Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks
 
Evgeniy Muralev, Mark Vince, Working with the compiler, not against it
Evgeniy Muralev, Mark Vince, Working with the compiler, not against itEvgeniy Muralev, Mark Vince, Working with the compiler, not against it
Evgeniy Muralev, Mark Vince, Working with the compiler, not against it
 
Happy Go Programming
Happy Go ProgrammingHappy Go Programming
Happy Go Programming
 
Modern C++
Modern C++Modern C++
Modern C++
 
Overview of the Hive Stinger Initiative
Overview of the Hive Stinger InitiativeOverview of the Hive Stinger Initiative
Overview of the Hive Stinger Initiative
 
Data oriented design and c++
Data oriented design and c++Data oriented design and c++
Data oriented design and c++
 
London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
 
Segmentation Faults, Page Faults, Processes, Threads, and Tasks
Segmentation Faults, Page Faults, Processes, Threads, and TasksSegmentation Faults, Page Faults, Processes, Threads, and Tasks
Segmentation Faults, Page Faults, Processes, Threads, and Tasks
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTW
 
Go Concurrency
Go ConcurrencyGo Concurrency
Go Concurrency
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
PyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at ScalePyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at Scale
 
OSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyOSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross Lawley
 
Serial-War
Serial-WarSerial-War
Serial-War
 
Compression techniques
Compression techniquesCompression techniques
Compression techniques
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 

More from Cloudflare

More from Cloudflare (20)

Succeeding with Secure Access Service Edge (SASE)
Succeeding with Secure Access Service Edge (SASE)Succeeding with Secure Access Service Edge (SASE)
Succeeding with Secure Access Service Edge (SASE)
 
Close your security gaps and get 100% of your traffic protected with Cloudflare
Close your security gaps and get 100% of your traffic protected with CloudflareClose your security gaps and get 100% of your traffic protected with Cloudflare
Close your security gaps and get 100% of your traffic protected with Cloudflare
 
Why you should replace your d do s hardware appliance
Why you should replace your d do s hardware applianceWhy you should replace your d do s hardware appliance
Why you should replace your d do s hardware appliance
 
Don't Let Bots Ruin Your Holiday Business - Snackable Webinar
Don't Let Bots Ruin Your Holiday Business - Snackable WebinarDon't Let Bots Ruin Your Holiday Business - Snackable Webinar
Don't Let Bots Ruin Your Holiday Business - Snackable Webinar
 
Why Zero Trust Architecture Will Become the New Normal in 2021
Why Zero Trust Architecture Will Become the New Normal in 2021Why Zero Trust Architecture Will Become the New Normal in 2021
Why Zero Trust Architecture Will Become the New Normal in 2021
 
HARTMANN and Cloudflare Learn how healthcare providers can build resilient in...
HARTMANN and Cloudflare Learn how healthcare providers can build resilient in...HARTMANN and Cloudflare Learn how healthcare providers can build resilient in...
HARTMANN and Cloudflare Learn how healthcare providers can build resilient in...
 
Zero trust for everybody: 3 ways to get there fast
Zero trust for everybody: 3 ways to get there fastZero trust for everybody: 3 ways to get there fast
Zero trust for everybody: 3 ways to get there fast
 
LendingTree and Cloudflare: Ensuring zero trade-off between security and cust...
LendingTree and Cloudflare: Ensuring zero trade-off between security and cust...LendingTree and Cloudflare: Ensuring zero trade-off between security and cust...
LendingTree and Cloudflare: Ensuring zero trade-off between security and cust...
 
Network Transformation: What it is, and how it’s helping companies stay secur...
Network Transformation: What it is, and how it’s helping companies stay secur...Network Transformation: What it is, and how it’s helping companies stay secur...
Network Transformation: What it is, and how it’s helping companies stay secur...
 
Scaling service provider business with DDoS-mitigation-as-a-service
Scaling service provider business with DDoS-mitigation-as-a-serviceScaling service provider business with DDoS-mitigation-as-a-service
Scaling service provider business with DDoS-mitigation-as-a-service
 
Application layer attack trends through the lens of Cloudflare data
Application layer attack trends through the lens of Cloudflare dataApplication layer attack trends through the lens of Cloudflare data
Application layer attack trends through the lens of Cloudflare data
 
Recent DDoS attack trends, and how you should respond
Recent DDoS attack trends, and how you should respondRecent DDoS attack trends, and how you should respond
Recent DDoS attack trends, and how you should respond
 
Cybersecurity 2020 threat landscape and its implications (AMER)
Cybersecurity 2020 threat landscape and its implications (AMER)Cybersecurity 2020 threat landscape and its implications (AMER)
Cybersecurity 2020 threat landscape and its implications (AMER)
 
Strengthening security posture for modern-age SaaS providers
Strengthening security posture for modern-age SaaS providersStrengthening security posture for modern-age SaaS providers
Strengthening security posture for modern-age SaaS providers
 
Kentik and Cloudflare Partner to Mitigate Advanced DDoS Attacks
Kentik and Cloudflare Partner to Mitigate Advanced DDoS AttacksKentik and Cloudflare Partner to Mitigate Advanced DDoS Attacks
Kentik and Cloudflare Partner to Mitigate Advanced DDoS Attacks
 
Stopping DDoS Attacks in North America
Stopping DDoS Attacks in North AmericaStopping DDoS Attacks in North America
Stopping DDoS Attacks in North America
 
It’s 9AM... Do you know what’s happening on your network?
It’s 9AM... Do you know what’s happening on your network?It’s 9AM... Do you know what’s happening on your network?
It’s 9AM... Do you know what’s happening on your network?
 
Cyber security fundamentals (simplified chinese)
Cyber security fundamentals (simplified chinese)Cyber security fundamentals (simplified chinese)
Cyber security fundamentals (simplified chinese)
 
Bring speed and security to the intranet with cloudflare for teams
Bring speed and security to the intranet with cloudflare for teamsBring speed and security to the intranet with cloudflare for teams
Bring speed and security to the intranet with cloudflare for teams
 
Accelerate your digital transformation
Accelerate your digital transformationAccelerate your digital transformation
Accelerate your digital transformation
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Cloud flare jgc bigo meetup rolling hashes

  • 1. Rolling and Extendable Hashes December 18, 2013 John Graham-Cumming www.cloudflare.com!
  • 3. Cyclic Redundancy Check •  Widely used as hash function •  e.g. common to hash keys uses a CRC for load balancing •  memcached, nginx, ... •  crc(key) % #servers •  Not designed as a hash function! •  Choice of polynomial can change collisions in real world •  CRCs have no internal state and they are fast •  Shifts, XOR, small table lookup www.cloudflare.com!
  • 4. Extending a CRC Compute the hash (a CRC) by extending the string to the right h( ) S h( ) S h( ) S h( ) S Typical CRC allows this because when adding a bit/byte/ word the CRC is calculated from the previous CRC and the added data. www.cloudflare.com!
  • 5. CRC64 •  Typical implementation (t is lookup table) BIT64 crc = 0;! ! while ( *s ) {! BYTE c = *s++;! crc = (crc >> 8) ^ t[(crc & 0xff) ^ c];! }! •  Some common polynomials: 64 4 3 •  CRC64-ISO (0x800000000000000D) x +x +x +x +1 •  CRC64-ECMA (0xA17870F5D4F51B49) x 64 +x 63+ x 61 + x 59 + x 56 + x 55 + x 52 + x 49 + x 48 + x 47 + x 46 + x 44 + x 41 + x 37 + x 36 + x 34 + x 32 + x 31 + x 28 + x 26 + x 23 + x 22 + x19 + x16 + x13 + x12 + x10 + x 8 + x 7 + x 5 + x 3 + x1 www.cloudflare.com!
  • 6. One use: spam filter tokens •  Common to see in open source spam filters From: Prince of Nigeria! Subject: I beg pardon for interrupting your fine day! ! [...]! ! •  Generates tokens: from:prince from:nigeria subject:pardon subject:interrupting! subject:fine subject:day! •  And from the message body body:html:font:red body:html:image:size:0! •  Repeated fixed parts (from:, subject:, html:image:size:) can be CRCed once www.cloudflare.com!
  • 8. Rabin/Karp String Searching “Efficient randomized pattern-matching algorithms” Rabin/Karp, IBM Journal of Research and Development March 1987 h( ) P h( ) S Compute and compare len(S)-len(P)+2 hashes When hash matches verify that substrings actually match www.cloudflare.com!
  • 9. Two Expensive Operations •  The h() function itself •  Need a very fast hash algorithm •  Average complexity is O(len(S)) •  Substring comparison on hash matches •  Need minimal hash collisions •  Worst case complexity is O(len(S) * len(P)) •  Trivial to make work with multiple patterns •  Just compute their hashes •  Use Bloom filter (or other hash table) to lookup patterns for each hash calculated •  Average complexity is O(len(S) + #patterns) www.cloudflare.com!
  • 10. Rabin/Karp Rolling Hash •  Basic on arithmetic in the ring Zp where p is prime •  Treats a substring s0 .. sn as a number in a chosen base b. n h(s0 …sn ) = ∑ si b n−i mod p i=0 •  This hash can be updated when sliding across a string www.cloudflare.com!
  • 11. Rabin/Karp Efficient Update h(s1 …sn+1 ) n+1 = ∑ si b n+1−i mod p i=1 n+1 = b∑ si b n−i mod p i=1 Previous hash # n & n−i n = % b∑ si b − s0 b + sn+1 ( mod p $ i=0 ' = ( bh(s0 …sn−1 ) − s0 b n + sn+1 ) mod p ( ) = b ( h(s0 …sn−1 ) − s0 b n−1 ) + sn+1 mod p www.cloudflare.com! Precomputed
  • 12. Rabin/Karp base and modulus values •  Use a prime base •  When operating on bytes a suitable b is 257 •  Choose a prime modulus closest to the desired hash output size •  e.g. for 32-bit hash value use 4294967291 •  Various tricks can speed up the calculation •  Don’t calculate bn-1, calculate bn-1 mod p instead •  Choose a p such that bp fits in a convenient bit size •  Memoize x (bn-1 mod p) for all x byte values •  Choose p is a power of two (instead of a prime) can use & instead of % •  Result: update requires one *, one +, one –, one & www.cloudflare.com!
  • 13. Bentley/McIlroy Compression "Data Compression Using Long Common Strings” Jon Bentley, Douglas McIlroy Proceedings of the IEEE Data Compression Conference, 1999 “Dictionary” to compress against (could be self) Broken into blocks, each block hashed h0 h( h1 h2 ) h3 h4 h5 S Slide Rabin/Karp hash across string to compress Look for matches in dictionary Extend matches backwards block size bytes, forwards indefinitely O(len(S)) www.cloudflare.com!
  • 14. Never gonna give you up We're no strangers to love! You know the rules and so do I! A full commitment's what I'm thinking of! You wouldn't get this from any other guy! I just wanna tell you how I'm feeling! Gotta make you understand! Never gonna give you up<204,13>let you down<204,13>run around<45,5>desert you<20! 4,13><184,9>cry<204,13>say goodbye<204,13>tell a lie<45,5>hu<285,7>! We've<30,5>n each<129,7>for so long! Your heart's been aching but! You're too shy to say it! Inside we both<30,6>wha<422,9>going on! We<30,10>game<45,5>we're<210,7>play it! And if you ask me<161,17>Don't tell m<220,5>'re too blind to see! <340,13><217,161><229,12><634,161>(Ooh,<633,12>)<967,24>)<340,13>give, n<230,11>! give! (G<635,10><1004,57><377,11><389,160><139,239><229,12><634,333>! •  Open source implementation from CloudFlare in Go •  Bentley/McIlroy then gzip very effective www.cloudflare.com!
  • 16. RSYNC hashing "Efficient Algorithms for Sorting and Synchronization” Andrew Tridgell Doctoral Thesis, February 1999 File on remote machine Broken into blocks, each block hashed twice (rolling hash plus MD5) Hash sent to local machine h0 h( h1 h2 ) h3 h4 h5 S Slide rolling hash across source file looking for match Check MD5 hash to eliminate rolling hash collisions Send bytes form S to remote host, or token saying which hash matched www.cloudflare.com!
  • 17. The RSYNC rolling hash •  Block size is b, offset into file is k, the file bytes are s, M is an arbitrary modulus # b−1 & r1 (k, b) = % ∑ ai+k ( mod M $ i=0 ' # b−1 & r2 (k, b) = % ∑ (b − i)ai+k ( mod M $ i=0 ' r(k, b) = r1 (k, b) + Mr2 (k, b) •  In practice, M = 216 www.cloudflare.com!
  • 18. Hash updates are fast r1 (k +1, b) = ( r1 (k, b) − ak + ak+b ) mod M r2 (k +1, b) = ( r2 (k, b) − bak + r1 (k +1, b)) mod M (* proof is left as an exercise for the reader) www.cloudflare.com!