Linguistic Passphrase Cracking
SESSION ID: HT-R03
#RSAC
Mikael Simovits
M.Sc. CISSP
Simovits Consulting (CEO)
mikael@simovits.com
Peder Sparell
M.Sc. GPEN CHFI
Simovits Consulting
peder.sparell@simovits.com
Background
Implementation
Objectives and entropy calculations
Results
Background
Standards set requirements or recommendations for long and complex/random passwords (e.g. within the ISO 27000 series and the NIST Electronic Authentication Guideline)
Increasing use of phrases as passwords
Publicly available cracking software is not very effective on longer passwords or on phrases of more than 2 words
Languages are relatively predictable, not random
Would restricting guesses to linguistically correct phrases make cracking more effective?
How can language be modelled to generate/crack such phrases?
Is it advisable to base a password policy on phrases?
Passwords – search scope/complexity
Password length | Lowercase | Lower + uppercase | Alphanumeric | Alphanumeric + special chars
1 (base) | 26 | 52 | 62 | 95
6 | 2^28.2 (= 3.1*10^8) | 2^34.2 (= 2.0*10^10) | 2^35.7 (= 5.7*10^10) | 2^39.4 (= 7.4*10^11)
8 | 2^37.6 (= 2.1*10^11) | 2^45.6 (= 5.3*10^13) | 2^47.6 (= 2.2*10^14) | 2^52.6 (= 6.6*10^15)
10 | 2^47.0 (= 1.4*10^14) | 2^57.0 (= 1.4*10^17) | 2^59.5 (= 8.4*10^17) | 2^65.7 (= 6.0*10^19)
14 | 2^65.8 (= 6.5*10^19) | 2^79.8 (= 1.1*10^24) | 2^83.4 (= 1.2*10^25) | 2^92.0 (= 4.9*10^27)
16 | 2^75.2 (= 4.3*10^22) | 2^91.2 (= 2.9*10^27) | 2^95.3 (= 4.8*10^28) | 2^105 (= 4.4*10^31)
20 | 2^94.0 (= 2.0*10^28) | 2^114 (= 2.1*10^34) | 2^119 (= 7.0*10^35) | 2^131 (= 3.6*10^39)
Number of possible combinations: $c = b^l$, where $b$ is the size of the character set (base) and $l$ is the password length
Sometimes written as: $2^{\log_2(c)}$
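The table can be recomputed directly from the formula. The following is a minimal sketch; the function name `search_space` and the dictionary `CHARSETS` are illustrative choices, not part of the original work.

```python
import math

# Character-set sizes taken from the "1 (base)" row of the table.
CHARSETS = {
    "lowercase": 26,
    "lower + uppercase": 52,
    "alphanumeric": 62,
    "alphanumeric + special": 95,
}


def search_space(base, length):
    """Return (c, log2(c)) for c = base ** length."""
    combinations = base ** length
    return combinations, length * math.log2(base)


if __name__ == "__main__":
    for length in (6, 8, 10, 14, 16, 20):
        for name, base in CHARSETS.items():
            c, bits = search_space(base, length)
            print(f"{length:>2} {name:<24} 2^{bits:.1f} (= {c:.1e})")
```

The value of log2(c) is the brute-force effort in bits, which is the quantity the phrase lists are later compared against.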
Delimitations of this work regarding passphrases
2 or more words (usually >= 3)
Concatenated into a single string without whitespace
Lower case only
Examples:
The king shall rule -> thekingshallrule
My brother rocks -> mybrotherrocks
Implementation
Markov chains
A Markov process is a process where transitions to other states are determined by a probability distribution based on the current state
A Markov chain is the resulting sequence of states from a Markov process
[Figure: state diagram with three states A, B and C; edges labelled with the transition probabilities 0.1, 0.6, 1.0, 0.6, 0.4 and 0.3]
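A Markov process of this kind can be simulated in a few lines. This is a minimal sketch: the transition table below is hypothetical (the exact edges of the slide's diagram are not recoverable from the transcript), and the names `TRANSITIONS` and `markov_chain` are illustrative.

```python
import random

# Hypothetical transition table (not the slide's exact diagram): for each
# state, the possible next states and their probabilities (each row sums to 1).
TRANSITIONS = {
    "A": {"A": 0.1, "B": 0.6, "C": 0.3},
    "B": {"A": 0.6, "C": 0.4},
    "C": {"A": 1.0},
}


def markov_chain(start, steps, rng):
    """Return the sequence of states visited by the Markov process."""
    chain = [start]
    for _ in range(steps):
        options = TRANSITIONS[chain[-1]]
        # The next state depends only on the current state.
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        chain.append(nxt)
    return chain


if __name__ == "__main__":
    print("".join(markov_chain("A", 20, random.Random(1))))
```

The returned list is the Markov chain; the process itself is fully described by the transition table.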
Markov chains of order m
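A Markov chain of order m redefines the states to include a 'history' of m states
[Figure: state diagram of a higher-order chain built from the states A, B and C, in which each state (AB, BC, CA, BB, BA) carries its history; edges labelled with transition probabilities]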
n-grams
n-grams are sequences containing n elements
The elements can be characters or words
n-grams are used for language modelling, using Markov chains of order n-1
The statistics for the probability distribution are generated by counting the number of occurrences of n-grams in large texts
Example: 3-gram statistics show whether the character 'e' is more likely than the character 'x' to follow a current state of 'th'
Implementation - Overview
Pipeline: phrase generation -> text file or stdout -> cracker (e.g. HashCat)
Implementation - Phrase generation
Text source: large text file in plain text, e.g. an e-book or a corpus
n-gram extraction: here the n-gram statistics are created
n-gram data: text file containing statistics on the number of occurrences of n-grams in the source text
Phrase generation: here the phrases are generated using a Markov process
Data out: phrases, one per line
Phase 1: n-gram extraction
At start-up, a text file to analyse is chosen, as well as the desired order (n) and level (characters or words) of the n-grams
The text is processed using regular expressions
Punctuation marks (.,!?:) are replaced by a single dot (.) to represent sentence breaks
All characters are changed to lower case
Result: the n-grams are saved to a text file together with their number of occurrences
Examples (char/word; in the character-level file, "_" stands for a space):
…
arw 3
ary 137
as. 24
as_ 2382
asa 48
asb 7
asc 42
asd 7
ase 207
…
…
the shelbyville runner 1
the sheldon penny 1
the shelf . 10
the shelf adaptors 1
the shelf and 6
the shelf are 1
the shelf at 3
…
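The extraction phase can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and dropping every character other than a-z and the dot is a simplification chosen here.

```python
import re
from collections import Counter


def normalize(text):
    """Lower-case the text and map sentence punctuation to a single '.'."""
    text = text.lower()
    text = re.sub(r"[.,!?:]+", ".", text)   # punctuation -> sentence break
    text = re.sub(r"[^a-z.]+", " ", text)   # assumption: keep only a-z and '.'
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text, n):
    """Count character n-grams; spaces are written as '_' as in the example."""
    s = normalize(text).replace(" ", "_")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def word_ngrams(text, n):
    """Count word n-grams; a '.' is treated as a token of its own."""
    tokens = normalize(text).replace(".", " . ").split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    sample = "The shelf. The shelf and the cat!"
    for ngram, count in sorted(char_ngrams(sample, 3).items()):
        print(ngram, count)
```

Printing one n-gram and its count per line gives the same layout as the example listings above.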
Phase 2: Phrase generation
An interval of desired phrase lengths can be set
The desired number of words can be set
The chosen n-gram file is read and loaded into memory; each state maps to its possible next characters (or words) and their counts
Starting states (those starting with '.') are loaded into a separate list
A threshold value decides whether an n-gram is loaded into the list or ignored
From every starting state, the phrases are built up char by char (or word by word) by recursively going through the possible state transitions and adding the new chars (or words) to the current phrase until it is long enough
Example of loaded n-gram data (state, then next character with count):
_th: e 6081, a 1727, i 727, o 206, r 235, u 22, …
_ti: m 155, n 16, d 10, c 4, e 8, g 3, …
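The recursive generation can be sketched at the character level as below. Several details are assumptions made for this illustration and are not confirmed by the slides: an n-gram is kept when its count is strictly above the threshold, a phrase is accepted only if a word may end at its last character, and a sentence break is never crossed. The function names are hypothetical.

```python
from collections import defaultdict


def load_transitions(counts, threshold):
    """Map each state (the first n-1 chars) to the chars that may follow it."""
    transitions = defaultdict(list)
    for ngram, count in counts.items():
        if count > threshold:  # assumption: keep counts above the threshold
            transitions[ngram[:-1]].append(ngram[-1])
    return transitions


def generate(transitions, length):
    """Yield every phrase of exactly `length` letters reachable from a start state."""

    def walk(state, phrase):
        if len(phrase) > length:
            return
        if len(phrase) == length:
            # Assumption: accept the phrase only if a word may end here.
            if any(c in "_." for c in transitions.get(state, [])):
                yield phrase
            return
        for c in transitions.get(state, []):
            if c == ".":
                continue  # sentence break: do not cross it
            new_phrase = phrase if c == "_" else phrase + c
            yield from walk(state[1:] + c, new_phrase)

    for start in list(transitions):
        if start.startswith("."):  # starting states begin a sentence
            letters = start.replace(".", "").replace("_", "")
            yield from walk(start, letters)


if __name__ == "__main__":
    text = "._my_cat._my_dog."
    counts = {}
    for i in range(len(text) - 2):
        counts[text[i:i + 3]] = counts.get(text[i:i + 3], 0) + 1
    for phrase in generate(load_transitions(counts, 0), 5):
        print(phrase)  # one phrase per line, suitable for piping to a cracker
```

Because the generator yields phrases one at a time, the output can be streamed to stdout instead of being held in memory.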
Objectives and entropy calculations
Entropy of the English language
Different estimation suggestions:
1.75 bits/char
1.6 bits/char
NIST (National Institute of Standards and Technology) suggests:
First char: 4 bits
Chars 2-8: 2 bits each
Chars 9-20: 1.5 bits each
Char 21 and subsequent: 1 bit each
Example:
A 16-character phrase in the English language has (1*4)+(7*2)+(8*1.5) = 30 bits of entropy
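The NIST rule above translates into a short function. This is a minimal sketch; the function name `nist_entropy_bits` is an illustrative choice.

```python
def nist_entropy_bits(length):
    """Entropy estimate in bits for an English phrase of `length` characters."""
    bits = 0.0
    for position in range(1, length + 1):
        if position == 1:
            bits += 4.0   # first character
        elif position <= 8:
            bits += 2.0   # characters 2-8
        elif position <= 20:
            bits += 1.5   # characters 9-20
        else:
            bits += 1.0   # character 21 and subsequent
    return bits


if __name__ == "__main__":
    for length in (10, 14, 16, 20):
        print(length, nist_entropy_bits(length))
```

For the phrase lengths 10, 14, 16 and 20 this gives 21, 27, 30 and 36 bits, which are the target entropies used for the results.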
Comparison entropy calculations
For every generated list of phrases we define:
Target entropy: $H_t$ is calculated according to NIST's interpretation of the entropy of the English language
Potential entropy: $H_p = \log_2 X$, where $X$ is the number of phrases in the list
Efficiency: $e = \frac{\text{cracked hashes}}{\text{sample hashes}}$
Estimated entropy: $H_{est} = \log_2 \frac{X}{e} = \log_2 X - \log_2 e$
The objective is for the "potential" and especially the "estimated" entropy values to be as close as possible to the "target" entropy.
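As a worked check of these definitions, the sketch below recomputes the potential and estimated entropy for a phrase list. The function names are illustrative, and returning `None` when nothing was cracked is a choice made here to mirror the "-" entries in the results tables.

```python
import math


def potential_entropy(num_phrases):
    """H_p = log2(X), where X is the number of phrases in the list."""
    return math.log2(num_phrases)


def estimated_entropy(num_phrases, cracked, sample):
    """H_est = log2(X) - log2(e), with efficiency e = cracked / sample."""
    if cracked == 0:
        return None  # efficiency is zero, so no estimate can be computed
    efficiency = cracked / sample
    return math.log2(num_phrases) - math.log2(efficiency)


if __name__ == "__main__":
    # Figures for the list L10T100N5CNews: 825 million phrases, 7 of 15 cracked.
    print(f"{potential_entropy(825e6):.1f}")
    print(f"{estimated_entropy(825e6, 7, 15):.1f}")
```

With 825 million phrases and 7 of 15 hashes cracked, this gives about 29.6 and 30.7 bits, matching the corresponding row of the length-10 results.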
Some case illustrations
Case 1: a sufficient number of phrases in the list, but about half of them are not valid phrases. Potential entropy is close to target entropy; estimated entropy is higher.
Case 2: a small number of phrases in the list. All of them are valid phrases, but they do not cover the language. Potential entropy is lower than target entropy; estimated entropy is higher.
Case 3: a large number of phrases in the list. It covers the whole language but contains many invalid phrases. Potential and estimated entropy are both high.
Results
Testing variables
The test sample consisted of 66 hashes of passwords of length 10-20
n-gram statistics were generated from 3 different source texts:
English news sites (3 million sentences)
English Wikipedia (1 million sentences)
General web sites, blogs etc. (1 million sentences)
The input values used for each list can be derived from its file name.
Example: 'L20W6T5N3WNews'
The length of the phrases in the list/file is 20 (L20)
The phrases consist of exactly 6 words (W6)
The threshold value is set to 5 (T5)
n-grams of order 3 have been used (N3)
n-grams on the word level have been used (W; C = char, W = words)
The text source is the one from news sites (News; one of Wiki, Web or News)
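The naming convention can be decoded mechanically. The sketch below is an illustration only: the regular expression is inferred from the list names that appear in this deck, the function name `parse_list_name` is hypothetical, and the word-count field is kept as raw text because forms such as 'W-5' and 'W5-6' are not explained on the slide.

```python
import re

# Pattern inferred from names such as 'L20W6T5N3WNews' and 'L10T100N5CNews'.
NAME_PATTERN = re.compile(
    r"^L(?P<length>\d+)"
    r"(?:W(?P<words>\d*-?\d*))?"   # optional word count, e.g. 6, -5 or 5-6
    r"T(?P<threshold>\d+)"
    r"N(?P<order>\d+)"
    r"(?P<level>[CW])"
    r"(?P<source>[A-Za-z]+)$"
)


def parse_list_name(name):
    """Split a phrase-list name into its input values."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised phrase list name: {name}")
    return {
        "length": int(match["length"]),
        "words": match["words"],  # raw text, None if absent
        "threshold": int(match["threshold"]),
        "order": int(match["order"]),
        "level": "char" if match["level"] == "C" else "word",
        "source": match["source"],
    }


if __name__ == "__main__":
    print(parse_list_name("L20W6T5N3WNews"))
```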
Results details – Phrase length 10
Phrase list | Generation time | HashCat time | Cracked hashes | Possible outcomes (English) | Phrases generated | Target entropy | Potential entropy | Estimated entropy
L10T100N5CNews | 4.2 h | 46 s | 7/15 | 2^21 (2.1 mil.) | 2^29.6 (825 mil.) | 21 | 29.6 | 30.7
L10T0N3WNews | 1.5 h | 1 s | 2/15 | 2^21 (2.1 mil.) | 2^23.9 (15.2 mil.) | 21 | 23.9 | 26.8
L10T1N3WNews | 16 min | 0 s | 2/15 | 2^21 (2.1 mil.) | 2^21.3 (2.6 mil.) | 21 | 21.3 | 24.2
Results details – Phrase length 14
Phrase list | Generation time | HashCat time | Cracked hashes | Possible outcomes (English) | Phrases generated | Target entropy | Potential entropy | Estimated entropy
L14T3000N5CNews | 1.7 h | 6 s | 0/15 | 2^27 (134 mil.) | 2^26.4 (90 mil.) | 27 | 26.4 | -
L14T1N8CWiki | 23.0 h | 4 min 32 s | 2/15 | 2^27 (134 mil.) | 2^30.8 (1 865 mil.) | 27 | 30.8 | 33.7
L14T1N3WNews | 20.2 h | 31 s | 2/15 | 2^27 (134 mil.) | 2^28.9 (505 mil.) | 27 | 28.9 | 31.8
L14W-5T0N3WNews | 24.0 h | 45 s | 2/15 | 2^27 (134 mil.) | 2^28.2 (312 mil.) | 27 | 28.2 | 31.1
Results details – Phrase length 16
Phrase list | Generation time | HashCat time | Cracked hashes | Possible outcomes (English) | Phrases generated | Target entropy | Potential entropy | Estimated entropy
L16W4T5N8CWiki | 7.3 h | 31 s | 1 | | 2^28.8 (479 mil.) | | |
L16W5-6T5N8CWiki | 34.7 h | 3 min 37 s | 0 | | 2^30.3 (1 355 mil.) | | |
Total, group 1 | 42 h | 4 min 8 s | 1/24 | <2^30 (<1 100 mil.) | 2^30.8 (1 834 mil.) | <30 | 30.8 | 35.4
L16W4T0N3WNews | 9.2 h | 6 s | 4 | | 2^26.0 (67.5 mil.) | | |
L16W5T0N3WNews | 79.5 h | 10 min 16 s | 1 | | 2^29.7 (887 mil.) | | |
L16W6T1N3WNews | 58.7 h | 1 min 32 s | 0 | | 2^29.7 (851 mil.) | | |
Total, group 2 | 147.4 h | 11 min 54 s | 5/24 | <2^30 (<1 100 mil.) | 2^30.7 (1 806 mil.) | <30 | 30.7 | 33.0
L16W5T0N3WWeb | 16.6 h | 13 s | 1 | | 2^27.6 (199 mil.) | | |
Results details – Phrase length 20
Phrase list | Generation time | HashCat time | Cracked hashes | Possible outcomes (English) | Phrases generated | Target entropy | Potential entropy | Estimated entropy
L20W6T40N8CWiki | 12.8 h | 18 s | 0/12 | <2^36 (<69 000 mil.) | 2^27.9 (244 mil.) | <36 | 27.9 | -
L20W6T1N3WWeb | 52.2 h | 2 min 14 s | 0/12 | <2^36 (<69 000 mil.) | 2^29.4 (727 mil.) | <36 | 29.4 | -
L20W6T0N3WWeb | 960 h | 29 min 9 s | 1/12 | <2^36 (<69 000 mil.) | 2^33.0 (8 500 mil.) | <36 | 33.0 | 36.6
L20W6T1N3WNews | 550.5 h | 39 min 3 s | 1/12 | <2^36 (<69 000 mil.) | 2^31.0 (2 131 mil.) | <36 | 31.0 | 34.6
Phrase list efficiency
[Chart: target, potential and estimated entropy in bits for each phrase list]
Phrase list | Target entropy | Potential entropy | Estimated entropy
L10T100N5CNews | 21 | 29.6 | 30.7
L10T0N3WNews | 21 | 23.9 | 26.8
L10T1N3WNews | 21 | 21.3 | 24.2
L14T3000N5CNews | 27 | 26.4 | -
L14T1N8CWiki | 27 | 30.8 | 33.7
L14T1N3WNews | 27 | 28.9 | 31.8
L14W-5T0N3WNews | 27 | 28.2 | 31.1
L16W4-6T5N8CWiki (Group 1) | 30 | 30.8 | 35.4
L16W4-6T0-1N3WNews (Group 2) | 30 | 30.7 | 33.0
L20W6T40N8CWiki | 36 | 27.9 | -
L20W6T1N3WWeb | 36 | 29.4 | -
L20W6T0N3WWeb | 36 | 33.0 | 36.6
L20W6T1N3WNews | 36 | 31.0 | 34.6
A dash marks lists that cracked no hashes (plotted as 0 in the chart), for which no estimate can be computed.
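Brute force comparison
[Chart: estimated entropy compared with the brute-force search space, in bits, for each phrase list]
Phrase list | Estimated entropy | Brute-force
L10T100N5CNews | 30.7 | 47
L10T0N3WNews | 26.8 | 47
L10T1N3WNews | 24.2 | 47
L14T3000N5CNews | - | 65.8
L14T1N8CWiki | 33.7 | 65.8
L14T1N3WNews | 31.8 | 65.8
L14W-5T0N3WNews | 31.1 | 65.8
L16W4-6T5N8CWiki (Group 1) | 35.4 | 75.2
L16W4-6T0-1N3WNews (Group 2) | 33.0 | 75.2
L20W6T40N8CWiki | - | 94
L20W6T1N3WWeb | - | 94
L20W6T0N3WWeb | 36.6 | 94
L20W6T1N3WNews | 34.6 | 94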
LinkedIn cracking
Leaked hashes from LinkedIn 2012
Using the previously generated phrase lists
This was not the main objective of this work, so the lists are not optimized
The target scope (number of crackable passwords) of each length is estimated from statistics on already cracked passwords (www.adeptus-mechanicus.com)
The target scope still includes many passwords outside the original target scope of this work (phrases that are not linguistically correct)
Phrase length | Cracked hashes | Target scope (estimated total no. of lowercase passwords)
10 | 18 269 (11%) | 168 000
14 | 1 882 (10%) | 18 300
16 | 612 (10%) | 6 100
20 | 6 (9%) | 68
Conclusions
Good results compared to brute-force
Efficient – and can be further improved
Lack of alternative publicly available methods
Cracks about 10% of really long passwords
Time-memory trade-off
If a password policy is based on phrases, it should at least also require
Phrases longer than 20 characters
Characters from all character sets (alphanumeric + special characters)
Future work & more info
Future work
Optimization of performance of the implementation
Include upper case, numbers and special characters
Etc.
More info
www.simovits.com
peder.sparell@simovits.com
“Apply” slide
After this session, we hope you will know that passwords as a means of authentication are more or less obsolete.
It is possible to implement linguistic password cracking for long phrases and achieve viable results in a short time.
You know where to begin if you want to carry this work further:
Source code available on GitHub:
https://github.com/Sparell/Phraser
Thanks
peder.sparell@simovits.com
mikael@simovits.com

Más contenido relacionado

La actualidad más candente

Ariu - Ph.D. Defense Slides
Ariu - Ph.D. Defense SlidesAriu - Ph.D. Defense Slides
Ariu - Ph.D. Defense Slides
Pluribus One
 

La actualidad más candente (17)

PhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanPhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud Rahman
 
7 cryptography
7 cryptography7 cryptography
7 cryptography
 
Malicious Domain Profiling
Malicious Domain Profiling Malicious Domain Profiling
Malicious Domain Profiling
 
The magic of machine translation 20 july 2017
The magic of machine translation 20 july 2017The magic of machine translation 20 july 2017
The magic of machine translation 20 july 2017
 
Aws r
Aws rAws r
Aws r
 
Hash
HashHash
Hash
 
Summarization and Abstraction using deep learning
Summarization and Abstraction using deep learningSummarization and Abstraction using deep learning
Summarization and Abstraction using deep learning
 
Ariu - Ph.D. Defense Slides
Ariu - Ph.D. Defense SlidesAriu - Ph.D. Defense Slides
Ariu - Ph.D. Defense Slides
 
Sequence Learning with CTC technique
Sequence Learning with CTC techniqueSequence Learning with CTC technique
Sequence Learning with CTC technique
 
Rbootcamp Day 5
Rbootcamp Day 5Rbootcamp Day 5
Rbootcamp Day 5
 
Network based file carving
Network based file carvingNetwork based file carving
Network based file carving
 
Digital signatures
Digital signaturesDigital signatures
Digital signatures
 
A Modified Technique For Performing Data Encryption & Data Decryption
A Modified Technique For Performing Data Encryption & Data DecryptionA Modified Technique For Performing Data Encryption & Data Decryption
A Modified Technique For Performing Data Encryption & Data Decryption
 
Sequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural NetworksSequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural Networks
 
Угадываем пароль за минуту
Угадываем пароль за минутуУгадываем пароль за минуту
Угадываем пароль за минуту
 
Getting the Most out of Transition-based Dependency Parsing
Getting the Most out of Transition-based Dependency ParsingGetting the Most out of Transition-based Dependency Parsing
Getting the Most out of Transition-based Dependency Parsing
 
Hash crypto
Hash cryptoHash crypto
Hash crypto
 

Similar a Linguistic Passphrase Cracking

NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
rantav
 
Reducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code AnalysisReducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code Analysis
Sebastiano Panichella
 
Compiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_VanamaCompiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_Vanama
Srikanth Vanama
 

Similar a Linguistic Passphrase Cracking (20)

Lenar Gabdrakhmanov (Provectus): Speech synthesis
Lenar Gabdrakhmanov (Provectus): Speech synthesisLenar Gabdrakhmanov (Provectus): Speech synthesis
Lenar Gabdrakhmanov (Provectus): Speech synthesis
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream Processing
 
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
 
Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdf
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
 
IJETAE_1013_119
IJETAE_1013_119IJETAE_1013_119
IJETAE_1013_119
 
Rna seq pipeline
Rna seq pipelineRna seq pipeline
Rna seq pipeline
 
Introduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics ResearchersIntroduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics Researchers
 
Reducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code AnalysisReducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code Analysis
 
Atlanta MLconf Machine Learning Conference 09-23-2016
Atlanta MLconf Machine Learning Conference 09-23-2016Atlanta MLconf Machine Learning Conference 09-23-2016
Atlanta MLconf Machine Learning Conference 09-23-2016
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
NLP - Prédictions de tags sur les questions Stackoverflow
NLP - Prédictions de tags sur les questions StackoverflowNLP - Prédictions de tags sur les questions Stackoverflow
NLP - Prédictions de tags sur les questions Stackoverflow
 
Summarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering TechniquesSummarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering Techniques
 
Compiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_VanamaCompiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_Vanama
 
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
 

Más de Priyanka Aash

Más de Priyanka Aash (20)

Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
 
Verizon Breach Investigation Report (VBIR).pdf
Verizon Breach Investigation Report (VBIR).pdfVerizon Breach Investigation Report (VBIR).pdf
Verizon Breach Investigation Report (VBIR).pdf
 
Top 10 Security Risks .pptx.pdf
Top 10 Security Risks .pptx.pdfTop 10 Security Risks .pptx.pdf
Top 10 Security Risks .pptx.pdf
 
Simplifying data privacy and protection.pdf
Simplifying data privacy and protection.pdfSimplifying data privacy and protection.pdf
Simplifying data privacy and protection.pdf
 
Generative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdfGenerative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdf
 
EVERY ATTACK INVOLVES EXPLOITATION OF A WEAKNESS.pdf
EVERY ATTACK INVOLVES EXPLOITATION OF A WEAKNESS.pdfEVERY ATTACK INVOLVES EXPLOITATION OF A WEAKNESS.pdf
EVERY ATTACK INVOLVES EXPLOITATION OF A WEAKNESS.pdf
 
DPDP Act 2023.pdf
DPDP Act 2023.pdfDPDP Act 2023.pdf
DPDP Act 2023.pdf
 
Cyber Truths_Are you Prepared version 1.1.pptx.pdf
Cyber Truths_Are you Prepared version 1.1.pptx.pdfCyber Truths_Are you Prepared version 1.1.pptx.pdf
Cyber Truths_Are you Prepared version 1.1.pptx.pdf
 
Cyber Crisis Management.pdf
Cyber Crisis Management.pdfCyber Crisis Management.pdf
Cyber Crisis Management.pdf
 
CISOPlatform journey.pptx.pdf
CISOPlatform journey.pptx.pdfCISOPlatform journey.pptx.pdf
CISOPlatform journey.pptx.pdf
 
Chennai Chapter.pptx.pdf
Chennai Chapter.pptx.pdfChennai Chapter.pptx.pdf
Chennai Chapter.pptx.pdf
 
Cloud attack vectors_Moshe.pdf
Cloud attack vectors_Moshe.pdfCloud attack vectors_Moshe.pdf
Cloud attack vectors_Moshe.pdf
 
Stories From The Web 3 Battlefield
Stories From The Web 3 BattlefieldStories From The Web 3 Battlefield
Stories From The Web 3 Battlefield
 
Lessons Learned From Ransomware Attacks
Lessons Learned From Ransomware AttacksLessons Learned From Ransomware Attacks
Lessons Learned From Ransomware Attacks
 
Emerging New Threats And Top CISO Priorities In 2022 (Chennai)
Emerging New Threats And Top CISO Priorities In 2022 (Chennai)Emerging New Threats And Top CISO Priorities In 2022 (Chennai)
Emerging New Threats And Top CISO Priorities In 2022 (Chennai)
 
Emerging New Threats And Top CISO Priorities In 2022 (Mumbai)
Emerging New Threats And Top CISO Priorities In 2022 (Mumbai)Emerging New Threats And Top CISO Priorities In 2022 (Mumbai)
Emerging New Threats And Top CISO Priorities In 2022 (Mumbai)
 
Emerging New Threats And Top CISO Priorities in 2022 (Bangalore)
Emerging New Threats And Top CISO Priorities in 2022 (Bangalore)Emerging New Threats And Top CISO Priorities in 2022 (Bangalore)
Emerging New Threats And Top CISO Priorities in 2022 (Bangalore)
 
Cloud Security: Limitations of Cloud Security Groups and Flow Logs
Cloud Security: Limitations of Cloud Security Groups and Flow LogsCloud Security: Limitations of Cloud Security Groups and Flow Logs
Cloud Security: Limitations of Cloud Security Groups and Flow Logs
 
Cyber Security Governance
Cyber Security GovernanceCyber Security Governance
Cyber Security Governance
 
Ethical Hacking
Ethical HackingEthical Hacking
Ethical Hacking
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Linguistic Passphrase Cracking

  • 1. SESSION ID: #RSAC Mikael Simovits Linguistic Passphrase Cracking HT-R03 M.Sc. CISSP Simovits Consulting (CEO) mikael@simovits.com Peder Sparell M.Sc. GPEN CHFI Simovits Consulting peder.sparell@simovits.com
  • 3. #RSAC Background Standards requirements/recommendations on longand complex/random passwords (e.g. within ISO27000 series and NIST Electronic Authentication Guideline) Increasing use of phrases as passwords Publicly available cracking software are not very effective on longer passwords or phrases > 2 words Languages are relatively predictable – not random Linguistically correct phrases- more effective cracking? How can language be modelled to generate/crack such phrases? Is it advisable to base a password policy on phrases? 3
  • 4. #RSAC Passwords – search scope/complexity 4 Password length Lowercase Lower + uppercase Alfanumerical Alfanumerical + special chars 1 (base) 26 52 62 95 6 228,2 (=3,1*108) 234,2 (=2,0*1010) 235,7 (=5,7*1010) 239,4 (=7,4*1011) 8 237,6 (=2,1*1011) 245,6 (=5,3*1013) 247,6 (=2,2*1014) 252,6 (=6,6*1015) 10 247,0 (=1,4*1014) 257,0 (=1,4*1017) 259,5 (=8,4*1017) 265,7 (=6,0*1019) 14 265,8 (=6,5*1019) 279,8 (=1,1*1024) 283,4 (=1,2*1025) 292,0 (=4,9*1027) 16 275,2 (=4,3*1022) 291,2 (=2,9*1027) 295,3 (=4,8*1028) 2105 (=4,4*1031) 20 294,0 (=2,0*1028) 2114 (=2,1*1034) 2119 (=7,0*1035) 2131 (=3,6*1039) Number of possible combinations: 𝑐𝑐 = 𝑏𝑏𝑏𝑏 Sometimes written as: 2𝑙𝑙𝑙𝑙 𝑙𝑙2(𝑐𝑐)
  • 5. #RSAC This works delimitations on passphrases 2 or more words (usually >=3) Assembled to a string without white spaces Lower case Examples: The king shall rule -> thekingshallrule My brother rocks -> mybrotherrocks 5
  • 6. #RSAC Implementation Objectives and entropy calculations Results Background
  • 7. #RSAC Markov chains A Markov process is a process where transitions to other states are determined by probability distribution based on the current state A Markov chain is the resulting sequence of states from a Markov process 7 A C B 0,1 0,6 1,0 0,6 0,4 0,3
  • 8. #RSAC Markov chains of order m A Markov chain of order m redefines the states to include a ’history’ of m states 8 State C State B State A AB CA BC 0,5 1,0 1,0 0,3 0,2 BBBA 0,1 1,0 0,9
  • 9. #RSAC n-grams n-grams are used for language modelling, using Markov chains of order n-1 n-grams are sequences containing n elements The elements could be characters or words Statistics for the probability distribution is generated from counting number of occurrences of n-grams in large texts Example: Statistics of 3-grams shows, for example, if character ’e’ is more likely than character ’x’ to follow a current state of ’th’ 9
  • 10. #RSAC Implementation - Overview Phrase generation Text file Cracker (e.g. HashCat) stdout 10
  • 11. #RSAC Implementation - Phrase generation n-gram extraction Here the n- gram statistics are created Data out Phrases, one per line Phrase generation Here the phrases are generated using a Markov process n-gram data Text file containing statistics on number of occurrences of n- grams in the source text Text source Large text file in plain text. Ex: e- book, corpus 11
  • 12. #RSAC Phase 1: n-gram extraction At start-up, a text file to analyse is chosen, as well as the desired order and level of n-grams Uses regular expressions Punctuation marks (.,!?:) are replaced by a single dot (.) to represent sentence breaks All characters are changed to lower case Result: n-grams are saved to a text file together with their number of occurrences Examples (char/word): 12 … arw 3 ary 137 as. 24 as_ 2382 asa 48 asb 7 asc 42 asd 7 ase 207 … … the shelbyville runner 1 the sheldon penny 1 the shelf . 10 the shelf adaptors 1 the shelf and 6 the shelf are 1 the shelf at 3 …
  • 13. #RSAC Phase 2: Phrase generation Interval of desired phrase lengths can be set Number of desired words can be set The desired n-gram file is read and loaded to memory as in the example in the picture Starting states (starting with ’.’) are loaded into a separate list A threshold value decides if n-gram should be loaded to the list or ignored From every starting state the phrases are built up char by char (or word by word), by recursively going through the possible state transitions and adding the new chars (or words) to the current phrase until it is long enough _th _ti 6081 e 1727 a 727 i 206 o 235 r 22 u 155 m 16 n 10 d 4 c 8 e 3 g … … 13
  • 14. #RSAC Objectives and entropy calculations Results Background Implementation
  • 15. #RSAC Entropy of the English language Different estimation suggestions: 1,75 bits/char 1,6 bits/char NIST (National Institute of Standards & Technology) suggests: First char: 4 bits Char 2-8: 2 bits Char 9-20: 1,5 bits Char 21 and subsequent: 1 bit Example: A 16 character phrase of the English language has (1*4)+(7*2)+(8*1,5) = 30 bits of entropy 15
  • 16. #RSAC Comparison entropy calculations For every generated list of phrases we define: Target entropy: 𝐻𝐻𝑡𝑡 is calculated according to NIST:s interpretation of the entropy of the English language Potential entropy: 𝐻𝐻𝑝𝑝 = 𝑙𝑙𝑙𝑙 𝑙𝑙2 𝑋𝑋 X is the number of phrases in the list Estimated entropy: Efficiency: 𝑒𝑒 = 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 ℎ𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 ℎ𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎 𝐻𝐻𝑒𝑒𝑒𝑒𝑒𝑒 = 𝑙𝑙𝑙𝑙 𝑙𝑙2 𝑋𝑋 𝑒𝑒 = 𝑙𝑙𝑙𝑙 𝑙𝑙2 𝑋𝑋 − 𝑙𝑙𝑙𝑙 𝑙𝑙2 𝑒𝑒 The objective is that the “potential” and especially the “estimated” entropy values are as close as possible to the “target” entropy. 16
  • 17. Some case illustrations
      - Sufficient number of phrases in the list, but about half of them are not valid phrases: potential entropy is close to target entropy, estimated entropy is higher.
      - Small number of phrases in the list, all of them valid, but they do not cover the language: potential entropy is lower than target entropy, estimated entropy is higher.
      - Large number of phrases in the list, covering the whole language but with many invalid phrases: potential and estimated entropy are both high.
  • 19. Testing variables
      - The test sample consisted of 66 hashes of passwords of length 10-20.
      - n-gram statistics were generated from 3 different source texts:
          English news sites (3 million sentences)
          English Wikipedia (1 million sentences)
          General web sites, blogs etc. (1 million sentences)
      - The input values used for each list can be derived from its file name. Example: 'L20W6T5N3WNews'
          L20: the length of the phrases in the list/file is 20
          W6: the phrases consist of exactly 6 words
          T5: the threshold value is set to 5
          N3: n-grams of order 3 have been used
          W: n-grams on the word level have been used (C = char, W = words)
          News: the text source is the one from news sites (Wiki, Web or News)
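The naming scheme can be decoded mechanically, which is handy when reading the result tables. The parser below is a hypothetical helper, not part of the published tool. It assumes the word-count part is optional (names such as 'L10T0N3WNews' omit it) and keeps it as a raw string, because the slides do not explain forms such as '-5' or '5-6'.

```python
import re
from typing import Dict, Optional

# L<length>[W<words>]T<threshold>N<order><level><source>
NAME_PATTERN = re.compile(
    r"L(?P<length>\d+)"
    r"(?:W(?P<words>\d+(?:-\d+)?|-\d+))?"   # optional, kept as raw text
    r"T(?P<threshold>\d+)"
    r"N(?P<order>\d+)"
    r"(?P<level>[CW])"                      # C = char, W = words
    r"(?P<source>Wiki|Web|News)"
)


def parse_list_name(name: str) -> Optional[Dict[str, object]]:
    """Split a phrase-list name into its settings, or return None."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return {
        "length": int(match["length"]),
        "words": match["words"],            # None when not given
        "threshold": int(match["threshold"]),
        "order": int(match["order"]),
        "level": "char" if match["level"] == "C" else "word",
        "source": match["source"],
    }


if __name__ == "__main__":
    for name in ("L20W6T5N3WNews", "L10T100N5CNews", "L16W5-6T5N8CWiki"):
        print(name, parse_list_name(name))
```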
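  • 20. Results details: Phrase length 10
      | Phrase list | Time generation | Time HashCat | Cracked hashes | Possible outcomes (English) | Phrases generated | Target entropy | Potential entropy | Estimated entropy |
      |---|---|---|---|---|---|---|---|---|
      | L10T100N5CNews | 4.2 h | 46 s | 7/15 | 2^21 (2.1 mil.) | 2^29.6 (825 mil.) | 21 | 29.6 | 30.7 |
      | L10T0N3WNews | 1.5 h | 1 s | 2/15 | 2^21 (2.1 mil.) | 2^23.9 (15.2 mil.) | 21 | 23.9 | 26.8 |
      | L10T1N3WNews | 16 min | 0 s | 2/15 | 2^21 (2.1 mil.) | 2^21.3 (2.6 mil.) | 21 | 21.3 | 24.2 |
      Possible outcomes and target entropy are shared by all lists of this phrase length.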
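  • 21. Results details: Phrase length 14
      | Phrase list | Time generation | Time HashCat | Cracked hashes | Possible outcomes (English) | Phrases generated | Target entropy | Potential entropy | Estimated entropy |
      |---|---|---|---|---|---|---|---|---|
      | L14T3000N5CNews | 1.7 h | 6 s | 0/15 | 2^27 (134 mil.) | 2^26.4 (90 mil.) | 27 | 26.4 | - |
      | L14T1N8CWiki | 23.0 h | 4 min 32 s | 2/15 | 2^27 (134 mil.) | 2^30.8 (1 865 mil.) | 27 | 30.8 | 33.7 |
      | L14T1N3WNews | 20.2 h | 31 s | 2/15 | 2^27 (134 mil.) | 2^28.9 (505 mil.) | 27 | 28.9 | 31.8 |
      | L14W-5T0N3WNews | 24.0 h | 45 s | 2/15 | 2^27 (134 mil.) | 2^28.2 (312 mil.) | 27 | 28.2 | 31.1 |
      Possible outcomes and target entropy are shared by all lists of this phrase length.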
  • 22. Results details: Phrase length 16
      | Phrase list | Time generation | Time HashCat | Cracked hashes | Possible outcomes (English) | Phrases generated | Target entropy | Potential entropy | Estimated entropy |
      |---|---|---|---|---|---|---|---|---|
      | L16W4T5N8CWiki | 7.3 h | 31 s | 1 | | 2^28.8 (479 mil.) | | | |
      | L16W5-6T5N8CWiki | 34.7 h | 3 min 37 s | 0 | | 2^30.3 (1 355 mil.) | | | |
      | Total, group 1 | 42 h | 4 min 8 s | 1/24 | <2^30 (<1 100 mil.) | 2^30.8 (1 834 mil.) | <30 | 30.8 | 35.4 |
      | L16W4T0N3WNews | 9.2 h | 6 s | 4 | | 2^26.0 (67.5 mil.) | | | |
      | L16W5T0N3WNews | 79.5 h | 10 min 16 s | 1 | | 2^29.7 (887 mil.) | | | |
      | L16W6T1N3WNews | 58.7 h | 1 min 32 s | 0 | | 2^29.7 (851 mil.) | | | |
      | Total, group 2 | 147.4 h | 11 min 54 s | 5/24 | <2^30 (<1 100 mil.) | 2^30.7 (1 806 mil.) | <30 | 30.7 | 33.0 |
      | L16W5T0N3WWeb | 16.6 h | 13 s | 1 | | 2^27.6 (199 mil.) | | | |
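  • 23. Results details: Phrase length 20
      | Phrase list | Time generation | Time HashCat | Cracked hashes | Possible outcomes (English) | Phrases generated | Target entropy | Potential entropy | Estimated entropy |
      |---|---|---|---|---|---|---|---|---|
      | L20W6T40N8CWiki | 12.8 h | 18 s | 0/12 | <2^36 (<69 000 mil.) | 2^27.9 (244 mil.) | <36 | 27.9 | - |
      | L20W6T1N3WWeb | 52.2 h | 2 min 14 s | 0/12 | <2^36 (<69 000 mil.) | 2^29.4 (727 mil.) | <36 | 29.4 | - |
      | L20W6T0N3WWeb | 960 h | 29 min 9 s | 1/12 | <2^36 (<69 000 mil.) | 2^33.0 (8 500 mil.) | <36 | 33.0 | 36.6 |
      | L20W6T1N3WNews | 550.5 h | 39 min 3 s | 1/12 | <2^36 (<69 000 mil.) | 2^31.0 (2 131 mil.) | <36 | 31.0 | 34.6 |
      Possible outcomes and target entropy are shared by all lists of this phrase length.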
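  • 24. Phrase list efficiency
      [Figure: bar chart, x-axis: phrase list, y-axis: bits (0-40). Data table of the chart:]
      | Phrase list | Target entropy | Potential entropy | Estimated entropy |
      |---|---|---|---|
      | L10T100N5CNews | 21 | 29.6 | 30.7 |
      | L10T0N3WNews | 21 | 23.9 | 26.8 |
      | L10T1N3WNews | 21 | 21.3 | 24.2 |
      | L14T3000N5CNews | 27 | 26.4 | 0 |
      | L14T1N8CWiki | 27 | 30.8 | 33.7 |
      | L14T1N3WNews | 27 | 28.9 | 31.8 |
      | L14W-5T0N3WNews | 27 | 28.2 | 31.1 |
      | L16W4-6T5N8CWiki (Group 1) | 30 | 30.8 | 35.4 |
      | L16W4-6T0-1N3WNews (Group 2) | 30 | 30.7 | 33 |
      | L20W6T40N8CWiki | 36 | 27.9 | 0 |
      | L20W6T1N3WWeb | 36 | 29.4 | 0 |
      | L20W6T0N3WWeb | 36 | 33 | 36.6 |
      | L20W6T1N3WNews | 36 | 31 | 34.6 |
      An estimated entropy of 0 marks a list that cracked no hashes.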
  • 25. Brute force comparison
      [Figure: bar chart, x-axis: phrase list, y-axis: bits (0-100). Values per phrase list, in chart order:]
      Estimated entropy: 30.7, 26.8, 24.2, 0, 33.7, 31.8, 31.1, 35.4, 33, 0, 0, 36.6, 34.6
      Brute-force: 47, 47, 47, 65.8, 65.8, 65.8, 65.8, 75.2, 75.2, 94, 94, 94, 94
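The brute-force values in this chart match the lowercase column of the search-scope table earlier in the deck: a lowercase password of a given length has 26^length combinations, that is length * log2(26) bits. A minimal check follows; the function name is hypothetical.

```python
import math


def brute_force_bits(length: int, alphabet_size: int = 26) -> float:
    """Search scope in bits for exhaustive search over a fixed alphabet."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    for n in (10, 14, 16, 20):
        print(n, round(brute_force_bits(n), 1))
```

Comparing these figures with the estimated entropies shows how much smaller the linguistic search space is. For phrase length 16, for example, the gap between 75.2 and 35.4 bits corresponds to a factor of roughly 2^40.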
  • 26. LinkedIn cracking
      - Leaked hashes from LinkedIn 2012.
      - The previously generated phrase lists were used.
      - This was not the main objective of this work, so the lists are not optimized.
      - The target scope (number of crackable passwords) of each length is estimated from password statistics on already cracked passwords (www.adeptus-mechanicus.com).
      - The target scope below still includes many passwords beyond the original target scope of this work (phrases that are not linguistically correct).
      | Phrase length | Cracked hashes | Target scope (estimated total no. of lowercase passwords) |
      |---|---|---|
      | 10 | 18 269 (11%) | 168 000 |
      | 14 | 1 882 (10%) | 18 300 |
      | 16 | 612 (10%) | 6 100 |
      | 20 | 6 (9%) | 68 |
  • 27. Conclusions
      - Good results compared to brute-force.
      - Efficient, and can be further improved.
      - Lack of alternative publicly available methods.
      - Cracks about 10% of really long passwords.
      - Time-memory trade-off.
      - If a password policy is based on phrases, it should at least also require:
          phrases longer than 20 characters
          characters from all character sets (alphanumeric + special characters)
  • 28. Future work & more info
      Future work:
      - Optimization of the performance of the implementation
      - Include upper case, numbers and special characters
      - Etc.
      More info:
      - www.simovits.com
      - peder.sparell@simovits.com
  • 29. "Apply" slide
      - After this session we hope that you will know that passwords as a means of authentication are more or less obsolete.
      - It is possible to implement linguistic password cracking for long phrases and achieve viable results in a short time.
      - You know where to begin if you want to carry this work further. Source code is available on GitHub: https://github.com/Sparell/Phraser