Netflix is big and dynamic, and in the cloud its IP addresses mean nothing on their own. This is a big challenge with Amazon VPC Flow Logs: each entry presents only network-level information (L3 and L4), which is virtually meaningless without knowing which application held an address at that moment. Our goal is to map each IP address back to an application, at scale, to derive true network-level insight within Amazon VPC. In this session, the Cloud Network Engineering team discusses the temporal nature of IP address utilization in AWS and the problem with looking at OSI Layer 3 and Layer 4 information in the cloud.
23. “Hi there, can someone help me resolve a network connectivity issue between one microservice to another?” - Sr. Platform Engineer
“Does anyone know if there are any network weather events in us-east-1? We’ve seen a couple hosts run into network partitions.” - Sr. Database Engineer
“I'm thinking this might be due to networking unpleasantness...” - Sr. Edge Engineer
“I am seeing what seem to be network related errors on start-up.” - Stunning Colleague #1
29. [Diagram: three Auto Scaling groups (Foo, Bar, Baz), each with two EC2 instances, plus a Classic Load Balancer, a Lambda Function, an RDS DB instance, an Application Load Balancer, and an ElastiCache Redis Instance]
30. [Diagram: the same resources, labeled only by IP address]
Foo Auto Scaling group: 172.31.16.139, 172.31.16.21
Bar Auto Scaling group: 172.31.16.54, 172.31.16.248
Baz Auto Scaling group: 172.31.61.95, 172.16.31.10
Other resources: 172.31.16.22, 172.31.16.19, 172.31.16.60, 172.31.16.133, 172.31.16.231
53. 2 TCP Connections, 12 VPC Flow Log Records
[Diagram: instance, Classic Load Balancer, instance, VPC NAT Gateway]
Challenges: IP Addresses Mean Nothing; Stateless; Fragmented
54. What I care about
[Diagram: instance to instance]
55. VPC Flow Logs
Challenges: IP Addresses Mean Nothing; Stateless; Fragmented; We have a lot of Flow Logs
56. 1,000,000+ Requests Per Second
4 AWS Regions
75+ accounts
150,000+ EC2 Instances
59. What app had these IPs, at this time, in this routing domain?
Solutions: IP Addresses Mean Nothing; Stateless; Fragmented; We have a lot of Flow Logs
60. f(domain, ip, time) = app
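One way to picture f(domain, ip, time) = app is as a time-indexed lookup: for each (routing domain, IP) pair, keep the history of which app took the address and when, then find the most recent assignment at or before the requested time. The sketch below is only an illustration under that assumption. The class name `IpAppIndex`, its methods, and the two sample timestamps are hypothetical and are not taken from the talk.

```python
import bisect
from collections import defaultdict


class IpAppIndex:
    """Hypothetical time-indexed map from (routing domain, IP) to an app."""

    def __init__(self):
        # (domain, ip) -> parallel lists: sorted start times and app names
        self._times = defaultdict(list)
        self._apps = defaultdict(list)

    def record_change(self, domain, ip, time, app):
        """Record that `app` took over `ip` in `domain` at `time` (epoch seconds)."""
        key = (domain, ip)
        pos = bisect.bisect_right(self._times[key], time)
        self._times[key].insert(pos, time)
        self._apps[key].insert(pos, app)

    def lookup(self, domain, ip, time):
        """f(domain, ip, time) = app, or None if no assignment is known."""
        key = (domain, ip)
        times = self._times.get(key)
        if not times:
            return None
        # Index of the last assignment that started at or before `time`.
        pos = bisect.bisect_right(times, time)
        return self._apps[key][pos - 1] if pos else None


# Made-up change events: the same IP is held by two apps at different times.
index = IpAppIndex()
index.record_change(0, "172.31.16.139", 1418529000, "foo")
index.record_change(0, "172.31.16.139", 1418531000, "baz")
print(index.lookup(0, "172.31.16.139", 1418530010))
```

The same address resolves to a different app depending on the time asked about, which is why the time argument cannot be dropped from f.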
69. Dredge
[Diagram: Amazon VPC Flow Logs (via Kinesis) + IP Change Events (Sonar) → Stream Joins → Netflix Data Pipeline]
70. VPC Flow Logs (via Amazon Kinesis)
71. Stream Joins
2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
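Before a record can be joined against anything, it has to be split into named fields. The sketch below assumes the default version 2 VPC Flow Logs field order documented by AWS (version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status). The names `FlowRecord` and `parse_flow_record` are hypothetical and do not come from the talk.

```python
from typing import NamedTuple, Optional


class FlowRecord(NamedTuple):
    """One VPC Flow Log record in the default version 2 field order."""
    version: int
    account_id: str
    interface_id: str
    srcaddr: Optional[str]
    dstaddr: Optional[str]
    srcport: Optional[int]
    dstport: Optional[int]
    protocol: Optional[int]
    packets: Optional[int]
    byte_count: Optional[int]
    start: int
    end: int
    action: Optional[str]
    log_status: str


_INT_FIELDS = {"version", "srcport", "dstport", "protocol",
               "packets", "byte_count", "start", "end"}


def parse_flow_record(line: str) -> FlowRecord:
    """Split a space-separated record and convert numeric fields to int."""
    parts = line.split()
    if len(parts) != len(FlowRecord._fields):
        raise ValueError(f"expected {len(FlowRecord._fields)} fields, got {len(parts)}")
    values = []
    for name, raw in zip(FlowRecord._fields, parts):
        if raw == "-":
            # NODATA / SKIPDATA records carry "-" in place of a value.
            values.append(None)
        elif name in _INT_FIELDS:
            values.append(int(raw))
        else:
            values.append(raw)
    return FlowRecord(*values)


rec = parse_flow_record(
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
print(rec.srcaddr, rec.dstaddr, rec.start)
```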
72. Stream Joins: f(domain, ip, time) = app
Callouts on the record: Routing Domain, IPv4 Addresses, Timestamp
2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
73. Stream Joins
f(0, 172.31.16.139, 1418530010) = foo
f(0, 172.31.16.21, 1418530010) = bar
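The join step amounts to applying f to both addresses of a record, using the record's start timestamp. A minimal sketch follows, with a fixed table standing in for the live IP change event stream. The function `enrich`, the `ASSIGNMENTS` table, and the dictionary keys are hypothetical names chosen for illustration.

```python
def enrich(record, lookup, domain=0):
    """Attach source and destination app names to one flow log record."""
    enriched = dict(record)
    enriched["src_app"] = lookup(domain, record["srcaddr"], record["start"])
    enriched["dst_app"] = lookup(domain, record["dstaddr"], record["start"])
    return enriched


# Stand-in for f: a static table. A real f would also depend on time.
ASSIGNMENTS = {
    (0, "172.31.16.139"): "foo",
    (0, "172.31.16.21"): "bar",
}


def f(domain, ip, time):
    # `time` is ignored here; illustration only.
    return ASSIGNMENTS.get((domain, ip))


record = {"srcaddr": "172.31.16.139", "dstaddr": "172.31.16.21", "start": 1418530010}
joined = enrich(record, f)
print(joined["src_app"], "->", joined["dst_app"])
```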
74. Stream Joins
172.31.16.139:20641 = Not Listening = Outbound Request
f(0, 172.31.16.139, 1418530010) = foo
f(0, 172.31.16.21, 1418530010) = bar
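The direction rule on this slide can be sketched as a check against known listening sockets: if the source address and port are not listening and the destination's are, the record is an outbound request from the source app. How the set of listening sockets is obtained is not stated in the slides, so the `listening` set and the function name `classify_direction` below are assumptions made for illustration.

```python
def classify_direction(record, listening):
    """Return (client_ip, server_ip), or None if the direction is ambiguous.

    `listening` is a set of (ip, port) pairs assumed to accept connections.
    """
    src = (record["srcaddr"], record["srcport"])
    dst = (record["dstaddr"], record["dstport"])
    src_listens = src in listening
    dst_listens = dst in listening
    if dst_listens and not src_listens:
        # Source is not listening: it initiated an outbound request.
        return record["srcaddr"], record["dstaddr"]
    if src_listens and not dst_listens:
        # Source is the listener: this record is the response leg.
        return record["dstaddr"], record["srcaddr"]
    return None


listening = {("172.31.16.21", 22)}  # hypothetical: bar listens on port 22
record = {"srcaddr": "172.31.16.139", "srcport": 20641,
          "dstaddr": "172.31.16.21", "dstport": 22}
print(classify_direction(record, listening))
```

Because flow log records are stateless, the reverse leg of the same connection shows up as a separate record with source and destination swapped. The second branch maps it back to the same client and server pair.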